The dRuby Book

10.1 Running the App

We’ll use MyDrip again, so please make sure to run MyDrip.invoke (or, if you’re in a Windows environment, start a Drip server manually).

  $ irb -r drip -r my_drip
  >> MyDrip.invoke
  => 45616
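
If you’re starting a Drip server manually instead, a minimal sketch might look like the following. The storage directory and the dRuby URI here are illustrative assumptions, not the official setup; check lib/my_drip.rb for the URI that MyDrip actually connects to.

  # my_drip_server.rb -- a hand-started Drip server (sketch only).
  require 'drip'
  require 'drb'

  # Assumption: persist Drip data under ~/.my_drip; any writable directory works.
  drip = Drip.new(File.expand_path('~/.my_drip'))

  # Assumption: this URI is a placeholder; use the one MyDrip expects (see lib/my_drip.rb).
  DRb.start_service('druby://localhost:54545', drip)
  puts "Drip server running at #{DRb.uri}"
  DRb.thread.join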

The sample code is included as part of the Drip source code. Let’s download it first.

  $ cd ~
  $ git clone git://github.com/seki/Drip.git
  $ cd Drip/sample/demo4book

Before running the crawler, please edit line 10 of crawl.rb and set the directory you want to search. I suggest you choose a relatively small directory, because experimenting with a directory that has many files takes a long time. About 500 files should be sufficient. In my experiment, I specified the root directory of the Drip source code.

  @root = File.expand_path('~/Drip/')

Now let’s run crawl.rb. It will show the list of files.

  $ ruby crawl.rb
  ["install.rb",
   "lib/drip/version.rb",
   "lib/drip.rb",
   "lib/my_drip.rb",
   "sample/copocopo.rb",
   "sample/demo4book/crawl.rb",
   "sample/demo4book/index.rb",
   "sample/drip_s.rb",
   "sample/drip_tw.rb",
   "sample/gca.rb",
   "sample/hello_tw.rb",
   "sample/my_status.rb",
   "sample/simple-oauth.rb",
   "sample/tw_markov.rb",
   "test/basic.rb"]

Next, start the indexer in a different terminal. Once it’s started, type a word you want to search for, and it will list the filenames that contain the word. In the following example, we searched for the word def. The index may not be complete if you search right after starting the indexer; if you repeat the search, you may see the list of filenames grow.

  $ ruby index.rb
  def
  ["sample/demo4book/index.rb", "sample/demo4book/crawl.rb"]
  2
  def
  ["sample/drip_s.rb",
   "lib/drip.rb",
   "lib/my_drip.rb",
   "sample/copocopo.rb",
   "sample/demo4book/index.rb",
   "sample/demo4book/crawl.rb"]
  6
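
Conceptually, the indexer answers a query with an inverted index: a table mapping each word to the set of files that contain it, so a search is just a lookup. Here is a toy, self-contained sketch of that idea (it is not the actual index.rb, which we’ll read shortly); like the session above, it prints the matching filenames and their count.

  # Toy inverted index: word => set of filenames.
  require 'set'

  index = Hash.new { |h, k| h[k] = Set.new }

  # Indexing: record which file each word appears in (sample data for illustration).
  { 'lib/drip.rb'               => 'class Drip def write',
    'sample/demo4book/crawl.rb' => 'def crawl' }.each do |fname, text|
    text.scan(/\w+/) { |word| index[word] << fname }
  end

  # Querying: read a word from standard input and list the files that contain it.
  while word = gets&.chomp
    files = index[word].to_a
    p files
    puts files.size
  end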

The default crawling interval is sixty seconds. If you type anything on standard input, the program exits once the current crawl finishes. I designed it this way so that the crawler rests from time to time, mimicking how the crawler in a typical search system works. When the search range is very wide (for example, crawling web pages), it’s impossible to keep the index continuously up to date. If you shorten the crawling interval, the index updates more quickly, and if you modify this crawler, you may even be able to create your own real-time search tool. Some operating systems can notify applications of file changes, so using those notifications as a trigger for crawling would be interesting, too.
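
To make that timing concrete, here is a rough sketch of such a loop: it crawls, sleeps for the interval, and exits after the current pass once anything arrives on standard input. The names (do_crawl, INTERVAL) are placeholders for illustration, not the actual crawl.rb code.

  INTERVAL = 60  # seconds between crawls; shorten it for faster index updates

  # Placeholder for the real work (walking @root and writing changes to Drip).
  def do_crawl
    puts "crawling at #{Time.now}"
  end

  quit = false
  Thread.new { gets; quit = true }  # any line on standard input requests shutdown

  until quit
    do_crawl
    sleep INTERVAL unless quit      # skip the wait if shutdown was already requested
  end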

In the following sections, we’ll look at the source code in depth.