In this section you’ll see a single class called Indexer and its four static methods;
together, they recursively traverse file system directories and index all files with a
.txt extension. When Indexer completes execution it leaves behind a Lucene
index for its sibling, Searcher (presented in section 1.4.2).
We don’t expect you to be familiar with the few Lucene classes and methods
used in this example—we’ll explain them shortly. After the annotated code listing,
we show you how to use Indexer; if it helps you to learn how Indexer is used before
you see how it’s coded, go directly to the usage discussion that follows the code.
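Listing 1.1 shows the Indexer command-line program. It takes two arguments: the directory in which to store the Lucene index, followed by the directory containing the files to index.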
Listing 1.1 Indexer: traverses a file system and indexes .txt files

Interestingly, the bulk of the code performs recursive directory traversal (2).
Only the creation and closing of the IndexWriter (1) and four lines in the
indexFile method (3 4 5) of Indexer involve the Lucene API—effectively six
lines of code.
This example intentionally focuses on text files with .txt extensions to keep things simple while demonstrating Lucene’s usage and power. In chapter 7, we’ll show you how to handle nontext files, and we’ll develop a small ready-to-use framework capable of parsing and indexing documents in several common formats.
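To make the traversal step concrete, here is a minimal sketch of recursive directory traversal that keeps only files ending in .txt. It is not the book's Indexer: the class name TxtFileWalker and the method collectTxtFiles are hypothetical names chosen for illustration, and the sketch uses only the standard java.io.File API. It gathers the files and prints them where a real indexer would hand each one to Lucene.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the book's Indexer: it only gathers the files
// that a program like Indexer would go on to hand to Lucene.
class TxtFileWalker {

    // Recursively collects every file under dir whose name ends in ".txt".
    static List<File> collectTxtFiles(File dir) {
        List<File> found = new ArrayList<>();
        File[] entries = dir.listFiles();   // null if dir is not a directory or cannot be read
        if (entries == null) {
            return found;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                found.addAll(collectTxtFiles(entry));   // descend into the subdirectory
            } else if (entry.getName().endsWith(".txt")) {
                found.add(entry);                       // candidate for indexing
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumption: the data directory is the first argument, defaulting to the current directory.
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        for (File f : collectTxtFiles(dataDir)) {
            System.out.println("Indexing " + f.getPath());
        }
    }
}
```

Collecting the files into a list first keeps the sketch easy to test; an indexer could just as well process each file as soon as the traversal reaches it.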
From the command line, we ran Indexer against a local working directory
including Lucene’s own source code. We instructed Indexer to index files under
the /lucene directory and store the Lucene index in the build/index directory:
% java lia.meetlucene.Indexer build/index /lucene
Indexing /lucene/build/test/TestDoc/test.txt
Indexing /lucene/build/test/TestDoc/test2.txt
Indexing /lucene/BUILD.txt
Indexing /lucene/CHANGES.txt
Indexing /lucene/LICENSE.txt
Indexing /lucene/README.txt
Indexing /lucene/src/jsp/README.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/stemsUnicode.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/test1251.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/testKOI8.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/testUnicode.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/wordsUnicode.txt
Indexing /lucene/todo.txt
Indexing 13 files took 2205 milliseconds
Indexer prints out the names of files it indexes, so you can see that it indexes
only files with the .txt extension.
On Windows, the corresponding arguments are build/index c:\lucene.
When it completes indexing, Indexer prints out the number of files it indexed and
the time it took to do so. Because the reported time includes both file-directory
traversal and indexing, you shouldn’t consider it an official performance measure.
In our example, each of the indexed files was small, but roughly two seconds to index a handful of text files is reasonably impressive.
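The point about the combined figure can be illustrated with a small timing sketch. It assumes a hypothetical helper, PhaseTimer.timeMillis, that times one phase at a time; the two workloads are placeholders (short sleeps) standing in for directory traversal and indexing, not real Lucene calls.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing helper; the Indexer described above reports one combined figure.
class PhaseTimer {

    // Runs the task once and returns its wall-clock duration in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Placeholder workload: sleeps for the given time, restoring the interrupt flag if interrupted.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Assumption: these sleeps stand in for the real traversal and indexing phases.
        long traversal = timeMillis(() -> sleepQuietly(20));
        long indexing = timeMillis(() -> sleepQuietly(50));
        System.out.println("Traversal took " + traversal + " milliseconds");
        System.out.println("Indexing took " + indexing + " milliseconds");
        System.out.println("Combined: " + (traversal + indexing) + " milliseconds");
    }
}
```

Timing each phase on its own shows how much of a single reported number belongs to file system work rather than to indexing, though even separate wall-clock figures remain rough and are affected by disk caching and JVM warm-up.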
Indexing speed is a concern, and we cover it in chapter 2. But generally, searching is of even greater importance.
Created: March 27, 2003
Revised: January 24, 2005
URL: http://webreference.com/programming/lucene/1