Lucene in Action: Meet Lucene Pt. 1 | 3

1.4.1 Creating an index

In this section you'll see a single class called Indexer and its four static methods; together, they recursively traverse file system directories and index all files with a .txt extension. When Indexer completes execution, it leaves behind a Lucene index for its sibling, Searcher (presented in section 1.4.2).

We don't expect you to be familiar with the few Lucene classes and methods used in this example—we'll explain them shortly. After the annotated code listing, we show you how to use Indexer; if it helps you to learn how Indexer is used before you see how it's coded, go directly to the usage discussion that follows the code.

Using Indexer to index text files

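Listing 1.1 shows the Indexer command-line program. It takes two arguments: a path to the directory where the Lucene index will be stored, and a path to the directory containing the text files to be indexed.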

Listing 1.1 Indexer: traverses a file system and indexes .txt files

Interestingly, the bulk of the code performs recursive directory traversal (2). Only the creation and closing of the IndexWriter (1) and four lines in the indexFile method (3, 4, 5) of Indexer involve the Lucene API: effectively six lines of code.
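Because the traversal is most of the program, it may help to see that part on its own. The following is a minimal plain-Java sketch under stated assumptions, not the book's Listing 1.1: the class name TraversalSketch and the method indexDirectory are hypothetical, and indexFile (named after the method mentioned above) only records the file's path at the point where Indexer would make its Lucene calls, so the example runs without Lucene on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Indexer's traversal step; NOT the book's Listing 1.1.
// The Lucene calls are replaced by a list that records which files would be indexed.
public class TraversalSketch {

    // Recursively walks dir and hands every readable .txt file to indexFile.
    // Returns the number of files passed to indexFile.
    static int indexDirectory(File dir, List<String> indexed) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {          // not a directory, or the listing failed
            return 0;
        }
        int count = 0;
        for (File f : entries) {
            if (f.isDirectory()) {
                count += indexDirectory(f, indexed);   // recurse into subdirectories
            } else if (f.isFile() && f.canRead() && f.getName().endsWith(".txt")) {
                indexFile(f, indexed);
                count++;
            }
        }
        return count;
    }

    // Placeholder (assumption): in Indexer, this is where a Lucene Document
    // would be built and added to the IndexWriter. Here we only print and record.
    static void indexFile(File f, List<String> indexed) throws IOException {
        String path = f.getCanonicalPath();
        System.out.println("Indexing " + path);
        indexed.add(path);
    }

    public static void main(String[] args) throws IOException {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<String> indexed = new ArrayList<>();
        int n = indexDirectory(root, indexed);
        System.out.println("Indexed " + n + " .txt files");
    }
}
```

Swapping the body of indexFile for real Lucene calls would not change the traversal logic, which is why so little of Indexer touches the Lucene API.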

This example intentionally focuses on text files with .txt extensions to keep things simple while demonstrating Lucene's usage and power. In chapter 7, we'll show you how to handle nontext files, and we'll develop a small ready-to-use framework capable of parsing and indexing documents in several common formats.

Running Indexer

From the command line, we ran Indexer against a local working directory including Lucene's own source code. We instructed Indexer to index files under the /lucene directory and store the Lucene index in the build/index directory:

Indexer prints out the names of files it indexes, so you can see that it indexes only files with the .txt extension.

When it completes indexing, Indexer prints out the number of files it indexed and the time it took to do so. Because the reported time includes both file-directory traversal and indexing, you shouldn't consider it an official performance measure.
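To make the distinction concrete, the sketch below times the traversal and the per-file work separately, as well as the combined figure. It is an illustration under assumptions: TimingSketch and timeIndexing are hypothetical names, it uses java.nio.file.Files.walk rather than whatever Indexer itself uses, and reading each file's bytes merely stands in for the Lucene indexing calls, so its second component says nothing about Lucene's actual speed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: separates traversal time from per-file work.
// Reading file contents is a stand-in (assumption) for the Lucene calls.
public class TimingSketch {

    // Returns {number of .txt files, traversal nanoseconds, per-file work nanoseconds}.
    static long[] timeIndexing(Path root) throws IOException {
        long t0 = System.nanoTime();
        List<Path> txtFiles;
        try (Stream<Path> paths = Files.walk(root)) {          // recursive traversal
            txtFiles = paths.filter(Files::isRegularFile)
                            .filter(p -> p.toString().endsWith(".txt"))
                            .collect(Collectors.toList());
        }
        long t1 = System.nanoTime();
        for (Path p : txtFiles) {
            Files.readAllBytes(p);   // stand-in for building and adding a Document
        }
        long t2 = System.nanoTime();
        return new long[] { txtFiles.size(), t1 - t0, t2 - t1 };
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.currentTimeMillis();
        long[] r = timeIndexing(root);
        long total = System.currentTimeMillis() - start;
        // "total" plays the role of the single combined figure Indexer reports.
        System.out.printf("Indexing %d files took %d milliseconds%n", r[0], total);
        System.out.printf("  traversal: %d ms, per-file work: %d ms%n",
                r[1] / 1_000_000, r[2] / 1_000_000);
    }
}
```

Comparing the two components shows how much of the combined figure is spent walking directories rather than processing files.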

In our example, each of the indexed files was small, but roughly two seconds to index a handful of text files is reasonably impressive.

Indexing speed is a concern, and we cover it in chapter 2. But generally, searching is of even greater importance.

Created: March 27, 2003
Revised: January 24, 2005

URL: http://webreference.com/programming/lucene/1