Lucene in Action: Meet Lucene Pt. 1 | 3
1.4.1 Creating an index
In this section you'll see a single class called Indexer and its four static methods;
together, they recursively traverse file system directories and index all files with a
.txt extension. When Indexer completes execution, it leaves behind a Lucene
index for its sibling, Searcher (presented in section 1.4.2).
We don't expect you to be familiar with the few Lucene classes and methods
used in this example; we'll explain them shortly. After the annotated code listing,
we show you how to use Indexer; if it helps you to learn how Indexer is used before
you see how it's coded, go directly to the usage discussion that follows the code.
Using Indexer to index text files
Listing 1.1 shows the Indexer command-line program. It takes two arguments:
- A path to a directory where we store the Lucene index
- A path to a directory that contains the files we want to index
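As a rough illustration of the two-argument interface just described, here is a minimal Java sketch of the argument checking a tool like Indexer might perform. It is not Listing 1.1: the class name IndexerArgs, its parse method, and the validation rules (exactly two arguments, and a data directory that must already exist) are assumptions made for this example, and no Lucene classes are involved.

```java
import java.io.File;

// Hypothetical helper (not part of Listing 1.1): validates the two
// command-line arguments a tool like Indexer expects.
public class IndexerArgs {
    final File indexDir; // where the Lucene index would be stored
    final File dataDir;  // root of the directory tree to scan for .txt files

    IndexerArgs(File indexDir, File dataDir) {
        this.indexDir = indexDir;
        this.dataDir = dataDir;
    }

    // Expects {indexDir, dataDir}; throws if the argument count is wrong
    // or if the data directory does not exist (assumed rules).
    static IndexerArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: Indexer <index dir> <data dir>");
        }
        File indexDir = new File(args[0]);
        File dataDir = new File(args[1]);
        if (!dataDir.isDirectory()) {
            throw new IllegalArgumentException(
                dataDir + " does not exist or is not a directory");
        }
        return new IndexerArgs(indexDir, dataDir);
    }

    public static void main(String[] args) {
        IndexerArgs parsed = parse(args);
        System.out.println("Index directory: " + parsed.indexDir);
        System.out.println("Data directory:  " + parsed.dataDir);
    }
}
```

The index directory is deliberately not checked for existence here, on the assumption that the indexing step would create it.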
Listing 1.1 Indexer: traverses a file system and indexes .txt files

Interestingly, the bulk of the code performs recursive directory traversal (2).
Only the creation and closing of the IndexWriter (1) and four lines in the
indexFile method (3, 4, 5) of Indexer involve the Lucene API: effectively six
lines of code.
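Because most of Indexer is plain directory traversal, that part can be shown with the JDK alone. The following is a minimal sketch, assuming a hypothetical class TxtFileWalker: it recursively collects files whose names end in .txt, which is the selection the text describes, but it is not the book's code and it stops short of the Lucene-specific indexFile step.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class (not the book's Indexer): recursively gathers .txt
// files. In a real indexer each collected file would be passed to the
// Lucene-specific step instead of just being printed.
public class TxtFileWalker {

    // Appends every regular file under dir whose name ends in ".txt" to out.
    static void collect(File dir, List<File> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // dir is not a directory, or it could not be read
        }
        for (File f : children) {
            if (f.isDirectory()) {
                collect(f, out); // recurse into subdirectories
            } else if (f.isFile() && f.getName().endsWith(".txt")) {
                out.add(f);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> files = new ArrayList<>();
        collect(root, files);
        for (File f : files) {
            System.out.println("Indexing " + f.getAbsolutePath());
        }
        System.out.println(files.size() + " .txt files found");
    }
}
```

The order of File.listFiles results is not guaranteed, so the printed order may differ between platforms. This sketch also does not guard against symbolic-link cycles.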
This example intentionally focuses on text files with .txt extensions to keep things simple while demonstrating Lucene's usage and power. In chapter 7, we'll show you how to handle nontext files, and we'll develop a small ready-to-use framework capable of parsing and indexing documents in several common formats.
Running Indexer
From the command line, we ran Indexer against a local working directory
including Lucene's own source code. We instructed Indexer to index files under
the /lucene directory and store the Lucene index in the build/index directory:
% java lia.meetlucene.Indexer build/index /lucene
Indexing /lucene/build/test/TestDoc/test.txt
Indexing /lucene/build/test/TestDoc/test2.txt
Indexing /lucene/BUILD.txt
Indexing /lucene/CHANGES.txt
Indexing /lucene/LICENSE.txt
Indexing /lucene/README.txt
Indexing /lucene/src/jsp/README.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/stemsUnicode.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/test1251.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/testKOI8.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/testUnicode.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/wordsUnicode.txt
Indexing /lucene/todo.txt
Indexing 13 files took 2205 milliseconds
Indexer prints out the names of files it indexes, so you can see that it indexes
only files with the .txt extension.
NOTE If you're running this application in a Windows command shell,
you need to adjust the command line's directory and path separators.
The Windows command line is java lia.meetlucene.Indexer build\index c:\lucene.
When it completes indexing, Indexer prints out the number of files it indexed and
the time it took to do so. Because the reported time includes both file-directory
traversal and indexing, you shouldn't consider it an official performance measure.
In our example, each of the indexed files was small, but roughly two seconds to index a handful of text files is reasonably impressive.
Indexing speed is a concern, and we cover it in chapter 2. But generally, searching is of even greater importance.
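The summary line in the sample run (Indexing 13 files took 2205 milliseconds) comes from timing the whole run, which is why the figure mixes traversal and indexing cost. A minimal sketch of that kind of measurement, assuming a hypothetical IndexTimer class and an IntSupplier that performs the work and returns the number of files it handled:

```java
import java.util.function.IntSupplier;

// Hypothetical helper: times a unit of work and formats a summary in the
// same shape as the sample run's final line.
public class IndexTimer {

    static String summary(int fileCount, long elapsedMillis) {
        return "Indexing " + fileCount + " files took " + elapsedMillis + " milliseconds";
    }

    // Runs work (which returns how many files it processed) and reports
    // elapsed wall-clock time. Everything inside work is timed, so when
    // work does both traversal and indexing, so does the figure.
    static String timeRun(IntSupplier work) {
        long start = System.nanoTime();
        int count = work.getAsInt();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return summary(count, elapsedMillis);
    }

    public static void main(String[] args) {
        // Placeholder work standing in for traversal plus indexing.
        System.out.println(timeRun(() -> 0));
    }
}
```

Plugging a real traversal-plus-indexing routine into timeRun would yield a number comparable to the one above, with the same caveat: it measures the whole program, not Lucene alone.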
Created: March 27, 2003
Revised: January 24, 2005
URL: http://webreference.com/programming/lucene/1