
Lucene in Action: Meet Lucene Pt. 1

1.4.1 Creating an index

In this section you’ll see a single class called Indexer and its four static methods; together, they recursively traverse file system directories and index all files with a .txt extension. When Indexer completes execution it leaves behind a Lucene index for its sibling, Searcher (presented in section 1.4.2).

We don’t expect you to be familiar with the few Lucene classes and methods used in this example—we’ll explain them shortly. After the annotated code listing, we show you how to use Indexer; if it helps you to learn how Indexer is used before you see how it’s coded, go directly to the usage discussion that follows the code.

Using Indexer to index text files

Listing 1.1 shows the Indexer command-line program. It takes two arguments: the directory in which to store the Lucene index, and the directory containing the files to index.

Listing 1.1 Indexer: traverses a file system and indexes .txt files

Interestingly, the bulk of the code performs recursive directory traversal (2). Only the creation and closing of the IndexWriter (1) and four lines in the indexFile method (3, 4, 5) of Indexer involve the Lucene API: effectively six lines of code.
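As a rough illustration of the traversal half of that split, the following sketch uses only the Java standard library and no Lucene classes. The class name TxtFileWalker and its methods are hypothetical and are not the book's Indexer; the comment inside walk marks the point where a real indexer would hand each file to Lucene.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only: it finds .txt files
// but does not index them.
class TxtFileWalker {

    // True when the file name ends with the .txt extension.
    static boolean isTextFile(String name) {
        return name.endsWith(".txt");
    }

    // Recursively collects every .txt file found under dir.
    static List<File> collect(File dir) {
        List<File> found = new ArrayList<>();
        walk(dir, found);
        return found;
    }

    private static void walk(File dir, List<File> found) {
        // listFiles() returns null if dir is not a readable directory.
        File[] entries = dir.listFiles();
        if (entries == null) {
            return;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                walk(entry, found); // descend into the subdirectory
            } else if (isTextFile(entry.getName())) {
                // A real indexer would pass this file to Lucene here.
                found.add(entry);
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: the directory to scan is the first argument,
        // defaulting to the current directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : collect(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

Keeping the extension check in its own method means the filter can be changed without touching the recursion.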

This example intentionally focuses on text files with .txt extensions to keep things simple while demonstrating Lucene’s usage and power. In chapter 7, we’ll show you how to handle nontext files, and we’ll develop a small ready-to-use framework capable of parsing and indexing documents in several common formats.

Running Indexer

From the command line, we ran Indexer against a local working directory including Lucene’s own source code. We instructed Indexer to index files under the /lucene directory and store the Lucene index in the build/index directory:

Indexer prints out the names of files it indexes, so you can see that it indexes only files with the .txt extension.

When it completes indexing, Indexer prints out the number of files it indexed and the time it took to do so. Because the reported time includes both file-directory traversal and indexing, you shouldn’t consider it an official performance measure.
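To make the caveat about timing concrete, here is a minimal sketch of how such a figure might be measured. The TimedRun class, its Result type, and the message format are assumptions made for illustration, not the book's code. The clock starts before the whole task and stops after it, so traversal and indexing cannot be told apart in the result.

```java
import java.util.function.IntSupplier;

// Hypothetical timing wrapper for illustration only.
class TimedRun {

    // What a run reports: how many files were handled and how long it took.
    static final class Result {
        final int count;
        final long elapsedMillis;

        Result(int count, long elapsedMillis) {
            this.count = count;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // Runs the task once and measures wall-clock time around all of it.
    // The task returns the number of files it processed.
    static Result time(IntSupplier task) {
        long start = System.nanoTime();
        int count = task.getAsInt(); // traversal and indexing both happen in here
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(count, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in task: a real run would traverse directories and index files.
        Result r = time(() -> 3);
        System.out.println("Processed " + r.count + " files in "
                + r.elapsedMillis + " milliseconds");
    }
}
```

A more careful measurement would time the indexing calls separately from the directory walk.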

In our example, each of the indexed files was small, but roughly two seconds to index a handful of text files is reasonably impressive.

Indexing speed is a concern, and we cover it in chapter 2. But generally, searching is of even greater importance.


Created: March 27, 2003
Revised: January 24, 2005

URL: http://webreference.com/programming/lucene/1