Jakarta Commons Online Bookshelf: XML parsing with Digester. Part 1

The code in listing 4.4 creates a simple Digester instance and uses it to parse the code listed in listing 4.1 for the XML file called book_4.1.xml. Let’s examine the steps involved in doing so:

(1) A new Digester instance is created by a call to the constructor. In this instance, the constructor takes no arguments and defaults to using the basic properties. The alternate constructors, Digester(javax.xml.parsers.SAXParser parser) and Digester(org.xml.sax.XMLReader reader), let you specify the parser or the reader used by Digester internally. The empty constructor uses the Java API for XML Processing (JAXP) parsers and readers by default. This is suitable in most cases, except where there is a conflict with the underlying environment, as has been reported with the BEA WebLogic application server.

Note: The Digester class extends the org.xml.sax.helpers.DefaultHandler class; this implies that Digester uses SAX2 to parse XML documents and handle callbacks.
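To make the note concrete, here is a minimal sketch of a SAX2 callback handler that uses only the JDK. It is not Digester code: the class name PathTracker, the method pathsOf, and the sample XML are all hypothetical, and the structure of listing 4.1 is assumed. The sketch shows how a DefaultHandler subclass receives start and end callbacks and can keep track of the current element path, which is the kind of information a pattern such as "book/author" is matched against.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of SAX2 callbacks; not part of Digester.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>(); // elements currently open
    final List<String> seen = new ArrayList<>();           // path recorded at each start tag

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        open.addLast(qName);
        seen.add(String.join("/", open)); // e.g. "book/author"
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    // Parses an XML string and returns the path seen at each start tag.
    static List<String> pathsOf(String xml) {
        try {
            PathTracker handler = new PathTracker();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed stand-in for the book XML file; the real listing may differ.
        String xml = "<book><author><name>A. Writer</name></author></book>";
        System.out.println(pathsOf(xml));
    }
}
```

Run against the assumed sample, the handler records the paths book, book/author, and book/author/name, in document order.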

(2) By setting this Digester instance to be nonvalidating, we make it bypass validation of the input XML file against a schema. When validation is turned on, the schema location must be specified in the XML data file that is being parsed. In our case, the validation flag is set to false, so no validation is done. What happens if validation is set to true and a schema isn’t properly specified in the XML data file? You’ll get parse exceptions at runtime for each and every element in your XML data file, because Digester defaults to using the W3C_XML_SCHEMA. Validation is turned off by default, so you can skip this step if you don’t want validation of your XML data.
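The effect of the validation flag can be observed with plain JAXP, without Digester. The sketch below is an assumption-laden illustration: it uses the JDK's SAXParserFactory.setValidating, which by default performs DTD validation rather than the schema validation described above, and the class and method names are hypothetical. It only demonstrates the general point that a validating parser reports errors when the data file names no grammar, while a nonvalidating parser stays silent.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration using plain JAXP; not Digester's own validation code.
class ValidationSketch {
    // Returns how many recoverable errors the parser reported for the document.
    static int countValidationErrors(String xml, boolean validating) {
        final int[] errors = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++; // validity problems arrive here as recoverable errors
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating); // JAXP default is false
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("unexpected parse failure", e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        String xml = "<book><title>Sample</title></book>"; // no DTD or schema declared
        System.out.println("nonvalidating errors: " + countValidationErrors(xml, false));
        System.out.println("validating errors:    " + countValidationErrors(xml, true));
    }
}
```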

(3) Here we’re creating a rule that specifies the creation of a Book object. When will this Book object be created? It doesn’t matter—at least, it doesn’t matter for the rule creation itself. Logically, we want this rule to be followed or executed when a corresponding Book element is encountered within the XML data file, but we’ll deal with that later. Notice the syntax of the ObjectCreateRule constructor; when creating this rule, you need to specify the fully qualified name of the class to be instantiated.

(4) Now we add the rule created in step 3 to the Digester instance created in step 1. The syntax of this method should be clear: digester.addRule(String pattern, Rule rule) specifies that when the pattern specified by the first parameter is found in the XML data file, the rule specified by the second parameter should be executed. In this case, the pattern is "book", and the rule is the ObjectCreateRule listed in step 3. In a nutshell, the Digester is now set to create an object of type com.manning.commons.chapter04.Book whenever it encounters the pattern "book" in an XML data file.

(5) The actual parsing is encapsulated in a try-catch block. Parsing an XML document can throw two types of exceptions, and as shown in this example, they’re caught separately. The IOException covers problems that occur while reading the XML data document. The SAXException is thrown for all errors relating to the parsing of the XML document itself.
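The two failure modes can be told apart with a small JDK-only sketch. It calls the JAXP SAX parser directly rather than Digester, and the class name ParseOutcome, its helper methods, and the file name used in main are hypothetical. A missing file surfaces as an IOException, while malformed XML surfaces as a SAXException.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of the two exception types; not Digester code.
class ParseOutcome {
    static InputSource fromString(String xml) {
        return new InputSource(new StringReader(xml));
    }

    static InputSource fromFile(String name) {
        return new InputSource(new File(name).toURI().toString());
    }

    // Parses the source and reports which kind of problem occurred, if any.
    static String describe(InputSource source) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return "parsed";
        } catch (IOException e) {
            return "IOException";   // the document could not be read
        } catch (SAXException e) {
            return "SAXException";  // the document was read but is not well formed
        } catch (ParserConfigurationException e) {
            return "ParserConfigurationException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(fromString("<book><title>ok</title></book>")));
        System.out.println(describe(fromString("<book><title></book>")));
        System.out.println(describe(fromFile("definitely-missing-book.xml")));
    }
}
```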

Having specified the rules and patterns in step 4, it’s time to parse the XML document. This is done by calling Digester’s parse method and passing in the location of the file to be parsed. There are several variants of this method, allowing for different input sources (input stream, reader, URI, and file). All these methods return the top-level object created, which in this case is a Book object.

Finally, we print the result of calling toString on the Book object that was created. If you run this example, you’ll see the following on the screen:

Title: null

This is the expected result. It proves that Digester created the Book object successfully after reading the example XML data file, although the correct title wasn’t set.

In our SimpleDigester code, all we wanted to do was to provide an example of the way Digester is run. We’ll extend this simple example later to make sure not only that the Book object is created properly, but that its title and authors are printed as well.

Before we move on, let’s examine the following statement more carefully from listing 4.4:

Book book = (Book)digester.parse("book_4.1.xml");

We said earlier that this statement returns the top-level object created. Since several objects may be created in the course of parsing an XML document, Digester maintains a stack of these objects. The stack operates on the last-in-first-out (LIFO) principle. Internally, the stack is maintained using org.apache.commons.collections.ArrayStack, a utility class in the Jakarta Commons Collections library. (We’ll talk more about this class in module 7, “Enhancing Java core libraries with Collections.”) If you’re familiar with how a stack operates, you know that older objects are pushed toward the bottom of the stack by newer objects that come in. The topmost object in the stack is therefore always the most recently added one, which is the basic idea of a LIFO list.

This has a strong impact on your understanding of Digester. Why? Because when Digester encounters the start of an element that matches an object-creation rule, it pushes the new object onto the stack, on top of any object whose end tag hasn’t yet been encountered. Objects are popped off the stack only when their corresponding end tag is encountered. Thus when the parsing of an XML document is finished, technically, the only reference available at that point is the last object created. This may or may not be the desired effect.

Consider listing 4.1, the simple XML data file. Two distinct objects can be created by Digester’s parsing of this file. One is the Book object, which we created in listing 4.4. The second is the Author object, which we didn’t create. Let’s assume that we add another rule to listing 4.4 to create this Author object as well, as shown here:

Rule objectCreateAuthor =
    new ObjectCreateRule("com.manning.commons.chapter04.Author");
digester.addRule("book/author", objectCreateAuthor);

Run the code in listing 4.4 again after you add these lines. It doesn’t matter whether you add them before or after the corresponding Book creation lines. The output on the screen remains the same:

Title: null

In a way, this contradicts the operation of a stack. If the Author rule was added after the Book rule, you might expect the Author object to be the last object on the stack, with the Book object pushed down beneath it, so that the cast to Book would fail with a ClassCastException. It didn’t. There are two reasons.

First, the objects are pushed on the stack in the order they’re created and not when the rule is added. The book element precedes the author element in the XML data file. The rules associated with the book element are fired before the corresponding rules for the author. Thus a Book object is created before an Author object, regardless of when the code to create an Author is specified. Second, and more important, Digester maintains a handle to the first object pushed into the stack and returns it at the end of a parse operation. Since, in this case, Book is the first object pushed in the stack, Digester returns this referenced object at the end of the parse operation.
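Both reasons can be seen in a small JDK-only model of the behavior. This is a sketch, not Digester's source: StackSketch, addCreateRule, the nested Book and Author classes, and the sample XML are hypothetical stand-ins, and java.util.ArrayDeque is used in place of ArrayStack. Objects are pushed when a matching start tag is seen, popped at the matching end tag, and the first object pushed onto the empty stack is remembered and returned.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical model of the object stack described above; not Digester's code.
class StackSketch extends DefaultHandler {
    static class Book {}
    static class Author {}

    private final Map<String, Supplier<Object>> createRules = new HashMap<>();
    private final Deque<String> path = new ArrayDeque<>();  // currently open elements
    private final Deque<Object> stack = new ArrayDeque<>(); // the object stack
    private Object root;                                    // first object pushed
    final List<String> log = new ArrayList<>();

    void addCreateRule(String pattern, Supplier<Object> factory) {
        createRules.put(pattern, factory);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        path.addLast(qName);
        Supplier<Object> factory = createRules.get(String.join("/", path));
        if (factory != null) {
            Object created = factory.get();
            if (stack.isEmpty()) {
                root = created; // keep a handle to the first object pushed
            }
            stack.push(created);
            log.add("push " + created.getClass().getSimpleName());
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (createRules.containsKey(String.join("/", path))) {
            log.add("pop " + stack.pop().getClass().getSimpleName());
        }
        path.removeLast();
    }

    // Parses the XML string and returns the remembered root object.
    Object parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return root;
    }

    public static void main(String[] args) {
        StackSketch sketch = new StackSketch();
        sketch.addCreateRule("book/author", Author::new); // registered first on purpose
        sketch.addCreateRule("book", Book::new);
        Object top = sketch.parse("<book><author><name>A. Writer</name></author></book>");
        System.out.println(top.getClass().getSimpleName() + " " + sketch.log);
    }
}
```

Even though the Author rule is registered first, the log shows the Book being pushed first and popped last, and the returned object is the Book. Registration order does not decide stack order; document order does.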

There are very few reasons why you would want to manipulate the Digester stack yourself. You might want to preset the stack with objects that were created separately from the XML data file you want to parse. For example, suppose you had your own instance of the Book object. Instead of adding a rule to create the Book object based on the XML data file, you could push this Book object onto the stack before any parsing was done, as shown here:

Book book = new Book();
book.setTitle("My book");
digester.push(book);

This lets you preempt the book element in the XML file. When the final parse is complete, the referenced object that is returned is this Book object and not the one specified in the XML data file. Of course, the rest of the elements are created as they’re specified in the XML.

In addition to push(Object object), you can perform three other operations with the stack:

  • clear() clears the contents of the stack.

  • peek() returns the top object on the stack without removing it.

  • pop() removes the top object from the stack and returns it.

These operations are useful when you create your own rules.
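The semantics of these four operations can be tried out on any LIFO structure. The sketch below uses java.util.ArrayDeque from the JDK as a stand-in, on the assumption that the Digester methods of the same names behave analogously on its internal stack. The class name StackOps and the string placeholders are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of push, peek, pop, and clear on a LIFO stack.
class StackOps {
    static List<String> demo() {
        Deque<String> stack = new ArrayDeque<>();
        List<String> out = new ArrayList<>();

        stack.push("preset Book"); // pushed before parsing, so it sits at the bottom
        stack.push("Author");      // pushed later, so it sits on top

        out.add("peek=" + stack.peek()); // looks at the top without removing it
        out.add("pop=" + stack.pop());   // removes and returns the top
        out.add("peek=" + stack.peek()); // the preset object is on top again
        stack.clear();                   // empties the stack
        out.add("size=" + stack.size());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```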

4.3 A match made in heaven?

Like a real-world dating agency, Digester matches elements in an XML file based on patterns specified by the user. However, unlike real-world dating, the Digester supports wildcard matching.

When the parse method of the Digester is called on an XML file, the Digester starts to traverse the element hierarchy contained within the XML file. This means it examines every element and looks up a List to see if any rules are associated with this element. This examination is done from the top-level element to the innermost nested element in the order they’re listed in the XML file. Once a match is found, all rules associated with that match are fired in the order that they were registered in the first place.
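The lookup described above can be pictured as a list of (pattern, rule) pairs that is scanned in registration order. The following sketch is a simplified model under that assumption, with hypothetical names (RuleOrder, add, rulesFor) and rule labels standing in for real Rule objects. It handles exact pattern matches only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of rule lookup; rule names stand in for Rule objects.
class RuleOrder {
    private final List<String[]> entries = new ArrayList<>(); // {pattern, ruleName}

    void add(String pattern, String ruleName) {
        entries.add(new String[] {pattern, ruleName});
    }

    // Returns the rules registered for this element path, in registration order.
    List<String> rulesFor(String path) {
        List<String> matched = new ArrayList<>();
        for (String[] entry : entries) {
            if (entry[0].equals(path)) {
                matched.add(entry[1]);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        RuleOrder rules = new RuleOrder();
        rules.add("book", "create Book");
        rules.add("book/author", "create Author");
        rules.add("book", "set Book properties");
        System.out.println(rules.rulesFor("book"));
    }
}
```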

The element matching roughly follows the XPath naming convention. However, unlike in XPath, there are no relative paths—only absolutes. Each path/pattern starts with the root element, and the root element must be specified. Using listing 4.1 as an example, the following element matching patterns are valid: book, book/author, and book/author/name.

There is an exception to this rule, which relates to wildcard patterns: You can use the wildcard character (*) to match more than one pattern. For example, the pattern book/*/name matches the names of all authors, if any, for listing 4.1. With this pattern, you can also match the name element at any depth of nesting. So, the wildcard isn’t only a wildcard for elements; it also substitutes for the depths of nesting within an XML document. Continuing with the book/*/name example, this pattern matches paths where a name element follows immediately after the book element, as well as book/author/name, book/author/dog/name, book/publisher/name, and so on. The wildcard can also stand in for the root element. Thus you can use */name, which matches all name elements that appear at any depth within the document.
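The wildcard semantics described in this paragraph can be written down as a short matching function. This is a sketch of the behavior as the text describes it, where * stands for zero or more nested elements. It is not Digester's matching code, and the matching performed by a particular Digester version or Rules implementation may differ. The class name PatternSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical matcher for the wildcard behavior described in the text.
class PatternSketch {
    static boolean matches(String pattern, String path) {
        return match(Arrays.asList(pattern.split("/")), Arrays.asList(path.split("/")));
    }

    private static boolean match(List<String> pattern, List<String> elements) {
        if (pattern.isEmpty()) {
            return elements.isEmpty();
        }
        String head = pattern.get(0);
        List<String> rest = pattern.subList(1, pattern.size());
        if (head.equals("*")) {
            // The wildcard may absorb zero or more elements at this position.
            for (int skip = 0; skip <= elements.size(); skip++) {
                if (match(rest, elements.subList(skip, elements.size()))) {
                    return true;
                }
            }
            return false;
        }
        // A literal segment must equal the next element name.
        return !elements.isEmpty()
                && head.equals(elements.get(0))
                && match(rest, elements.subList(1, elements.size()));
    }

    public static void main(String[] args) {
        System.out.println(matches("book/*/name", "book/author/name"));
        System.out.println(matches("book/*/name", "book/author/dog/name"));
        System.out.println(matches("*/name", "book/publisher/name"));
        System.out.println(matches("book/author", "book/author/name"));
    }
}
```

The first three calls in main match and the last does not, because a pattern without a wildcard must equal the full element path.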

XPath

XPath is a W3C standard for describing the contents of an XML file in a universal way. It specifies a set of rules that let you define any element in an XML file. It’s like a tree of files and folders and is similar to the way you describe files and folders on your computer. Thus you can use an XPath structure like /usr/tomcat/bin to describe elements in an XML file. For more details about XPath, visit http://www.w3cschools.com; or see module 5, “JXPath and Betwixt: working with XML,” where it’s discussed in section 5.2.1.

Vikram Goyal wrote the only online series of articles covering Jakarta Commons and regularly writes how-to articles on open source projects. He is a Sun Certified Programmer for the Java 2 Platform living in Brisbane, Australia. Vikram is the author of Jakarta Commons Online Bookshelf, Beginning JSP 2.0, and Professional JSP Site Design.

Written by Vikram Goyal and reproduced from "Jakarta Commons Online Bookshelf" by permission of Manning Publications Co. ISBN 1932394524, copyright 2005. All rights reserved. See http://www.manning.com/goyal for more information.

Created: March 27, 2003
Revised: May 2, 2005

URL: http://webreference.com/programing/jakarta/