Effective XML: 50 Specific Ways to Improve Your XML

Item 37: Validate Inside Your Program with Schemas

Rigorously testing preconditions is an important characteristic of robust, reliable software. Schemas make it very easy to define the preconditions for XML documents you parse and the postconditions for XML documents you write. Even if the document itself does not have a schema, you can write one and use it to test the documents before you operate on them. It is quite hard to attach a DTD to a document inside a program. Fortunately, however, most other schema languages are much more flexible about this.

For example, let's suppose you're in charge of a system at TV Guide that accepts schedule information from individual stations over the Web. Information about each show arrives as an XML document formatted as shown in Example 37-1.

Example 37-1 | An XML Instance Document Containing a Television Program Listing

Every day, around the clock, stations from all over the country send schedule updates like this one that you need to store in a local database. Some of these stations use software you sold them. Some of them hire interns to type the data into a password-protected form on your web site. Others use custom software they wrote themselves. There may even be a few hackers typing the information into text files using emacs and then telnetting to your web server on port 80, where they paste in the data. There are about a dozen different places where mistakes can creep in. Therefore, before you even begin to think about processing a submission, you want to verify that it's correct. In particular, you want to verify the following.

  • The root element of the document is Program.
  • All required elements are present.
  • No more than one of each element is present.
  • The Title element is not empty.
  • The date is a legal date in the future.
  • The Start element contains a sensible time.
  • The duration looks like a period of time.
  • The station identifier is a four-letter code beginning with either K or W.
  • The station identifier maps to a known station somewhere in the country, which can be determined by looking it up in a database running on a different machine in your intranet.
You could write program code to verify all of these statements after the document was parsed. However, it's much easier to write a schema that describes them declaratively and let the parser check them. The W3C XML Schema Language, RELAX NG, and Schematron can all handle about 85% of these requirements. They all have problems with the requirement that the date be in the future and that the station be listed in a remote database. These will have to be checked using real programming code written in Java, C++, or some other language after the document has been parsed. However, we can make the other checks with a schema. Example 37-2 shows one possible W3C XML Schema Language schema that tests most of the above constraints.
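
The checks that fall outside the schema are ordinary program code. As a minimal sketch in Java, the future-date test might look something like the following. The class name FutureDateCheck and the method isFutureDate are hypothetical, and the sketch assumes the date arrives as a plain yyyy-MM-dd string with no time zone. The remote station lookup is left out because it depends entirely on your own database.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical helper; the names here are illustrative and not from the book.
final class FutureDateCheck {

    private FutureDateCheck() {}

    /**
     * Returns true if dateText is a legal ISO 8601 calendar date (yyyy-MM-dd)
     * that falls strictly after today's date according to the given clock.
     * Assumption: the date carries no time zone suffix.
     */
    static boolean isFutureDate(String dateText, Clock clock) {
        try {
            // LocalDate.parse is strict, so impossible dates such as
            // February 30 are rejected with a DateTimeParseException.
            LocalDate date = LocalDate.parse(dateText.trim());
            return date.isAfter(LocalDate.now(clock));
        } catch (DateTimeParseException e) {
            return false; // not a legal date at all
        }
    }

    public static void main(String[] args) {
        Clock clock = Clock.systemDefaultZone();
        // Both dates are rejected: one does not exist, the other is in the past.
        System.out.println(isFutureDate("2003-02-30", clock));
        System.out.println(isFutureDate("1999-12-31", clock));
    }
}
```

Passing the Clock in as a parameter, rather than reading the system time inside the method, is a design choice that lets the check be exercised against a fixed "today" in tests.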

Example 37-2 | A W3C XML Schema for Television Program Listings

For simplicity, I'll assume this schema resides at the URL http://www.example.com/tvprogram.xsd in the examples that follow, but you can store it anywhere convenient.

There are several different ways to programmatically validate a document, depending on the schema language, the parser, and the API. Here I'll demonstrate two: Xerces-J using SAX properties and DOM Level 3 validation.
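
As a point of comparison, and not one of the two approaches named above, the javax.xml.validation API that ships with current JDKs can run the same kind of check. The sketch below is only illustrative. The inline schema is a simplified stand-in, not Example 37-2, and it is embedded as a string rather than loaded from a URL so the example stands alone. The element names Date, Duration, and Station and the class name ProgramListingValidator are assumptions; only Program, Title, and Start come from the text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical class; a sketch of schema validation inside a program.
final class ProgramListingValidator {

    // Simplified stand-in schema (an assumption, not the book's Example 37-2).
    // xs:sequence with default occurrence counts requires each child exactly once.
    private static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Program'><xs:complexType><xs:sequence>"
      + "<xs:element name='Title'><xs:simpleType>"
      + "<xs:restriction base='xs:string'><xs:minLength value='1'/></xs:restriction>"
      + "</xs:simpleType></xs:element>"
      + "<xs:element name='Date' type='xs:date'/>"
      + "<xs:element name='Start' type='xs:time'/>"
      + "<xs:element name='Duration' type='xs:duration'/>"
      + "<xs:element name='Station'><xs:simpleType>"
      + "<xs:restriction base='xs:string'><xs:pattern value='[KW][A-Z]{3}'/></xs:restriction>"
      + "</xs:simpleType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Compile the schema once; Schema objects are immutable and reusable.
    private static final Schema SCHEMA = compileSchema();

    private ProgramListingValidator() {}

    private static Schema compileSchema() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
    }

    /** Returns true if the XML text is well-formed and valid against SCHEMA. */
    static boolean isValid(String xml) {
        try {
            // Validator is not thread-safe, so create one per document.
            Validator validator = SCHEMA.newValidator();
            // With no ErrorHandler set, any validity error throws SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String listing = "<Program><Title>Nova</Title><Date>2003-04-01</Date>"
            + "<Start>20:00:00</Start><Duration>PT1H</Duration>"
            + "<Station>WNET</Station></Program>";
        System.out.println(isValid(listing));                         // well-formed and valid
        System.out.println(isValid(listing.replace("WNET", "XNET"))); // bad station code
    }
}
```

A real system would more likely report the SAXException message back to the submitting station than collapse it to a boolean, and it would still need program code for the future-date and station-lookup checks that no schema can make.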


Created: March 27, 2003
Revised: October 25, 2003

URL: http://webreference.com/programming/xml/parser/1