XML Parser Comparison (2/2) - exploring XML
|Sun Project X||Microsoft MSXML||Oracle XML Parser for Java||James Clark XP|
|Win32||through Java||through Java||through Java||+||-|
|DOM Level 1 1.0||+||+||+||+||-|
|DOM Level 2 1.0||+||-||-||-||-|
Parsers differ not only in their support for checking and transforming documents but also in the way they read a document. Event-based parsers read the text sequentially, and whenever a start or end tag appears, an event is sent to the application. The Simple API for XML (SAX) is such an API. With the second approach, the parser builds a hierarchical data structure from the content of the document. This is how the Document Object Model (DOM) API works, similar to the HTML document tree in a Web browser. You can find a more in-depth discussion of these programming models in column 11.
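To make the difference between the two models concrete, here is a minimal sketch using the SAX and DOM modules from Python's standard library rather than any of the parsers compared above. The sample document and the names `SAMPLE`, `TagLogger`, `sax_events`, and `dom_titles` are assumptions made up for this illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
SAMPLE = (
    "<library>"
    "<book id='1'>XML Basics</book>"
    "<book id='2'>SAX and DOM</book>"
    "</library>"
)


class TagLogger(xml.sax.ContentHandler):
    """Records each start-tag and end-tag event as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(text):
    """Event-based model: the parser calls back while reading sequentially."""
    handler = TagLogger()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def dom_titles(text):
    """Tree-based model: build the whole tree first, then navigate it."""
    document = minidom.parseString(text)
    books = document.documentElement.getElementsByTagName("book")
    return [book.firstChild.data for book in books]


if __name__ == "__main__":
    print(sax_events(SAMPLE))
    print(dom_titles(SAMPLE))
```

With SAX the application sees the tags in document order and has to keep any state it needs itself. With DOM the complete tree is held in memory and can be queried in any order.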
The quality of a parser is largely defined by its conformance to the XML standard. The Organization for the Advancement of Structured Information Standards (OASIS) has defined a test suite for this purpose: a set of over 1000 valid and invalid documents that checks whether a parser accepts the valid ones and rejects the invalid ones.
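The idea behind such a test suite can be sketched in a few lines. This is not the OASIS suite: the four documents in `CASES` are hypothetical stand-ins, and `accepts` and `run_suite` are names invented for this example. Python's bundled expat parser is also non-validating, so the sketch only checks well-formedness, not validity against a DTD.

```python
import xml.sax

# Hypothetical stand-ins for test-suite entries:
# (document text, should the parser accept it?)
CASES = [
    ("<doc><item>ok</item></doc>", True),
    ("<doc><item>unclosed</doc>", False),     # mismatched end tag
    ("<doc>one</doc><doc>two</doc>", False),  # more than one root element
    ("<doc attr=unquoted/>", False),          # attribute value without quotes
]


def accepts(text):
    """Return True if the parser reads the document without a fatal error."""
    try:
        xml.sax.parseString(text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


def run_suite(cases):
    """Return the cases where the parser's verdict differs from the expected one."""
    return [(text, expected) for text, expected in cases if accepts(text) != expected]


if __name__ == "__main__":
    failures = run_suite(CASES)
    print(f"{len(CASES) - len(failures)} of {len(CASES)} cases passed")
```

A parser fails a case in either direction: by rejecting a document it should accept, or by accepting one it should reject.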
Only Sun's parser passes all of the tests. XML4J rejects some valid documents that contain UTF-16 special characters, which can create difficulties when processing foreign-language documents. The Java version of the XP parser is in an early beta state but already passes almost all tests. The Oracle parser seems to be based on SGML code: it accepts some documents that are illegal in XML but legal in the more flexible SGML definition. MSXML sometimes bails out too early when presented with illegal documents. Different sub-versions produce different results, of course. (See column 20.)
An XML parser is the key building block for every XML application. Sun's parser and XP currently support the XML standard best. IBM's product is at the forefront of implementing new APIs. All of the above-mentioned parsers are good candidates to build your XML applications on. When you use standard APIs such as SAX or DOM, you should be able to swap different implementations in and out without changing a single line of application code anyway. The most important thing is to start your XML project today!
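The point about swapping implementations can be illustrated with a small sketch, again in Python rather than with the Java parsers discussed here. The function below depends only on the standard SAX reader interface, so the caller decides which parser to pass in. The names `ElementCounter` and `count_elements` are assumptions for this example, and both readers shown are backed by the same expat parser, since that is the only one the Python standard library ships.

```python
import io
import xml.sax
from xml.sax import expatreader
from xml.sax.xmlreader import InputSource


class ElementCounter(xml.sax.ContentHandler):
    """Counts every element start tag reported by the parser."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(text, reader):
    """Count elements using any object that implements the SAX XMLReader interface."""
    handler = ElementCounter()
    reader.setContentHandler(handler)
    source = InputSource()
    source.setByteStream(io.BytesIO(text.encode("utf-8")))
    reader.parse(source)
    return handler.count


if __name__ == "__main__":
    doc = "<a><b/><b/><c/></a>"
    # Two ways of obtaining a reader; the application code stays the same.
    print(count_elements(doc, xml.sax.make_parser()))
    print(count_elements(doc, expatreader.create_parser()))
```

Because `count_elements` never names a concrete parser class, replacing the reader is a change at the call site only.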
Created: Oct 20, 2000
Revised: Oct 20, 2000