SAX and DOM and Rock'n Roll (3/4) - exploring XML
SAX vs. DOM
The DOM is a quite convenient way to access and manipulate XML data, but it comes with a price to pay:
- The underlying XML needs to be fully parsed before processing can occur. As most DOM implementations are purely memory-based, this limits the amount of XML data that can be processed this way. The possibility of pipelining various stages of processing is limited as well.
- The DOM structure only defines generic nodes, whereas most languages are strongly typed, so one might want to map specific nodes to specific classes of, say, Java code.
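Both points can be seen in a small Java sketch. It uses the standard JAXP and W3C DOM interfaces; the `Song` class, the `DomToSongs` helper and the `<song title="..."/>` markup are made up for illustration. Note that `parse` has to finish building the whole tree before the loop can start, and that the generic `Element` nodes must be converted to typed objects by hand.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns generic DOM nodes into typed Song objects.
class DomToSongs {
    static List<Song> load(String xml) {
        try {
            // The complete document is parsed into memory here.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // The DOM only hands back generic nodes ...
            NodeList nodes = doc.getElementsByTagName("song");
            List<Song> songs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                // ... so the mapping to a specific class is manual.
                songs.add(new Song(element.getAttribute("title")));
            }
            return songs;
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<songs><song title=\"A\"/><song title=\"B\"/></songs>";
        for (Song song : load(xml)) {
            System.out.println(song.title);
        }
    }
}

// Hypothetical application class that a "song" element is mapped to.
class Song {
    final String title;

    Song(String title) {
        this.title = title;
    }
}
```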
SAX is more useful in cases where:
- Huge amounts of XML need to be processed, but the information needed is highly local, meaning only a small amount of data needs to be stored. This is usually the case in transforming linear documents, where little cross-linking occurs.
- Various stages of XML processing are interconnected to form a pipeline. The next stage can then begin its work as soon as the first character comes out of the previous one, instead of having to wait for the full document to be converted into objects.
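As a rough illustration of the first case, here is a minimal SAX handler in Java that counts elements while the document streams by. The class name `TitleCounter` and the `title` element are assumptions for the example; the parser and handler classes are the standard JAXP and SAX ones. The only state kept is a single counter, no matter how large the input is.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts <title> elements without building a tree.
class TitleCounter extends DefaultHandler {
    private int titles = 0;

    // Called by the parser as soon as an opening tag has been read.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if ("title".equals(qName)) {
            titles++;
        }
    }

    int getTitles() {
        return titles;
    }

    static int countTitles(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TitleCounter handler = new TitleCounter();
            // Events are delivered to the handler while parsing proceeds.
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getTitles();
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<songs><song><title>A</title></song>"
                   + "<song><title>B</title></song></songs>";
        System.out.println(countTitles(xml));
    }
}
```

A pipeline stage would be built the same way, except that the handler passes each event on to the next stage instead of counting.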
Some projects have tried to improve on DOM in this respect. The Java community in particular has looked closely at ways to tie specific XML elements to specific Java classes. Popular efforts include:
XParse for Java
Without going into too much detail, the key feature of a JSArray is, not surprisingly, the ability to hold a set of objects referenced by indices. Furthermore, it is possible to directly address the contained object's properties by specifying array[index].property. This behavior is approximated by checking for specific object types and property names and then "manually" setting the property on the object. A fully generic implementation would be feasible through the use of Java's reflection capabilities. Another powerful feature is the ability to split a string into an array using a certain delimiter within the string and, in reverse, to join an array of strings back into one string, possibly with a different delimiter. If you know Perl, this feature should sound familiar. See the source code with comments if you like.
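The reflection-based variant and the split/join pair might look roughly like the following. This is not the XParse for Java source, only a sketch under a few assumptions: the class names `MiniJsArray` and `Track` are invented, properties are plain public fields, and it relies on generics and other library features from later Java versions.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a JSArray-like container.
class MiniJsArray {
    private final List<Object> items = new ArrayList<>();

    // Split a string into an array at every occurrence of the delimiter.
    static MiniJsArray split(String text, String delimiter) {
        MiniJsArray array = new MiniJsArray();
        // Pattern.quote treats the delimiter literally, not as a regex;
        // the limit -1 keeps trailing empty strings.
        array.items.addAll(
                Arrays.asList(text.split(Pattern.quote(delimiter), -1)));
        return array;
    }

    // Join all items back into one string, possibly with another delimiter.
    String join(String delimiter) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                result.append(delimiter);
            }
            result.append(items.get(i));
        }
        return result.toString();
    }

    void add(Object item) { items.add(item); }

    Object get(int index) { return items.get(index); }

    int length() { return items.size(); }

    // Generic counterpart of array[index].property = value:
    // looks up a public field by name instead of checking known types.
    void setProperty(int index, String property, Object value) {
        Object target = items.get(index);
        try {
            Field field = target.getClass().getField(property);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "No settable public field: " + property, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("a,b,c", ",").join("|"));

        MiniJsArray tracks = new MiniJsArray();
        Track track = new Track();
        tracks.add(track);
        tracks.setProperty(0, "title", "Rock");
        System.out.println(track.title);
    }
}

// Hypothetical element type with a public property.
class Track {
    public String title;
}
```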
Created: Apr. 18, 2000
Revised: Apr. 26, 2000