SAX and DOM and Rock'n Roll (2/4) - exploring XML
The object-driven style: DOM
While SAX is a fairly simple and elegant solution for processing XML, it puts the burden of storing and manipulating the XML data on the programmer. This can be an advantage under certain circumstances, as we will see later, but it is certainly not the most convenient way to deal with XML.
The W3C realized early on that a Document Object Model would be as helpful for XML as it had proved to be for HTML, especially in HTML's scripting-enabled form, Dynamic HTML. An XML-enabled browser could parse the incoming XML and build an in-memory object structure from it, which could then be manipulated through your favorite scripting language. Establishing this standard early on could spare XML the pain and hassle we have to live with in the HTML world, where browsers rarely conform 100 percent to the HTML DOM specification. Luckily, both Internet Explorer (albeit with the inevitable Microsoft extensions) and the new Mozilla browser now seem to be zeroing in on a standards-compliant DOM implementation.
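To make the contrast with SAX concrete, here is a minimal sketch using Python's standard-library `xml.dom.minidom`, a lightweight DOM implementation chosen here purely for illustration (the article does not prescribe a language or parser). The sample playlist and the `retitle_first_song` helper are hypothetical. Because the parser builds the whole tree in memory, the program can go straight to the element it wants and edit it, instead of tracking state across SAX callbacks.

```python
from xml.dom.minidom import parseString

# A small, made-up document standing in for "incoming XML".
SOURCE = "<playlist><song genre='rock'>Roll Over Beethoven</song></playlist>"

def retitle_first_song(xml_text, new_title):
    """Hypothetical helper: parse XML into an in-memory DOM tree, edit the
    first <song> element, and return the re-serialized root element."""
    doc = parseString(xml_text)                  # the whole tree now lives in memory
    song = doc.getElementsByTagName("song")[0]   # navigate directly, no callbacks
    song.firstChild.data = new_title             # edit the Text node in place
    song.setAttribute("genre", "classic rock")   # and change an attribute
    return doc.documentElement.toxml()           # toxml() is a minidom convenience

print(retitle_first_song(SOURCE, "Johnny B. Goode"))
```

The same edit with SAX would mean buffering the character data yourself and writing the modified document out again by hand; here the DOM keeps the data for you.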
The Multi-Level DOM
The initial DOM described only a few methods: for instance, a method to access an identifier by name, or by a particular link. Functionality equivalent to that included in Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0 is referred to as "Level 0". Building on this existing technology, further DOM levels have been defined or are still being defined:
- Level 1: This concentrates on the actual core document models, applicable to both XML and HTML but with a particular focus on HTML. It contains functionality for document navigation and manipulation.
- Level 2: Currently at the Working Draft stage, it includes a style sheet object model and defines functionality for manipulating the style information attached to a document. It also enables filters on the document, defines an event model, and provides support for XML namespaces.
- Level 3: Still in the requirements-gathering phase, it will address document loading and saving, as well as content models (such as DTDs and schemas) with document validation support. In addition, it will address document views and formatting, key events, and event groups.
- Level 4 and higher: These may specify an interface to a possible underlying window system, including some ways to prompt the user. They may also contain a query language interface, and address multithreading and synchronization, security, and repository.
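Since implementations support different levels, DOM Level 1 already lets a program ask what an implementation offers, through `DOMImplementation.hasFeature(feature, version)`. The sketch below queries minidom's implementation; the `supported_modules` helper and the particular list of feature names checked are illustrative assumptions, not a recommended checklist.

```python
from xml.dom.minidom import getDOMImplementation

def supported_modules(impl, wanted):
    """Hypothetical helper: return the (feature, version) pairs from `wanted`
    that the implementation claims to support via hasFeature()."""
    return [(feature, version) for feature, version in wanted
            if impl.hasFeature(feature, version)]

impl = getDOMImplementation()  # minidom's DOMImplementation object
checks = [("Core", "1.0"), ("XML", "1.0"), ("Core", "2.0"),
          ("Events", "2.0"), ("StyleSheets", "2.0")]

# minidom reports the Core and XML features; as far as we know it does not
# claim the Level 2 Events or StyleSheets modules.
print(supported_modules(impl, checks))
```

Checking features this way is more robust than assuming a level from the browser or parser name.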
DOM Level 1
This level defines a minimal set of objects and interfaces for accessing and manipulating document objects. The functionality specified here (the Core functionality) should be sufficient for software developers and Web script authors to access and manipulate parsed HTML and XML content inside conforming products. The DOM Core API also allows a document object to be populated using only DOM API calls; creating the skeleton document and saving it persistently are left to the product that implements the DOM API.
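Populating a document with nothing but DOM calls might look like the following sketch, again with minidom. Note how the two steps Level 1 leaves to the product show up: the empty document comes from `createDocument`, which was only standardized in DOM Level 2, and serializing with `toxml()` is a minidom extension rather than part of the DOM. The `build_catalog` helper and the element names are hypothetical.

```python
from xml.dom.minidom import getDOMImplementation

def build_catalog(titles):
    """Hypothetical helper: build <catalog><item id="n">title</item>...</catalog>
    using only DOM Core calls once the skeleton document exists."""
    impl = getDOMImplementation()
    # Skeleton document: createDocument(namespaceURI, qualifiedName, doctype)
    # also creates the root element.
    doc = impl.createDocument(None, "catalog", None)
    root = doc.documentElement
    for number, title in enumerate(titles, start=1):
        item = doc.createElement("item")          # factory methods live on Document
        item.setAttribute("id", str(number))
        item.appendChild(doc.createTextNode(title))
        root.appendChild(item)
    return doc

doc = build_catalog(["SAX", "DOM"])
# Saving is product-specific; toxml() is minidom's own serializer.
print(doc.documentElement.toxml())
```

Every node is created through the owning `Document`, which is why the DOM's factory methods hang off that interface rather than off individual node classes.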
The DOM presents documents as a hierarchy of node objects that also implement other, more specialized interfaces. Some types of nodes may have child nodes of various types, and others are leaf nodes that cannot have anything below them in the document structure. The node types, and which node types they may have as children, are as follows:
- Document: Element (at most one), ProcessingInstruction, Comment, DocumentType (at most one)
- DocumentFragment: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
- DocumentType: no children
- EntityReference: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
- Element: Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference
- Attr: Text, EntityReference
- ProcessingInstruction: no children
- Comment: no children
- Text: no children
- CDATASection: no children
- Entity: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
- Notation: no children
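A conforming implementation enforces these parent/child rules and signals a violation with a `HierarchyRequestErr`. The sketch below probes a few of them with minidom; the `can_append` helper is hypothetical, and the comments reflect minidom's behavior as we understand it.

```python
from xml.dom import HierarchyRequestErr
from xml.dom.minidom import getDOMImplementation

def can_append(parent, child):
    """Hypothetical helper: try parent.appendChild(child) and report
    whether the DOM accepted it."""
    try:
        parent.appendChild(child)
        return True
    except HierarchyRequestErr:
        return False

doc = getDOMImplementation().createDocument(None, "root", None)
text = doc.createTextNode("hello")

print(can_append(doc, doc.createComment("fine")))     # True: Comment under Document
print(can_append(doc, doc.createTextNode("stray")))   # False: Text not allowed under Document
print(can_append(doc, doc.createElement("second")))   # False: only one document element
print(can_append(doc.documentElement, text))          # True: Text under Element
print(can_append(text, doc.createElement("child")))   # False: Text is a leaf node
```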
The DOM also specifies a NodeList interface to handle ordered lists of nodes, such as the children of a node. A NamedNodeMap interface also exists to handle unordered sets of nodes that are accessed by name, such as the attributes of an element.
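The difference between the two collection interfaces is easiest to see side by side. In this minidom sketch (the `track` document is made up), `childNodes` is a NodeList walked by position with `item()` and `length`, while `attributes` is a NamedNodeMap looked up by name with `getNamedItem()`.

```python
from xml.dom.minidom import parseString

# Made-up document; no whitespace between tags, so there are no extra Text nodes.
doc = parseString('<track number="7" duration="2:41"><title/><artist/></track>')
track = doc.documentElement

children = track.childNodes                   # NodeList: ordered, indexed from 0
for i in range(children.length):
    print(i, children.item(i).nodeName)       # 0 title, then 1 artist

attrs = track.attributes                      # NamedNodeMap: accessed by name
print(attrs.length)                           # 2
print(attrs.getNamedItem("duration").value)   # 2:41
```

`item()` returns `None` for an out-of-range index instead of raising, which is the behavior the DOM specifies for NodeList.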
DOM Level 2
DOM Level 2 builds on Level 1. It adds new platform-independent and language-independent interfaces for the following:
- Handling namespaces
- Handling document events
- Traversing the document
- Exposing generic and CSS style sheets in the document
- Representing views of the document
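Of these additions, namespace handling is the one most XML programs meet first. minidom implements the namespace-aware Level 2 methods such as `getElementsByTagNameNS`; to our knowledge it does not implement the Level 2 events, style, traversal or views modules. The namespace URI and the document in this sketch are made up for illustration.

```python
from xml.dom.minidom import parseString

BOOKS_NS = "http://example.com/ns/books"   # hypothetical namespace URI

doc = parseString(
    '<b:shelf xmlns:b="http://example.com/ns/books">'
    '<b:title>SAX and DOM</b:title>'
    '</b:shelf>')

# Level 2 lookup matches on (namespace URI, local name); the prefix is irrelevant.
title = doc.getElementsByTagNameNS(BOOKS_NS, "title")[0]
print(title.namespaceURI, title.prefix, title.localName)
print(title.firstChild.data)

# The Level 1 method matches the qualified tag name literally.
print(len(doc.getElementsByTagName("title")))    # 0
print(len(doc.getElementsByTagName("b:title")))  # 1
```

Matching on the namespace URI means the code keeps working if another document binds the same namespace to a different prefix.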
Created: Apr. 18, 2000
Revised: Apr. 26, 2000