Charting the XML territory with XMLMap (1/2) - exploring XML | WebReference

Charting the XML territory with XMLMap

After three years of exploring the XML jungle, the time has come to add another type of map to this area. While the Column Trailmap gives an overview of the columns written for the XML section of WebReference, we will, over the next couple of installments, create a map of XML standards.

It will clearly be impossible to cover every single XML vocabulary ever invented, but a walkthrough organized by topic should include the most relevant standards. Furthermore, exhaustive lists such as the Cover Pages are available elsewhere. We will organize the XMLMap into four groups:

We will expand the map into these different areas in loose order and explore various aspects of them. Today, we'll start with the fundamentals of XML. Early on, the W3C attempted to lay the groundwork for XML with maximum compatibility and interoperability in mind, with mixed results, as we shall see.

XML Fundamentals

Underlying everything is the specification of XML itself, with its definitions of well-formed and valid XML data, as well as its clarification of character set issues. The initial 1.0 specification can be considered a success: only recently has a 1.1 specification been released, and it merely elaborates on some further character set issues that had surfaced. No substantial changes have been made. They would be virtually impossible to make anyhow, as everything based on core XML would be affected.
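To make the notion of well-formedness concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The helper name `is_well_formed` and the sample documents are made up for illustration. Note that this parser checks well-formedness only; it does not validate a document against a DTD, so the "valid" half of the definition is not covered here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, missing root element, and similar errors.
        return False


# Properly nested elements: well-formed.
print(is_well_formed("<note><to>Reader</to></note>"))  # True

# Overlapping tags: not well-formed.
print(is_well_formed("<note><to>Reader</note></to>"))  # False
```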

The definition of XML Namespaces was the first challenge to the W3C and its standards creation process. Although the need to mix different XML vocabularies in one document was recognized early, the fix, in the form of XML Namespaces, was not released until much later. The solution seems obvious: prefix tags with a short name that is mapped to a globally unique ID. Nevertheless, it sent ripples through the other fundamental standards. Some, such as the XML linking effort, used namespaces to their advantage; others, like XML schemas, still have not coped, and the effect of mixing different declarations remains unclear, at least to me.
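The prefix-to-ID mapping can be illustrated with a short sketch, again using Python's standard `xml.etree.ElementTree`. The vocabulary, the prefixes `inv` and `ship`, and the `example.com` namespace URIs are invented for this example. It shows two elements that share the local name `address` but stay distinct because each prefix is mapped to a different globally unique URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both define <address>.
DOC = """\
<order xmlns:inv="http://example.com/ns/inventory"
       xmlns:ship="http://example.com/ns/shipping">
  <inv:address>Shelf 12, Aisle C</inv:address>
  <ship:address>1 Example Street</ship:address>
</order>
"""

root = ET.fromstring(DOC)

# The parser replaces each prefix with the URI it is mapped to,
# so the two <address> elements end up with different full names.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)

# The prefix itself is only a local shorthand: a query may use any
# prefix as long as it is mapped to the same URI.
inventory_address = root.find("i:address", {"i": "http://example.com/ns/inventory"})
print(inventory_address.text)
```

The first loop prints the names in the `{uri}localname` form that ElementTree uses internally, which makes the "globally unique ID" part of the mechanism visible.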

With the roots of XML in HTML, the next areas of XML infrastructure were fairly clear:

Linking of documents

Since one of the most important features of HTML is hypertext, i.e. the linking between disparate texts, the W3C was quick to establish XLink as an effort to mimic these capabilities and fix the worst problem of the Web, the broken link. XLink came up with various types of links, the most prevalent being the HTML-style href, which inherits the broken link problem. An alternative that fixes the problem is the external link, which resides in a separate document, external to both the source and the target document. While this is a neat idea, it creates so many implementation and management problems that I have yet to see it implemented, especially on a system lacking central control such as the World Wide Web.
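As a rough illustration of the HTML-style variant, the sketch below collects XLink simple links from a document with Python's standard `xml.etree.ElementTree`. The XLink namespace URI is the one defined by the W3C; the element names, the `example.com` URL, and the helper `simple_links` are assumptions made up for this example. External (out-of-line) links are not covered.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the XLink specification.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical document: any element can become a link by carrying
# xlink:type and xlink:href attributes.
DOC = """\
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <ref xlink:type="simple" xlink:href="http://example.com/part2.xml">next page</ref>
  <note>no link here</note>
</article>
"""


def simple_links(root):
    """Return (href, text) pairs for every XLink simple link below `root`."""
    links = []
    for element in root.iter():
        if element.get(f"{{{XLINK_NS}}}type") == "simple":
            links.append((element.get(f"{{{XLINK_NS}}}href"), element.text))
    return links


links = simple_links(ET.fromstring(DOC))
print(links)
```

Like an HTML `href`, the target address is embedded in the source document, so the link breaks as soon as the target moves.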

With foresight that was amazingly absent in the styling department (next page), a separate effort was created for the more complex referencing of parts of an XML document, by the name of XPointer. While XLink only supports the XML id and name attributes, XPointer allows referencing by name, position and tag type. Here too, a complexity was created that hinders widespread implementation to date. Some observers proclaimed a clash between the camps of high-priced niche SGML and mass-market mainstream HTML.
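Referencing by position can be sketched with a child sequence such as `/1/2/2`, the notation used by the `element()` scheme as later standardized by the W3C: each step is the 1-based position of a child element, and the first step selects the document element. The following is only a simplified illustration, not an XPointer implementation; the function name `resolve_child_sequence` and the sample document are invented here, and Python's standard library has no XPointer support of its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = (
    "<book><title>Map</title>"
    "<chapter><para>one</para><para>two</para></chapter></book>"
)


def resolve_child_sequence(root, sequence):
    """Follow a child sequence like '/1/2/2'; return the element or None."""
    steps = [int(step) for step in sequence.strip("/").split("/")]
    if steps[0] != 1:
        return None  # a document has exactly one document element
    node = root
    for step in steps[1:]:
        children = list(node)  # child elements in document order
        if step < 1 or step > len(children):
            return None
        node = children[step - 1]
    return node


root = ET.fromstring(DOC)
target = resolve_child_sequence(root, "/1/2/2")
print(target.tag, target.text)  # para two
```

Positional references like this one are fragile: inserting a new element ahead of the target silently changes what the pointer refers to.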

XPath, yet another way to specify document parts, or more precisely nodes, grew out of XSL's (see styling department, next page) need to identify targets of XML document manipulation. While CSS is limited to referencing a node's name or class, XPath allows XSL to specify things like odd vs. even row in a list. XPath processors are implemented in XSLT processors.
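A small sketch may help with the odd-versus-even case. In full XPath 1.0, the odd items of a list can be selected with an expression such as `item[position() mod 2 = 1]`. Python's standard `xml.etree.ElementTree` implements only a subset of XPath (integer positions and `last()`, but no `position()` function or arithmetic), so the example below uses the supported predicates directly and emulates the odd/even split with list slicing. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical list with five items.
DOC = (
    "<list><item>a</item><item>b</item><item>c</item>"
    "<item>d</item><item>e</item></list>"
)
root = ET.fromstring(DOC)

# Positional predicates supported by ElementTree's XPath subset.
second = root.find("item[2]").text
last = root.find("item[last()]").text
print(second, last)  # b e

# Emulating item[position() mod 2 = 1] and item[position() mod 2 = 0]:
# XPath positions are 1-based, so odd positions start at Python index 0.
items = root.findall("item")
odd_rows = [item.text for item in items[0::2]]
even_rows = [item.text for item in items[1::2]]
print(odd_rows)   # ['a', 'c', 'e']
print(even_rows)  # ['b', 'd']
```

A full XPath processor, such as the one built into an XSLT engine, would evaluate the `position() mod 2` expression directly.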

Next are styles and schemas...

Produced by Michael Claßen

Created: Dec 09, 2002
Revised: Dec 09, 2002