The Meanings of XML: DTDs, DCDs and Schemas (2/3) - exploring XML
The shortcomings of DTDs
XML inherited DTDs from SGML. It has become more and more apparent that some shortcomings were also inherited:
- The syntax of a DTD is different from XML, requiring the document writer to learn yet another notation, and the software to have yet another parser.
- There is no way to specify datatypes and data formats that could be used to map automatically to and from programming languages.
- There is no set of well-known basic elements to choose from.
To be fair, these requirements were and are beyond the scope of DTDs, but they need to be addressed if we hope to create a large set of XML processing tools.
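The datatype gap can be made concrete with a minimal Python sketch. The `order`/`quantity` vocabulary and the `read_quantity` helper are invented for illustration. The DTD shown can only say that `quantity` contains character data, so both sample documents are equally acceptable to it, and the application has to do the type check itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD, shown for reference only: it can say that <quantity>
# holds character data, but not that the text must be an integer.
ORDER_DTD = """
<!ELEMENT order (quantity)>
<!ELEMENT quantity (#PCDATA)>
"""

GOOD = "<order><quantity>12</quantity></order>"
BAD = "<order><quantity>a dozen</quantity></order>"  # equally valid per the DTD


def read_quantity(xml_text):
    """Return the quantity as an int, doing the type check by hand."""
    root = ET.fromstring(xml_text)
    text = root.findtext("quantity")
    if text is None:
        raise ValueError("missing <quantity> element")
    try:
        return int(text.strip())
    except ValueError:
        raise ValueError(f"quantity is not an integer: {text!r}") from None


if __name__ == "__main__":
    print(read_quantity(GOOD))
    try:
        read_quantity(BAD)
    except ValueError as err:
        print("rejected:", err)
```

Every application that reads such documents would need its own version of this check, which is the kind of duplicated effort a schema language with datatypes is meant to remove.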
DTD++: DCD, SOX, XML Schemas
XML-Data was Microsoft's first response to the problem. It was then refined with Netscape and IBM (yes, you can believe your eyes here) into Document Content Description (DCD) to fit better with other XML efforts such as RDF. Another submission, dubbed Document Definition Markup Language (DDML), grew out of ideas collected on the XML developers' mailing list.
Schema for Object-Oriented XML (SOX) is yet another alternative to XML DTDs that extends the language of DTDs by supporting:
- An extensive (and extensible) set of datatypes
- Inheritance among element types
- Polymorphic content
- Embedded documentation
- Features to enable robust distributed schema management.
All of these features are supported with strong type-checking and validation. A SOX schema is also a valid XML instance according to the SOX DTD, so XML content management tools can be applied to schema management. This effort created considerable excitement in the software developer community because it maps closely to programming languages, especially Java.
All these initiatives from different organizations tried to tackle the same problems from different angles, so the W3C created the XML Schema working group in an effort to unite all these different submissions. XML Schema is divided into two parts, one for data types and one for expressing document structure.
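Two ideas run through these proposals: schemas written in XML itself, and declared datatypes alongside document structure. The following Python sketch illustrates both under stated assumptions. The `schema`/`element` vocabulary, the type names and the `validate` function are made up for this example and are not the syntax of SOX, DCD or W3C XML Schema.

```python
import xml.etree.ElementTree as ET

# A made-up schema vocabulary (not SOX, DCD, or W3C XML Schema syntax).
# Because it is plain XML, the same parser reads schema and document.
SCHEMA = """
<schema root="order">
  <element name="quantity" type="int"/>
  <element name="price" type="float"/>
  <element name="note" type="string"/>
</schema>
"""

# Datatype part: map each declared type name to a Python converter.
CONVERTERS = {"int": int, "float": float, "string": str}


def load_schema(schema_text):
    """Structure part: the root name plus the ordered (name, type) pairs."""
    root = ET.fromstring(schema_text)
    fields = [(e.get("name"), e.get("type")) for e in root.findall("element")]
    return root.get("root"), fields


def validate(doc_text, schema_text):
    """Check the structure, then convert each child to its declared type."""
    root_name, fields = load_schema(schema_text)
    doc = ET.fromstring(doc_text)
    if doc.tag != root_name:
        raise ValueError(f"expected <{root_name}>, got <{doc.tag}>")
    if [child.tag for child in doc] != [name for name, _ in fields]:
        raise ValueError("child elements do not match the schema")
    return {
        name: CONVERTERS[type_name]((child.text or "").strip())
        for child, (name, type_name) in zip(doc, fields)
    }
```

Calling `validate` on a conforming document returns a dictionary of typed values, and a document with the wrong children or a non-numeric `quantity` raises `ValueError`. No second parser or notation is needed to read the schema, which is the point the first bullet about DTD syntax was making.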
A problem as old as computing: specifying semantics
One of the toughest problems in computer science is to express semantics declaratively, or in ordinary words: how to describe meaning other than by writing the code that performs the intended action. While some people have mastered the use of inherently ambiguous human languages well enough to write reasonably precise specifications (for instance the IETF RFCs and W3C Recommendations), code is still the ultimate specification.
A discussion of declarative semantics is beyond the scope of this article, but it is worth noting that the definition of basic data types and other predefined elements is a small but important step in this direction, possibly accompanied by a reference piece of code, such as a Java class PurchaseOrder or a Perl function formatDate(), that adheres to the specification. With a reasonable definition written in prose, many independent implementations might be created, as has happened with Web servers thanks to the HTTP spec, Web browsers due to the HTML spec (which was arguably much weaker and less precise), and XML parsers because of the XML spec.
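As a rough illustration of a prose definition paired with reference code, here is a Python sketch. The date rule quoted in the comment and the names `format_date` and `parse_date` are assumptions made for this example and are not taken from any of the specifications mentioned.

```python
from datetime import date

# Hypothetical prose specification, for illustration only:
#   "A date is written as a four-digit year, a two-digit month and a
#    two-digit day, separated by hyphens, e.g. 2000-03-31."


def format_date(value):
    """Reference formatter: turn a date into the text form the spec describes."""
    return f"{value.year:04d}-{value.month:02d}-{value.day:02d}"


def parse_date(text):
    """Reference parser: the inverse of format_date; raises ValueError on bad input."""
    year, month, day = text.strip().split("-")
    if (len(year), len(month), len(day)) != (4, 2, 2):
        raise ValueError(f"not in YYYY-MM-DD form: {text!r}")
    return date(int(year), int(month), int(day))
```

An independent implementation in another language could be checked against such reference code by comparing results on the same inputs, while the prose remains the authority on what the format means.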
Created: Mar. 31, 2000
Revised: Mar. 31, 2000