Schema Wars: XML Schema vs. RELAX NG (2/2) - exploring XML
More XML Schema Problems
- Datatype handling in W3C XML Schema lacks modularity.
W3C XML Schema is tied to the single collection of datatypes defined in Part 2 of the W3C XML Schema specification. Yet this collection is very ad hoc: it includes datatypes of highly debatable relevance (gYearMonth, gDay, etc.) while lacking many datatypes that are important for many applications. What is called for here is a modular approach, in which a schema language can be combined with one or more standard collections of datatypes, some general-purpose and some domain-specific.
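To make the idea of modular datatypes concrete, here is a minimal Python sketch of a checker that looks up datatypes in independently registered libraries instead of one fixed built-in set. The library URIs, the `DatatypeRegistry` class, and the datatype definitions are all hypothetical and deliberately simplified. RELAX NG's `datatypeLibrary` attribute follows a similar idea of identifying a datatype library by URI.

```python
import re
from typing import Callable, Dict

# A datatype validator takes a lexical value and reports whether it is valid.
Validator = Callable[[str], bool]

# Hypothetical general-purpose library (simplified lexical checks only).
GENERAL_LIB: Dict[str, Validator] = {
    "integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s.strip()) is not None,
    "boolean": lambda s: s.strip() in {"true", "false", "1", "0"},
}

# Hypothetical domain-specific library, e.g. for financial documents.
FINANCE_LIB: Dict[str, Validator] = {
    "currencyCode": lambda s: re.fullmatch(r"[A-Z]{3}", s.strip()) is not None,
}


class DatatypeRegistry:
    """Maps a library URI to its named datatypes (illustrative only)."""

    def __init__(self) -> None:
        self._libraries: Dict[str, Dict[str, Validator]] = {}

    def register(self, uri: str, library: Dict[str, Validator]) -> None:
        self._libraries[uri] = library

    def check(self, uri: str, type_name: str, value: str) -> bool:
        try:
            validator = self._libraries[uri][type_name]
        except KeyError:
            raise LookupError(f"unknown datatype {type_name!r} in {uri!r}") from None
        return validator(value)


registry = DatatypeRegistry()
registry.register("urn:example:datatypes:general", GENERAL_LIB)
registry.register("urn:example:datatypes:finance", FINANCE_LIB)

print(registry.check("urn:example:datatypes:general", "integer", "-42"))  # True
print(registry.check("urn:example:datatypes:finance", "currencyCode", "usd"))  # False
```

A new domain library is added with another `register` call; the schema language itself does not change.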
- W3C XML Schema does not define a single notion of validity of a document with respect to a schema.
There are different varieties of validation (lax and strict) and many different ways to validate a document against a schema. From a W3C XML Schema alone, it is not possible to know which documents are valid. For instance, there is no way to specify which element is allowed as the root element: any globally declared element can serve as one.
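The root-element point can be illustrated with a deliberately simplified Python sketch. It assumes two toy schema models: an XSD-style one in which every globally declared element is an acceptable root, and a RELAX NG-style one in which the start pattern names the permitted root. The element names and the `root_allowed` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import FrozenSet

# Toy XSD-style model: any globally declared element may be used as the root.
XSD_GLOBAL_ELEMENTS: FrozenSet[str] = frozenset({"order", "item", "customer"})

# Toy RELAX NG-style model: the start pattern names the permitted root element(s).
RNG_START: FrozenSet[str] = frozenset({"order"})


def root_allowed(xml_text: str, allowed_roots: FrozenSet[str]) -> bool:
    """Return True if the document's root element is among the allowed roots."""
    return ET.fromstring(xml_text).tag in allowed_roots


fragment = "<item sku='A-1'/>"
print(root_allowed(fragment, XSD_GLOBAL_ELEMENTS))  # True: accepted as a whole document
print(root_allowed(fragment, RNG_START))  # False: only <order> may be the root
```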
- Magic schema attributes in documents
W3C XML Schema provides the xsi:schemaLocation attribute, which allows an XML document instance to indicate the schema that should be used to validate it. This creates problems with security (the schema at the referenced location might have been changed or tampered with), interoperability (schemaLocation is only a hint, which processors may use or ignore) and "purity" of schema definition: there is no way to prevent a document from containing the magic xsi:* attributes, so the use of W3C XML Schema "infects" the grammar you are defining.
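One defensive policy, sketched here in Python as an illustration rather than something the original text prescribes, is to read the `xsi:schemaLocation` hint for diagnostics only and to pick the schema from a locally maintained, trusted mapping keyed by namespace. The namespace `urn:example:orders`, the path in `TRUSTED_SCHEMAS`, and the helper names are hypothetical; the XSI namespace URI and the whitespace-separated "namespace location" pair format of the attribute come from W3C XML Schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical local catalog: namespace URI -> trusted local schema file.
TRUSTED_SCHEMAS: Dict[str, str] = {"urn:example:orders": "schemas/orders.xsd"}


def schema_location_hints(root: ET.Element) -> Dict[str, str]:
    """Parse xsi:schemaLocation, a list of 'namespace location' pairs."""
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


def choose_schema(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (trusted schema, ignored document hint) for the root's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree encodes namespaced tags as '{namespace}localname'.
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    hint = schema_location_hints(root).get(namespace)
    return TRUSTED_SCHEMAS.get(namespace), hint


doc = (
    '<o:order xmlns:o="urn:example:orders" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="urn:example:orders http://untrusted.example/orders.xsd"/>'
)
print(choose_schema(doc))
# ('schemas/orders.xsd', 'http://untrusted.example/orders.xsd')
```

This only addresses the trust problem: the xsi:* attributes still appear in the document and must still be tolerated by the grammar.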
- Another problematic area in W3C XML Schema is its support for infoset augmentation, such as default attributes.
Apart from being a violation of modularity, this tends to cause interoperability problems, because the application may receive different information depending on whether or not validation has been performed.
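The interoperability risk can be reproduced in a few lines of Python. The sketch assumes a hypothetical schema that declares a default `currency="USD"` attribute on `<price>`; `augment_with_defaults` merely simulates what a default-applying validator does and is not a real validator API.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical schema-declared defaults: element name -> {attribute: default value}.
DEFAULT_ATTRIBUTES: Dict[str, Dict[str, str]] = {"price": {"currency": "USD"}}


def augment_with_defaults(root: ET.Element) -> ET.Element:
    """Simulate infoset augmentation: add declared defaults for missing attributes."""
    for elem in root.iter():
        for name, value in DEFAULT_ATTRIBUTES.get(elem.tag, {}).items():
            if name not in elem.attrib:
                elem.set(name, value)
    return root


doc = "<price>9.99</price>"
without_validation = ET.fromstring(doc)
with_validation = augment_with_defaults(ET.fromstring(doc))

# The same document yields different data depending on whether validation ran.
print(without_validation.get("currency"))  # None
print(with_validation.get("currency"))  # USD
```

An application written against the validated view can silently misbehave when it receives unvalidated input.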
With these problems identified, Clark proposes RELAX NG as an alternative. RELAX NG was developed at OASIS under his leadership and merges Murata Makoto's RELAX with Clark's own TREX. It is a simple yet elegant evolution of the DTD, emphasizing ease of use, modularity, and a focus on validation. It does not modify the infoset during validation and avoids the problems of W3C XML Schema listed above. RELAX NG is also part of an ISO draft standard, ISO/IEC DIS 19757-2.
More specifically, RELAX NG addresses the XML Schema problems outlined above by providing:
- A concise definition.
- A well-written specification.
- Grammar-like specification of constraints between elements and attributes.
- An "&" (interleave) operator for unordered content: the patterns it combines may occur in any order.
- Datatypes are not built into RELAX NG but are supplied by separate, pluggable datatype libraries.
- Root elements are precisely defined.
- No references to a schema can be included in a document.
- No augmentation of a document's infoset through the validation process.
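The "&" operator is RELAX NG's interleave. In the compact syntax, a pattern such as `(a, b) & c`, where `a`, `b`, and `c` stand for element patterns, matches `a` followed by `b`, with `c` allowed before, between, or after them. The Python sketch below checks that behavior for the simplest case, two fixed sequences of element names. Real RELAX NG interleave works on arbitrary patterns, and `matches_interleave` is a hypothetical helper, not part of any RELAX NG toolkit.

```python
from functools import lru_cache
from typing import Sequence


def matches_interleave(names: Sequence[str], left: Sequence[str], right: Sequence[str]) -> bool:
    """True if `names` interleaves `left` and `right`, preserving each one's own order."""
    names, left, right = tuple(names), tuple(left), tuple(right)
    if len(names) != len(left) + len(right):
        return False

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        # i items of `left` and j items of `right` have been consumed so far.
        k = i + j
        if k == len(names):
            return True
        take_left = i < len(left) and names[k] == left[i] and match(i + 1, j)
        take_right = j < len(right) and names[k] == right[j] and match(i, j + 1)
        return take_left or take_right

    return match(0, 0)


# (a, b) & c : a must precede b, while c may appear anywhere.
for seq in (["a", "b", "c"], ["a", "c", "b"], ["c", "a", "b"], ["b", "a", "c"]):
    print(seq, matches_interleave(seq, ["a", "b"], ["c"]))
```

Interleaving three or more branches, as in `title & author & year`, would require generalizing `match` to track one position per branch.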
RELAX NG schemas were originally written in XML, but a compact non-XML syntax that resembles a context-free grammar is also provided. This non-XML syntax offers a familiar view of the language that is easy for people to read and write, and still straightforward for a parser to process.
The IETF's RFC on the use of XML, and its statement recommending W3C XML Schema, met with some resistance in the XML community. Section 1.2 of RFC 2026 states that two of the goals of the Internet Standards Process are technical excellence and clear, concise, and easily understood documentation. Clark argues that RELAX NG beats W3C XML Schema in both categories and invites anyone who disagrees to compare the RELAX NG specification with the W3C XML Schema specification. I tend to agree with him.
Produced by Michael Claßen
Created: Jul 08, 2002
Revised: Jul 08, 2002