XML Schemas (1/2) - exploring XML
Now that the XML Schema specification is one step away from becoming a W3C Recommendation, it is a good time to take a closer look at this new, improved way to define document types.
As mentioned in column 10, DTDs have a number of limitations:
- The syntax of a DTD is different from XML, requiring the document writer to learn yet another notation and the software to include yet another parser
- There is no way to specify datatypes and data formats that could be used to map automatically to and from programming languages
- There is no set of well-known basic elements to choose from
XML inherited DTDs from its predecessor SGML. They were a good way to get XML started quickly and gave SGML people something familiar to work with. Nevertheless, it soon became apparent that a more expressive solution, one that itself uses XML, was needed.
Defining an element specifies its name and content model, meaning attributes and nested elements. In XML Schemas, the content model of elements is defined by their type. An XML document adhering to a schema can then only have elements that match the defined types. One distinguishes simple and complex types.
A number of simple types are predefined in the specification, such as string, decimal, and positive-integer. A simple type cannot contain elements or attributes in its value, whereas complex types can specify nesting of elements and associations of attributes with an element.
A simple example could look like this:
<element name="quantity" type="positive-integer"/>
<element name="amount" type="decimal"/>
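To make the effect of these two declarations concrete, here is a minimal Python sketch of the kind of check a schema-aware processor could perform on element text. It is an illustration, not a schema validator: the checker functions, the `SIMPLE_TYPES` table, and the `<order>` wrapper element in the usage note are all assumptions introduced for this example, and the lexical rules are simplified.

```python
import re
import xml.etree.ElementTree as ET


def is_positive_integer(text: str) -> bool:
    """Simplified check: optional '+', ASCII digits, and a value above zero."""
    text = text.strip()
    return re.fullmatch(r"\+?[0-9]+", text) is not None and int(text) > 0


def is_decimal(text: str) -> bool:
    """Simplified check: optional sign, digits, optional fractional part."""
    pattern = r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"
    return re.fullmatch(pattern, text.strip()) is not None


# Hypothetical lookup table mirroring the two element declarations above.
SIMPLE_TYPES = {"quantity": is_positive_integer, "amount": is_decimal}


def check(xml_text: str) -> dict:
    """Return {element name: True/False} for every declared element found."""
    root = ET.fromstring(xml_text)
    return {
        el.tag: SIMPLE_TYPES[el.tag](el.text or "")
        for el in root.iter()
        if el.tag in SIMPLE_TYPES
    }
```

Calling `check("<order><quantity>3</quantity><amount>19.95</amount></order>")` reports both elements as valid, while a quantity of `0` or an amount of `abc` is rejected. A DTD cannot express this distinction, because both elements would simply be `#PCDATA`.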
User-defined elements can be formed from the predefined ones using the object-oriented concepts of aggregation and inheritance. Aggregation groups a set of existing elements into a new one. Inheritance extends an already defined element so that the new one can stand in for the original.
Defining values that are derived from decimals and carry a unit attribute:
<complexType name='value' base='decimal' derivedBy='extension'>
  <attribute name='unit' type='string'/>
</complexType>
Aggregating time and value into a measurement:
<time>2000-10-08 12:00:00 GMT</time>
The resulting schema definition:
<element name='measurement' type='measurement'/>
<element name='time' type='time'/>
<element name='value' type='value'/>
The equivalent, less expressive DTD:
<!ELEMENT measurement (time, value)>
<!ELEMENT time (#PCDATA)>
<!ELEMENT value (#PCDATA)>
<!ATTLIST value unit CDATA #IMPLIED>
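The structural part of this declaration, the content model `(time, value)`, is the part a DTD and a schema both express. As a rough sketch of what enforcing it means, the following Python function compares the children of a `measurement` element with that sequence. The `CONTENT_MODEL` table and the function name are assumptions made for this illustration; a real validating parser does considerably more.

```python
import xml.etree.ElementTree as ET

# Content model taken from the DTD above: measurement holds time, then value.
CONTENT_MODEL = {"measurement": ["time", "value"]}


def children_match(xml_text: str) -> bool:
    """True if the root element's children appear exactly as declared."""
    root = ET.fromstring(xml_text)
    expected = CONTENT_MODEL.get(root.tag)
    if expected is None:
        return False  # root element is not declared at all
    return [child.tag for child in root] == expected
```

Note that this check says nothing about the text inside `time` or `value`. With the DTD, both are plain character data, which is exactly the loss of expressiveness mentioned above.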
Inheritance features known from Java and other object-oriented languages are also present: you can declare a type "abstract" to force the use of a derived type in its place, and declaring a type "final" prevents further derivation, much as an abstract class must be subclassed and a final class cannot be. This way a one-to-one mapping between an element definition and a Java, C++, or Python class becomes feasible. Imagine a software system where the code to process a document is shipped with the document itself.
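As a sketch of what such a mapping might look like in Python, the classes below mirror the measurement example: `Value` extends the built-in `Decimal` with a `unit` attribute (inheritance), and `Measurement` groups a time and a value (aggregation). The class names, the `from_xml` helper, and the sample reading of `21.5` with unit `Celsius` in the usage note are assumptions chosen for illustration. They are not part of the XML Schema specification or of any binding tool.

```python
from dataclasses import dataclass
from decimal import Decimal
import xml.etree.ElementTree as ET


class Value(Decimal):
    """Mirrors the 'value' type: a decimal extended with a 'unit' attribute."""

    def __new__(cls, text, unit=None):
        self = super().__new__(cls, text)
        self.unit = unit  # the attribute added by the extension
        return self


@dataclass
class Measurement:
    """Mirrors the 'measurement' type: an aggregate of time and value."""

    time: str
    value: Value

    @classmethod
    def from_xml(cls, xml_text: str) -> "Measurement":
        root = ET.fromstring(xml_text)
        value_el = root.find("value")
        return cls(
            time=root.findtext("time"),
            value=Value(value_el.text.strip(), value_el.get("unit")),
        )
```

Given a document such as `<measurement><time>2000-10-08 12:00:00 GMT</time><value unit="Celsius">21.5</value></measurement>`, `Measurement.from_xml` returns an object whose `value` is still a `Decimal` and can therefore stand in wherever a plain decimal is expected, while also carrying its unit.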
Let's look at cardinalities and namespaces.
Created: Oct 08, 2000
Revised: Oct 08, 2000