The Meanings of XML: DTDs, DCDs and Schemas (1/3) - exploring XML
So far in this column we have ignored the more formal aspects of XML, such as defining the syntax and semantics of specific document types. While we examined the set of rules common to all documents (remember well-formed vs. valid documents?), I have so far neglected the mechanisms for specifying your own families of documents.
Specification of document syntax
In addition to the general rules every XML document has to adhere to, a Document Type Definition (DTD) or equivalent defines the order and nesting of the elements that make up a certain type of XML document, such as a purchase order or a news channel. DTDs have their roots in XML's parent, SGML, and have their own special syntax. Here is a shortened version of the DTD of the Rich Site Summary (RSS) format in use for news channels:
<!ELEMENT rss (channel)>
<!ATTLIST rss version CDATA #REQUIRED> <!-- must be "0.91"> -->
<!ELEMENT channel (title | description | link | language | item+ | ...)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ELEMENT image (title | url | link | width? | height? | description?)*>
<!ELEMENT url (#PCDATA)>
<!ELEMENT item (title | link | description)*>
<!ELEMENT language (#PCDATA)>
<!ELEMENT width (#PCDATA)>
<!ELEMENT height (#PCDATA)>
The full version can be found at MyNetscape. This basically says that channel is the only element inside the rss root element and is itself made up of title, description, link and one or more items in any order, among other things. Each item may contain a title, a description, and a link, in any order. DTDs use special characters to denote how often an element may occur and which alternatives are allowed:
| Notation | Meaning |
| a | exactly one a |
| a? | zero or one a |
| a+ | one or more a's |
| a* | zero or more a's |
| a \| b | either a or b |
| #PCDATA | parsed character data |
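If you know regular expressions, the notation in this table will look familiar, and that is no coincidence: a content model is essentially a regular expression over element names. The following is a minimal Python sketch of that idea, not a description of how real validating parsers work. The helper names model_to_regex and matches are made up for this illustration, the sample item and its example.com link are invented, and the sketch only handles element content models (it ignores #PCDATA and attributes).

```python
import re
import xml.etree.ElementTree as ET

# An element name as it appears in a content model (simplified).
NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model):
    """Turn a DTD element-content model into a compiled regular expression."""
    compact = re.sub(r"\s+", "", model)   # whitespace in a model is insignificant
    compact = compact.replace(",", "")    # ',' means "followed by": plain concatenation
    # Wrap every element name in a group that also consumes a trailing space,
    # so that ?, + and * apply to the whole name and names cannot run together.
    pattern = NAME.sub(lambda m: "(?:" + m.group(0) + " )", compact)
    return re.compile(pattern)


def matches(model, children):
    """Return True if the list of child element names satisfies the model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


# Content model of <item>, copied from the shortened RSS DTD above.
ITEM_MODEL = "(title | link | description)*"

# A made-up item, only for illustration.
item = ET.fromstring(
    "<item><title>XML news</title><link>http://example.com/</link></item>"
)
child_names = [child.tag for child in item]

print(matches(ITEM_MODEL, child_names))        # True
print(matches(ITEM_MODEL, ["title", "url"]))   # False: url is not allowed in item
print(matches("(channel)", []))                # False: exactly one channel is required
```

Note that because the item model is a choice group followed by *, an empty item or an item with three titles would pass as well. That is a property of the DTD itself, not of this sketch.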
Using these basic building blocks, complex definitions can be formed by nesting elements, as shown in the RSS DTD above. HTML is also specified in a DTD, and so is XHTML, the reformulation of HTML in XML. You can think of XHTML as an XML vocabulary for expressing screen documents, while RSS is a vocabulary for expressing a collection of links on a certain topic. The more vocabularies the computing industry can agree on, the more we can do the infrastructure work once and then focus on the business problems at hand, which matter more than decoding and encoding data. How many of your Web projects today have to deal with transforming data from one format to another?
Created: Mar. 31, 2000
Revised: Mar. 31, 2000