XML: Whines and Battles - DTDs
What's a DTD and why do I need one (or not)?
One of the keys to XML (and SGML) is the Document Type Definition (DTD), the specification for a particular class of documents. A DTD says what types of elements are meaningful for this type of document, and how they fit together. DTDs are important because they provide a basis for creating and managing whole groups of documents of the same type in a uniform way. This improves the quality of your information, because at least one source of error or confusion has been removed.
HTML's use of DTDs was fairly static (see HTML DTDs), but in XML it is different: you actively choose whether or not to use a DTD. If your editing software is smart enough (or if you are accurate enough), you can omit any DTD and fabricate the markup on the fly to suit the occasion. It just has to follow the XML specification's rules for well-formed documents (DTD-less markup of this kind is called standalone XML here), so that browsers can read it without error. This is fairly straightforward:
For HTML, several different DTDs evolved (with associated specifications: 2.0, 3, 3.2, 4.0, Pro, etc), but they all defined broadly the same basic underlying markup, with the later ones adding more recent features.
Because current browsers are hard-wired to interpret only HTML markup and nothing else, silently ignoring markup they don't recognize, HTML DTDs tend to be used only by software which performs full SGML checking of the markup (validation), such as SoftQuad's HoTMetaL editor, or other SGML-based formatting, conversion, or data management software.
If your pages don't need checked HTML, or if they have to use the features of a particular editor which generates it, or if you need some private markup recognized by a specific browser, then the answer to the question is that you probably don't want or need any DTD.
- All elements must have start-tags and end-tags even if
there's nothing between them (you can't omit things like
- In the special case of truly empty elements
(i.e., those that do not possess an end-tag at all)
you can use the special form of abbreviation if you wish: a start tag
with a trailing slash, like
- Elements cannot overlap (same rule as for HTML): they
must be nested inside one another (so
<B>bold <I>italic</B></I> is an error: it must be
<B>bold <I>italic</I></B>).
- You must put all attribute values in quotes (e.g.,
<link to="http://www.foo.bar/">), and you can't have default values or automated type-recognition like
- You must use
- You have to tell XML processors that they should not expect a DTD anywhere, and if there's any special processing or formatting, you need to provide a stylesheet:
<?xml version="1.0" standalone="yes"?>
<?xml-style href="quickmessage.xsl" type="text/xsl"?>
<message stamp="1998-08-18T11:32:45.26+0000">
  <to address="email@example.com">Mike</to>
  <text>Are you free for lunch at 1.00pm today?</text>
  <from address="firstname.lastname@example.org">Peter</from>
</message>
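Several of the rules listed above can be checked mechanically. What follows is only a rough sketch in Python, assuming the standard library's xml.etree.ElementTree parser as a stand-in for "an XML processor"; the helper name is_well_formed and the sample strings are made up for illustration and do not come from any specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` without a DTD, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule from the list above.
SAMPLES = {
    "properly nested": "<p><B>bold <I>italic</I></B></p>",
    "overlapping elements": "<p><B>bold <I>italic</B></I></p>",
    "omitted end-tag": "<list><item>one<item>two</item></list>",
    "empty element with trailing slash": "<p>line one<br/>line two</p>",
    "unquoted attribute value": "<link to=http://www.foo.bar/>home</link>",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        verdict = "accepted" if is_well_formed(text) else "rejected"
        print(f"{label}: {verdict}")
```

A parser of this kind stops at the first error instead of silently skipping markup it dislikes, which is the main practical difference from the HTML browsers described earlier.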
But if you are going to create the same type of document again and again (and most people do seem to want to do this), it's a lot easier if you use a standard structure (and vary the appearance with a stylesheet if you wish). A DTD provides this structure.
But you're not constrained to using an existing DTD: you can write your own, which is why so many groups of potential users are busy producing them. The example above corresponds to a DTD which would look something like this:
<!ELEMENT message (to,text,from)>
<!ATTLIST message stamp CDATA #REQUIRED>
<!ELEMENT to (#PCDATA)>
<!ATTLIST to address CDATA #REQUIRED>
<!ELEMENT text (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ATTLIST from address CDATA #REQUIRED>
The declaration for each element type provides an
expression which models what it may contain: usually either more
element types (here
to,text,from), or Parsed
Character Data (text), or possibly a mixture. Attribute lists are
declared giving an attribute type and default status (required,
implied, etc, or an actual default value). Any XML
system using this therefore knows which elements go where and
can use their names for matching with a style in a stylesheet, or
performing a search, or controlling your editor.
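To make "knows which elements go where" a little more concrete, here is a hedged sketch in Python of how a program might read the declarations and use them. Two assumptions are worth flagging: the DTD is attached to the document as an internal subset inside a DOCTYPE declaration (the text above does not show how the two are connected), and the check itself is a toy that only understands plain sequences such as (to,text,from) and #REQUIRED attributes. It is not a validating parser, and the names read_declarations and check are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# The DTD from the text, wrapped in an internal subset (an assumption).
DTD = """<!DOCTYPE message [
<!ELEMENT message (to,text,from)>
<!ATTLIST message stamp CDATA #REQUIRED>
<!ELEMENT to (#PCDATA)>
<!ATTLIST to address CDATA #REQUIRED>
<!ELEMENT text (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ATTLIST from address CDATA #REQUIRED>
]>"""

GOOD = DTD + """
<message stamp="1998-08-18T11:32:45.26+0000">
<to address="email@example.com">Mike</to>
<text>Are you free for lunch at 1.00pm today?</text>
<from address="firstname.lastname@example.org">Peter</from>
</message>"""

# Hypothetical faulty message: no stamp, children out of order, no address.
BAD = DTD + """
<message>
<text>Lunch?</text>
<to>Mike</to>
</message>"""


def read_declarations(doc):
    """Collect declared child order and #REQUIRED attributes from the DTD."""
    children = {}  # element name -> child element names, in declared order
    required = {}  # element name -> set of required attribute names

    def on_element(name, model):
        # expat passes model as (type, quantifier, name, children).
        children[name] = [child[2] for child in model[3]]

    def on_attlist(elname, attname, type_, default, is_required):
        if is_required:
            required.setdefault(elname, set()).add(attname)

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(doc, True)
    return children, required


def check(doc):
    """Return a list of problems; handles plain sequences only (toy check)."""
    children, required = read_declarations(doc)
    problems = []
    for elem in ET.fromstring(doc).iter():
        for att in sorted(required.get(elem.tag, ())):
            if att not in elem.attrib:
                problems.append(f"<{elem.tag}> lacks required attribute {att}")
        expected = children.get(elem.tag)
        found = [child.tag for child in elem]
        if expected is not None and found != expected:
            problems.append(f"<{elem.tag}> contains {found}, expected {expected}")
    return problems


if __name__ == "__main__":
    print(check(GOOD))
    for problem in check(BAD):
        print(problem)
```

The same tables of child names and required attributes are the kind of information an editor could use to offer only the elements allowed at a given point, or a stylesheet processor could use to match element names to styles.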
Created: May 11, 1998
Revised: May 14, 1998