XML: Whines and Battles - DTD's | WebReference

XML: Whines and Battles - DTD's

What’s a DTD and why do I need one (or not)?

One of the keys to XML (and SGML) is the Document Type Definition (DTD), the specification for a particular class of documents. A DTD says what types of elements are meaningful for this type of document, and how they fit together. DTDs are important because they provide a basis for creating and managing whole groups of documents of the same type in a uniform way. This improves the quality of your information, because at least one source of error or confusion has been removed.

HTML’s use of DTDs was fairly static (see ‘HTML DTDs’), but XML is different: you actively choose whether or not to use a DTD. If your editing software is smart enough (or if you are accurate enough), you can omit the DTD and make up the markup on the fly to suit the occasion. It just has to follow the rules in the XML specification for well-formed documents (DTD-less markup of this kind is also called ‘standalone’ XML here), so that browsers can read it without error. This is fairly straightforward:

HTML DTDs

For HTML, several different DTDs evolved (with associated specifications: 2.0, 3, 3.2, 4.0, Pro, etc), but they all defined broadly the same basic underlying markup, with the later ones adding more recent features.

Because current browsers are hard-wired to interpret only HTML markup and nothing else, silently ignoring markup they don’t recognize, HTML DTDs tend to be used only by software which performs full SGML checking of the markup (‘validation’), such as SoftQuad’s HoTMetaL editor, or other SGML-based formatting, conversion, or data management software.

If your pages don’t need checked HTML, or if they have to use the features of a particular editor which generates it, or if you need some private markup recognized by a specific browser, then the answer to the question is that you probably don’t want or need any DTD.

    For DTD-less XML
  • All elements must have start-tags and end-tags even if there’s nothing between them (you can’t omit things like </P>!).
  • In the special case of truly empty elements (i.e., those that have no content and, in HTML, no end-tag at all), you can use a special abbreviated form if you wish: a start-tag with a trailing slash, like <BR/>.
  • Elements cannot overlap (same rule as for HTML): they must be nested inside one another (so <B>bold <I>italic</B></I> is an error: it must be <B>bold <I>italic</I></B>).
  • You must put all attribute values in quotes (e.g., <link to="http://www.foo.bar/">), and you can’t have default values or automated type-recognition like NUMBER or ID.
  • You must use &lt; and &amp; for < and &.
  • You have to tell XML processors that they should not expect a DTD anywhere, and if there’s any special processing or formatting, you need to provide a stylesheet:

<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet href="quickmessage.xsl" type="text/xsl"?>
<message stamp="1998-08-18T11:32:45.26+0000">
<to address="mike@foo.com">Mike</to>
<text>Are you free for lunch at 1.00pm today?</text>
<from address="pete@foo.com">Peter</from>
</message>
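
The rules listed above can be tried out with any XML parser. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser (which checks the markup rules but never reads a DTD); the helper name is_well_formed and the sample strings are illustrative only and are not part of the original article.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(markup: str) -> bool:
    """Return True if the parser accepts the markup without any DTD."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Illustrative samples, one per rule in the list above.
samples = {
    "nested elements": "<B>bold <I>italic</I></B>",
    "overlapping elements": "<B>bold <I>italic</B></I>",
    "empty element with trailing slash": "<P>line one<BR/>line two</P>",
    "empty element without end-tag": "<P>line one<BR>line two</P>",
    "quoted attribute": '<link to="http://www.foo.bar/"></link>',
    "unquoted attribute": "<link to=http://www.foo.bar/></link>",
    "raw ampersand": "<text>Fish & chips</text>",
    # escape() replaces & and < (and >) with &amp; and &lt; (and &gt;).
    "escaped ampersand": "<text>" + escape("Fish & chips") + "</text>",
}

for label, markup in samples.items():
    verdict = "accepted" if is_well_formed(markup) else "rejected"
    print(f"{label}: {verdict}")
```

Each pair shows the same markup written correctly and incorrectly; only the correct form should be accepted by the parser.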

But if you are going to create the same type of document again and again (and most people do seem to want to do this), it’s a lot easier if you use a standard structure (and vary the appearance with a stylesheet if you wish). A DTD provides this structure.

But you’re not constrained to using an existing DTD: you can write your own, which is why so many groups of potential users are busy producing them. The example above would conform to a DTD which looks something like this:

<!ELEMENT message (to,text,from)>
<!ATTLIST message stamp CDATA #REQUIRED>
<!ELEMENT to (#PCDATA)>
<!ATTLIST to address CDATA #REQUIRED>
<!ELEMENT text (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ATTLIST from address CDATA #REQUIRED>

The declaration for each element type provides an expression which models what it may contain: usually either more element types (here ‘to,text,from’), or Parsed Character Data (text), or possibly a mixture. Attribute lists are declared giving an attribute type and default status (required, implied, etc., or an actual default value). Any XML system using this DTD therefore ‘knows’ which elements go where and can use their names for matching with a style in a stylesheet, or performing a search, or controlling your editor.
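
To make concrete what ‘knowing which elements go where’ involves, here is a rough sketch in Python. It is not a DTD validator: Python's standard library does not validate against DTDs, so the table MESSAGE_MODEL below is a hypothetical, hand-written mirror of the declarations above, and the function check is an illustrative name. It only compares child order and required attributes; it ignores undeclared attributes and stray text between child elements.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of the DTD above (an assumption for illustration):
# element name -> (child elements in order, required attribute names).
# An empty child list stands for (#PCDATA), i.e. text only.
MESSAGE_MODEL = {
    "message": (["to", "text", "from"], ["stamp"]),
    "to": ([], ["address"]),
    "text": ([], []),
    "from": ([], ["address"]),
}


def check(element, model=MESSAGE_MODEL):
    """Return a list of problems; an empty list means the tree fits the model."""
    if element.tag not in model:
        return [f"undeclared element <{element.tag}>"]
    problems = []
    expected_children, required_attrs = model[element.tag]
    actual_children = [child.tag for child in element]
    if actual_children != expected_children:
        problems.append(
            f"<{element.tag}> contains {actual_children}, expected {expected_children}"
        )
    for name in required_attrs:
        if name not in element.attrib:
            problems.append(f"<{element.tag}> lacks required attribute '{name}'")
    for child in element:
        problems.extend(check(child, model))
    return problems


valid = ET.fromstring(
    '<message stamp="1998-08-18T11:32:45.26+0000">'
    '<to address="mike@foo.com">Mike</to>'
    "<text>Are you free for lunch at 1.00pm today?</text>"
    '<from address="pete@foo.com">Peter</from>'
    "</message>"
)
# Well-formed, but the children are in the wrong order and <to> has no address.
misordered = ET.fromstring(
    '<message stamp="1998-08-18T11:32:45.26+0000">'
    '<from address="pete@foo.com">Peter</from>'
    "<text>Lunch?</text>"
    "<to>Mike</to>"
    "</message>"
)

print(check(valid))
for problem in check(misordered):
    print(problem)
```

The second document passes every DTD-less rule yet breaks the structure the DTD describes, which is exactly the class of error a DTD lets software catch.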

Created: May 11, 1998
Revised: May 14, 1998

URL: http://www.webreference.com/authoring/xml/dtd.html