HTML Unleashed. The Emergence of XML: Well-Formed XML Documents | WebReference



HTML Unleashed: The Emergence of XML

Well-Formed XML Documents


If you're scared by the prospect of learning the art of writing document type definitions, there's good news for you: with XML, you can create a document even without a DTD.  If the lack of a DTD is the sole violation of XML requirements, such a document is called well-formed, as opposed to a valid document that has a DTD attached (or referred to).  Thus, well-formedness is the lower of the two levels of XML conformance, but although it is weaker than validity, it is still very useful.

The permission to omit the DTD means, in essence, that you are free to use any tags that seem necessary for your document.  You may just go wild and write in plain English what each part of your text is supposed to represent; for example, if you're a grammarian:


Given this well-formed document, an XML parser will be able to break the text into the elements that you deemed essential in this case and pass each element to the application along with its name (derived from the tag name) and attribute information you provided.  It is the application, not the XML parser, that must be programmed to perform some useful tasks with this structured information.  Most often, you'll need to provide the application with a style sheet that associates certain formatting parameters with each element (in our example, the style sheet might specify displaying different parts of speech in different colors).
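To make the division of labor concrete, here is a minimal sketch using Python's standard `xml.sax` module. The grammarian-style document and its tag names (`sentence`, `noun`, `verb`, and so on) and attributes (`number`, `tense`) are hypothetical illustrations. The parser reports each element's name and attributes to a handler class standing in for the "application"; the handler only records them and does nothing else useful with the structure.

```python
import xml.sax

# A hypothetical well-formed document: no DTD, tag names invented by the author.
DOC = """<sentence>
  <subject><article>The</article> <noun number="plural">parsers</noun></subject>
  <predicate><verb tense="present">work</verb></predicate>
</sentence>"""


class ElementReporter(xml.sax.ContentHandler):
    """Stands in for the application: it just records what the parser hands over."""

    def __init__(self):
        super().__init__()
        self.events = []  # (element name, attribute dict) in document order

    def startElement(self, name, attrs):
        # The parser supplies the element name (taken from the tag) and its
        # attributes; deciding what to do with them is the application's job.
        self.events.append((name, dict(attrs.items())))


handler = ElementReporter()
xml.sax.parseString(DOC.encode("utf-8"), handler)
for name, attrs in handler.events:
    print(name, attrs)
```

A real application would replace the `print` loop with formatting logic, for example looking up a color for each part of speech in a style sheet.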

In fact, even the "plain English" mentioned above is not a requirement for your tags.  The creators of XML intended the language to be international from the very beginning.  They painstakingly identified all Unicode characters that may be called "letters" in some sense or in some language and included these characters in the set of characters allowed in element and attribute names.  This means that you can write your tags in Russian or Chinese instead of English.
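As a quick check, the following sketch feeds Python's standard `xml.etree.ElementTree` parser two small documents whose element and attribute names are written in Russian and in Chinese. The documents themselves are hypothetical examples; the point is only that a conforming parser accepts such names without any DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents with non-English element and attribute names.
RUSSIAN = '<книга автор="Толстой"><название>Война и мир</название></книга>'
CHINESE = '<书><书名>红楼梦</书名></书>'

ru = ET.fromstring(RUSSIAN)
zh = ET.fromstring(CHINESE)

# Names come back exactly as written; the parser treats them like any other name.
print(ru.tag, ru.get("автор"), ru[0].tag, ru[0].text)
print(zh.tag, zh[0].tag, zh[0].text)
```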

There are, of course, numerous restrictions that are imposed even on well-formed documents.  Some of these requirements are even stricter than those of HTML and thus deserve special attention:

  • In XML, every start tag must have a corresponding end tag (unless it is an empty tag that takes a special form, as described in the next item).  Tags must be properly nested; that is, you can't have a start tag within the scope of some other element and its corresponding end tag outside that scope.

  • If a tag is empty by its nature (in other words, the corresponding element can never have any content), it must have a forward slash (/) before the closing greater-than symbol (>), for example:
    <IMG alt="XML logo" />

    Such tags are the only tags that do not have corresponding end tags.

  • All attribute values, without exception, must be enclosed in quotes: either single quotes (alt='XML logo') or double quotes (alt="XML logo").
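The rules in the list above can be tried out with any XML parser. This sketch uses Python's standard `xml.etree.ElementTree`, which checks well-formedness only (it does no DTD validation). The helper name `is_well_formed` and the sample fragments are hypothetical; each rule gets one broken fragment and its corrected version.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML (well-formedness only, no DTD)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# (broken fragment, corrected fragment) for each rule in the list
CASES = {
    "missing end tag": ("<P>Hello", "<P>Hello</P>"),
    "improper nesting": ("<P><B>bold <I>both</B> italic</I></P>",
                         "<P><B>bold <I>both</I></B><I> italic</I></P>"),
    "empty tag without slash": ('<P><IMG alt="XML logo"></P>',
                                '<P><IMG alt="XML logo" /></P>'),
    "unquoted attribute": ("<P align=center>Hi</P>", '<P align="center">Hi</P>'),
}

for rule, (bad, good) in CASES.items():
    print(rule, "| broken:", is_well_formed(bad), "| fixed:", is_well_formed(good))
```

Every broken fragment is rejected and every corrected one is accepted. Note that the fixed nesting example has to close `<I>` and reopen it after `</B>`, since overlapping elements cannot be expressed in XML.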

In fact, the preceding requirements are the main ones that you must satisfy to make your HTML files well-formed XML (along with a few general rules, such as matching the case of start and end tags exactly).  It doesn't matter which browser's HTML extensions you use or whether you "abuse" HTML tags or not.  XML is a truly liberal language; it makes you a creator of your own universe whose rules you're unlikely to break simply because it's you who establishes them.
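To illustrate that the parser doesn't care about browser extensions, this sketch parses a hypothetical HTML page that uses Netscape's BLINK, Microsoft's MARQUEE, and a BLOCKQUOTE "abused" for indentation. Because every tag is closed and nested, the empty tags end in "/>", and all attribute values are quoted, Python's standard `xml.etree.ElementTree` accepts it as well-formed XML.

```python
import xml.etree.ElementTree as ET

# A hypothetical HTML page mixing browser-specific tags and tag "abuse",
# written so that it obeys the well-formedness rules described above.
PAGE = """<HTML>
  <BODY BGCOLOR="white">
    <BLINK>New!</BLINK>
    <MARQUEE LOOP="3">Welcome to my home page</MARQUEE>
    <BLOCKQUOTE><P>Indented, not a quotation.</P></BLOCKQUOTE>
    <HR SIZE="1" />
    <BR />
  </BODY>
</HTML>"""

root = ET.fromstring(PAGE)
# The XML parser attaches no meaning to these tags; it simply reports them.
tags = [el.tag for el in root.iter()]
print(tags)
```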

Those familiar with the material in Chapter 3 may be wondering about another essential part of any SGML application: the SGML declaration that always accompanies a DTD.  Because XML is greatly simplified compared to SGML, its authors decided to omit this part of the language; XML parsers (or SGML software processing XML documents) should behave as if every XML document were assigned the same generic SGML declaration, which is listed in an appendix to the XML standard.


Created: Jun. 15, 1997
Revised: Jun. 16, 1997