HTML Unleashed. The Emergence of XML: Well-Formed XML Documents
Well-Formed XML Documents
If you're scared by the prospect of learning the art of writing document type definitions, there's good news for you: with XML, you can create a document even without a DTD. If the lack of a DTD is the document's only departure from XML requirements, the document is called well-formed, as opposed to a valid document, which has a DTD attached (or referred to) and conforms to it. Thus, well-formedness is the lower of the two levels of XML conformance; although it is a weaker guarantee than validity, it is still very useful.
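To see the difference in practice, consider a non-validating parser such as the expat-based xml.etree.ElementTree module in Python's standard library (chosen here only for illustration). It checks well-formedness and nothing more, so it accepts a document with no DTD at all, and it also accepts a document that declares a DTD but breaks it. The element names NOTE, TO, BODY, and B are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# No DTD at all: the document is merely well-formed, and the parser accepts it.
no_dtd = "<NOTE><TO>Reader</TO><BODY>No DTD here.</BODY></NOTE>"

# The internal DTD says NOTE may contain only text, yet NOTE holds a child
# element. The document is well-formed but NOT valid against its own DTD.
invalid_but_well_formed = (
    "<!DOCTYPE NOTE [<!ELEMENT NOTE (#PCDATA)>]>"
    "<NOTE><B>bold</B></NOTE>"
)

for text in (no_dtd, invalid_but_well_formed):
    # fromstring raises ET.ParseError only when the text is not well-formed;
    # it never checks content against the DTD.
    root = ET.fromstring(text)
    print(root.tag, [child.tag for child in root])
```

Catching a validity error like the second one would require a validating parser, which the Python standard library does not include.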
The permission to omit the DTD means, in essence, that you are free to use any tags that seem necessary for your document. You may just go wild and write in plain English what each part of your text is supposed to represent. For example, if you're a grammarian, you might write:
    <SENTENCE>
      <SUBJECT TYPE="COMPLEX">
        <ARTICLE TYPE="INDEFINITE">A</ARTICLE>
        <ADJECTIVE>quick</ADJECTIVE>
        <ADJECTIVE>brown</ADJECTIVE>
        <NOUN>fox</NOUN>
      </SUBJECT>
      <VERB TYPE="INTRANSITIVE">jumps</VERB>
      <PREPOSITION>over</PREPOSITION>
      <ARTICLE TYPE="INDEFINITE">a</ARTICLE>
      <OBJECT>
        <ADJECTIVE>lazy</ADJECTIVE>
        <NOUN>dog</NOUN>
      </OBJECT>
    </SENTENCE>
Given this well-formed document, an XML parser can break the text into the elements you deemed essential and pass each element to the application along with its name (taken from the tag) and the attribute information you provided. It is the application, not the XML parser, that must be programmed to do something useful with this structured information. Most often, you'll need to give the application a style sheet that associates certain formatting parameters with each element (in our example, the style sheet might specify displaying different parts of speech in different colors).
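The division of labor between parser and application can be sketched in a few lines. This is a minimal illustration, assuming Python's standard xml.etree.ElementTree as the parser; the STYLE dictionary is a hypothetical stand-in for a real style sheet language, and its colors are arbitrary. The parser supplies each element's name, attributes, and text; the application code decides what to do with them, here wrapping each word in a colored HTML span.

```python
import xml.etree.ElementTree as ET
from html import escape

# The grammarian's document, parsed without any DTD.
SENTENCE = """<SENTENCE>
  <SUBJECT TYPE="COMPLEX">
    <ARTICLE TYPE="INDEFINITE">A</ARTICLE>
    <ADJECTIVE>quick</ADJECTIVE>
    <ADJECTIVE>brown</ADJECTIVE>
    <NOUN>fox</NOUN>
  </SUBJECT>
  <VERB TYPE="INTRANSITIVE">jumps</VERB>
  <PREPOSITION>over</PREPOSITION>
  <ARTICLE TYPE="INDEFINITE">a</ARTICLE>
  <OBJECT>
    <ADJECTIVE>lazy</ADJECTIVE>
    <NOUN>dog</NOUN>
  </OBJECT>
</SENTENCE>"""

# Hypothetical "style sheet": element name -> display color.
STYLE = {
    "ARTICLE": "gray",
    "ADJECTIVE": "green",
    "NOUN": "blue",
    "VERB": "red",
    "PREPOSITION": "purple",
}

def words(root):
    """Yield (element name, attributes, text) in document order.

    elem.text holds only the text before an element's first child; in this
    document only the leaf elements carry words, so that is enough here.
    """
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            yield elem.tag, dict(elem.attrib), text

def render(root, style):
    """Application logic: wrap each word in an HTML span colored by element name."""
    spans = []
    for name, _attrs, text in words(root):
        color = style.get(name, "black")  # elements without a rule stay black
        spans.append(f'<span style="color:{color}">{escape(text)}</span>')
    return " ".join(spans)

root = ET.fromstring(SENTENCE)
for name, attrs, text in words(root):
    print(name, attrs, text)
print(render(root, STYLE))
```

Changing the presentation means editing STYLE (or, in a real system, the style sheet), not the document or the parser.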
In fact, even the "plain English" mentioned above is not a requirement for your tags. The creators of XML intended the language to be international from the very beginning. They painstakingly identified all Unicode characters that may be called "letters" in some sense or in some language and included these characters in the set of characters allowed in element and attribute names. This means that you can write your tags in Russian or Chinese instead of English.
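As a quick check, the sketch below (assuming Python's standard xml.etree.ElementTree module) parses a tiny document whose element and attribute names are written in Russian and Chinese. The names themselves (ПРЕДЛОЖЕНИЕ, "sentence"; ТИП, "type"; 主语, "subject"; 谓语, "predicate") are illustrative inventions, not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Element and attribute names in Cyrillic and in CJK ideographs.
doc = '<ПРЕДЛОЖЕНИЕ ТИП="ПРОСТОЕ"><主语>狐狸</主语><谓语>跳</谓语></ПРЕДЛОЖЕНИЕ>'

root = ET.fromstring(doc)
print(root.tag, root.get("ТИП"))  # root element name and its attribute value
for child in root:
    print(child.tag, child.text)  # each child's name and text content
```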
There are, of course, numerous restrictions that are imposed even on well-formed documents. Some of these requirements are even stricter than those of HTML and thus deserve special attention:
In fact, the preceding requirements are the only ones that you must satisfy to make your HTML files well-formed XML. It doesn't matter which browser's HTML extensions you use or whether you "abuse" HTML tags or not. XML is a truly liberal language; it makes you the creator of your own universe, whose rules you're unlikely to break simply because you are the one who establishes them.
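A practical way to confirm that an HTML file has become well-formed XML is to run it through a non-validating XML parser and see whether it complains. The minimal sketch below assumes Python's standard xml.etree.ElementTree module; the is_well_formed helper and the test snippets are hypothetical. The failing cases reflect XML 1.0 well-formedness rules that HTML authors commonly break: every start tag needs a matching end tag (or the empty-element form <BR/>), attribute values must be quoted, element names are case-sensitive, and elements must nest properly.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML; no DTD or validation is involved."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

cases = {
    # Typical HTML habits that XML rejects
    "unclosed <P>": "<BODY><P>One<P>Two</BODY>",
    "unquoted attribute": "<IMG SRC=fox.gif></IMG>",
    "case mismatch": "<P>Text</p>",
    "improper nesting": "<B><I>Text</B></I>",
    "bare <BR>": "<P>Line<BR>Line</P>",
    # Their well-formed XML counterparts
    "closed <P>": "<BODY><P>One</P><P>Two</P></BODY>",
    "quoted attribute": '<IMG SRC="fox.gif"/>',
    "empty element": "<P>Line<BR/>Line</P>",
}

for label, text in cases.items():
    print(f"{label:20} {is_well_formed(text)}")
```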
Those familiar with the material in Chapter 3 may be wondering about another essential part of any SGML application: the SGML declaration that always accompanies a DTD. Because XML is greatly simplified compared to SGML, its authors decided to omit this part of the language; XML parsers (or SGML software processing XML documents) should behave as if every XML document had been assigned the same generic SGML declaration, which is listed in an appendix to the XML standard.
Revised: Jun. 16, 1997