If you're scared by the prospect of learning the art of writing
document type definitions, there's good news for you: with XML,
you can create a document without a DTD at all. If the lack of a DTD is
the sole violation of XML requirements, such a document is called
well-formed, as opposed to a valid document, which has
a DTD attached (or referred to) and conforms to it. Thus, well-formedness
is the lower of the two levels of XML conformance, but although it is
inferior to validity, it is still very useful.
The permission to omit the DTD means, in essence, that you are free to
use any tags that seem necessary for your document. You may
just go wild and write in plain English what each part of your text is
supposed to represent. For example, if you're a grammarian:
<SENTENCE>
<SUBJECT TYPE="COMPLEX">
<ARTICLE TYPE="INDEFINITE">A</ARTICLE>
<ADJECTIVE>quick</ADJECTIVE>
<ADJECTIVE>brown</ADJECTIVE>
<NOUN>fox</NOUN>
</SUBJECT>
<VERB TYPE="INTRANSITIVE">jumps</VERB>
<PREPOSITION>over</PREPOSITION>
<ARTICLE TYPE="INDEFINITE">a</ARTICLE>
<OBJECT>
<ADJECTIVE>lazy</ADJECTIVE>
<NOUN>dog</NOUN>
</OBJECT>
</SENTENCE>
Given this well-formed document, an XML parser will be able to break
the text into the elements that you deemed essential in this case and
pass each element to the application along with its name (derived from
the tag name) and the attribute information you provided. It is the
application, not the XML parser, that must be programmed to perform
some useful tasks with this structured information. Most often, you'll
need to provide the application with a style sheet that associates
certain formatting parameters with each element (in our example, the
style sheet might specify displaying different parts of speech in
different colors).
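The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree as the parser, and the COLORS table and the styled_words function are hypothetical stand-ins for a real style sheet and a real application.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<SENTENCE>
  <SUBJECT TYPE="COMPLEX">
    <ARTICLE TYPE="INDEFINITE">A</ARTICLE>
    <ADJECTIVE>quick</ADJECTIVE>
    <ADJECTIVE>brown</ADJECTIVE>
    <NOUN>fox</NOUN>
  </SUBJECT>
  <VERB TYPE="INTRANSITIVE">jumps</VERB>
  <PREPOSITION>over</PREPOSITION>
  <ARTICLE TYPE="INDEFINITE">a</ARTICLE>
  <OBJECT>
    <ADJECTIVE>lazy</ADJECTIVE>
    <NOUN>dog</NOUN>
  </OBJECT>
</SENTENCE>"""

# Hypothetical "style sheet": maps an element name to a display color.
COLORS = {
    "ARTICLE": "gray",
    "ADJECTIVE": "green",
    "NOUN": "blue",
    "VERB": "red",
    "PREPOSITION": "purple",
}


def styled_words(xml_text):
    """Return (text, element name, attributes, color) for each element with text."""
    root = ET.fromstring(xml_text)  # the parser's job: split text into elements
    result = []
    for element in root.iter():     # document order, root included
        text = (element.text or "").strip()
        if text:                    # skip pure containers such as SUBJECT
            color = COLORS.get(element.tag, "black")  # the application's job
            result.append((text, element.tag, dict(element.attrib), color))
    return result


for word, name, attrs, color in styled_words(DOCUMENT):
    print(f"{word:6} {name:12} {attrs} -> {color}")
```

The parser knows nothing about grammar or colors; it only reports names, attributes, and text. Everything that gives the elements a meaning lives in the application code and its lookup table.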
In fact, even the "plain English" mentioned above is not a requirement for
your tags. The creators of XML intended the language to be
international from the very beginning. They painstakingly identified
all Unicode characters that may be called "letters" in some sense or
in some language and included these characters in the set of
characters allowed in element and attribute names. This means that you
can write your tags in Russian or Chinese instead of English.
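As a small check of this claim, the following Python sketch feeds two documents with non-English tag names to the standard library's xml.etree.ElementTree parser. The sample documents, their tag names, and the tag_names helper are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a Russian-tagged sentence ("sentence", "noun", "verb").
RUSSIAN = (
    '<предложение>'
    '<существительное род="женский">лиса</существительное>'
    '<глагол>прыгает</глагол>'
    '</предложение>'
)

# Hypothetical sample: a Chinese-tagged sentence ("sentence", "noun").
CHINESE = '<句子><名词>狐狸</名词></句子>'


def tag_names(xml_text):
    """Return the element names of a document in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


print(tag_names(RUSSIAN))
print(tag_names(CHINESE))
```

Attribute names follow the same rule as element names, which is why the attribute in the Russian sample is also written in Cyrillic.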
There are, of course, numerous restrictions that are imposed even on
well-formed documents. Some of these requirements are even stricter
than those of HTML and thus deserve special attention:
In fact, the preceding requirements are the only ones that you
must satisfy to make your HTML files well-formed XML. It doesn't
matter which browser's HTML extensions you use or whether you "abuse"
HTML tags or not. XML is a truly liberal language; it makes you a
creator of your own universe whose rules you're unlikely to break
simply because it's you who establishes them.
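One practical way to find out whether an HTML-like fragment already passes as well-formed XML is to hand it to an XML parser and see whether it complains. The sketch below does this with Python's xml.etree.ElementTree; the is_well_formed helper and the sample fragments are assumptions chosen for illustration, and they exercise only a few commonly cited rules (closed elements, quoted attribute values), not a complete checklist.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments; the keys describe what each one demonstrates.
SAMPLES = {
    "closed and quoted": '<p align="center">Hello<br/></p>',
    "unclosed element": '<p>Hello<br></p>',
    "unquoted attribute": '<p align=center>Hello</p>',
    "invented tags": '<blink><sparkle>Hello</sparkle></blink>',
}

for label, fragment in SAMPLES.items():
    verdict = "well-formed" if is_well_formed(fragment) else "not well-formed"
    print(f"{label:20} {verdict}")
```

Note that the fragment with invented tag names is accepted: a well-formedness check looks only at syntax, never at which tags you chose.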
Those familiar with the material in Chapter 3 may be wondering about
another essential part of any SGML application: the SGML
declaration that always accompanies a DTD. Because XML is considerably
simplified compared to SGML, its authors decided to omit this part of
the language; XML parsers (or SGML software processing XML documents)
should behave as if all XML documents were assigned the same generic
SGML declaration that is listed in an appendix to the XML standard.