
HTML Unleashed: The Emergence of XML

XML DTDs and Valid XML Documents

 
 

Although in many cases well-formed XML documents are sufficient for practical purposes, designing a DTD for your document has a number of advantages:

  • First and foremost, a DTD allows an XML parser to validate your document (that is why such documents are called valid).  When validating, the parser checks for misspelled tags or attributes, for errors in types of attribute values and in elements' content models, and so on.  For HTML, similar validation services exist that will check your file against one of the existing HTML DTDs.

  • For a human reader, a DTD is a convenient way to quickly learn the structure of a particular document type.  Compared to SGML, the simplified DTD syntax of XML is straightforward and unambiguous.

  • With a DTD, you can define not only elements and their attributes, but also entities.  (See "Entity Declarations," later in this chapter.) Like macros in word processors or #define preprocessor directives in C, entities can be used to abbreviate text strings and markup instructions in an obvious and easy-to-modify manner.  Also, you can use external entities to refer to other XML documents, DTDs, or binary data located in separate files.
 
 
 

Accessing the DTD

 
 

Let's examine an example of a valid XML document, namely a play by Shakespeare (The Tempest) marked up by Jon Bosak, one of the authors of XML.  Besides the XML document and its DTD, the package includes a DSSSL style sheet that contains formatting instructions for each element, and the PostScript output of a DSSSL processor that formatted the play.

Here's the very beginning of the XML document play.xml:

<?XML version="1.0"?>
<!DOCTYPE play PUBLIC
       "-//Free Text Project//DTD Play//EN">

The first line here is an XML declaration, a special instruction that is XML-specific and would be ignored by an SGML parser.  Here, the XML declaration states the version of the XML standard that the document conforms to.

Next comes the DOCTYPE statement that, like its namesake in SGML, identifies the DTD against which the document is to be parsed.  In XML, a DTD may consist of two parts: the internal part is contained in the document file itself, while the external part is referenced by its URL or public identifier.  The internal part takes precedence over the external one in case of conflict.

In our example, only the external part of the DTD is present, and it is referred to by the public identifier that follows the keyword PUBLIC.  An XML parser is supposed to be able to retrieve the text of the DTD using its public identifier (that is, to translate the identifier into a URL or some other sort of physical address).  If the DTD you're using has not been assigned a well-known public identifier, you should provide a URL instead, with the SYSTEM keyword in place of PUBLIC.  For instance:

<!DOCTYPE HTML SYSTEM
          "http://www.foo.com/html3x.dtd">

Finally, to provide an internal part for a DTD, you must put it in square brackets within the DOCTYPE declaration.  Such a declaration may also contain a SYSTEM or PUBLIC external reference, for example:

<!DOCTYPE HTML SYSTEM
          "http://www.foo.com/html3x.dtd"
[
  <!-- your DTD goes here -->
]>
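
The DOCTYPE forms above can also be inspected programmatically. What follows is only a minimal sketch, assuming Python's standard xml.parsers.expat module; the function name describe_doctype and both sample documents are illustrative and not part of the Shakespeare package. Expat is a non-validating parser, so it reports the declaration without retrieving the external DTD. The samples follow XML 1.0 as finally published, which expects a system identifier after the public identifier.

```python
import xml.parsers.expat


def describe_doctype(document: str) -> dict:
    """Return the root name and DTD identifiers declared in a document."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once, when the parser has read the DOCTYPE declaration.
        info.update(
            root=name,
            system_id=system_id,
            public_id=public_id,
            internal_subset=bool(has_internal_subset),
        )

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return info


# Hypothetical documents: external DTD only, then external plus internal part.
EXTERNAL_ONLY = (
    '<!DOCTYPE play PUBLIC "-//Free Text Project//DTD Play//EN" "play.dtd">'
    "<play/>"
)
WITH_INTERNAL = (
    '<!DOCTYPE play SYSTEM "play.dtd" [ <!ENTITY me "Dmitry Kirsanov"> ]>'
    "<play/>"
)

print(describe_doctype(EXTERNAL_ONLY))
print(describe_doctype(WITH_INTERNAL))
```

The returned dictionary shows which of the two parts a given document actually declares: a missing identifier comes back as None, and internal_subset is true only when the bracketed part is present.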
 
 
 

Element Declarations

 
 

The name right after the DOCTYPE keyword in the preceding statements is the name of the root element of your document type, that is, the top-level element that encloses all other elements.  In HTML, this element is named HTML, and in our Shakespearean example it is named PLAY.  Here's how the PLAY element is defined in play.dtd:

<!ELEMENT play (title, fm, personae,
              scndescr, playsubt, induct?,
              prologue?, act+, epilogue?)>

You can see that the content model for this element is quite simple and translates directly into plain English: "A PLAY consists of its TITLE, followed by the front matter (FM), followed by the list of dramatis PERSONAE, and so on."  The commas require the elements to appear in the order listed.  The question mark marks optional elements, and the plus sign marks elements that must occur one or more times.  Note that the XML spec drops the SGML minimization parameters, which are useless in XML because it does not permit tag omission anyway.
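
Because a content model of this kind is essentially a regular expression over element names, its meaning can be illustrated with an ordinary regex engine. The following is a rough sketch in Python, not how any particular XML parser works; the helper names model_to_regex and children_match are made up for this illustration, element names are written in lowercase, and only element-only models are handled (no #PCDATA).

```python
import re


def model_to_regex(content_model: str) -> re.Pattern:
    """Translate an element-only content model into a compiled regex.

    Each element name becomes a group matching that name plus a trailing
    space. The sequence comma disappears, while |, ?, + and * keep the
    meaning they already have in regular expressions.
    """
    tokens = re.findall(r"[A-Za-z_][\w.-]*|[()|?+*,]", content_model)
    parts = []
    for token in tokens:
        if token == ",":
            continue  # a sequence is plain concatenation
        if token in {"(", ")", "|", "?", "+", "*"}:
            parts.append(token)
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(pattern: re.Pattern, child_names: list) -> bool:
    """Check a list of child element names against a translated model."""
    return pattern.fullmatch("".join(name + " " for name in child_names)) is not None


PLAY_MODEL = model_to_regex(
    "(title, fm, personae, scndescr, playsubt, induct?, prologue?, act+, epilogue?)"
)

# Optional elements omitted, two acts present: acceptable.
print(children_match(
    PLAY_MODEL,
    ["title", "fm", "personae", "scndescr", "playsubt", "act", "act"],
))
# No act at all: rejected, because act+ requires at least one.
print(children_match(
    PLAY_MODEL,
    ["title", "fm", "personae", "scndescr", "playsubt"],
))
```

A validating parser does considerably more than this (attributes, entities, mixed content), so the sketch is meant only to make the role of the comma, the question mark, and the plus sign concrete.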

One more excerpt from play.dtd shows a hierarchical set of related tags used to mark up a character's speech:

 
 
 
<!ELEMENT speech   (speaker+,
                   (line | stagedir | subhead)+)>
<!ELEMENT speaker  (#PCDATA)>
<!ELEMENT line     (stagedir | #PCDATA)+>
<!ELEMENT stagedir (#PCDATA)>
<!ELEMENT subhead  (#PCDATA)>
 
 
 

Thus a SPEECH consists of one or more SPEAKER elements followed by one or more LINE, STAGEDIR (stage direction), or SUBHEAD elements in any order (the "|" sign means that any one of the connected particles may occur).  The #PCDATA keyword means "any character data without tags"; thus, the SPEAKER, STAGEDIR, and SUBHEAD elements may contain only text, while a LINE may have STAGEDIRs intermingled with text.

Note that nothing in the definition of LINE (except the name) suggests that the element really contains a line of verse.  That is merely implied by the person who did the markup, and the element may be formatted as a line if an appropriate style sheet is used.  XML only serves as an intermediary between the author and the formatter; it is not intended to describe the nature of the data elements that are marked up with it.

Here's a SPEECH element exemplifying these DTD provisions:

 
 
 
<SPEECH>
<SPEAKER>PROSPERO</SPEAKER>
<LINE><STAGEDIR>Aside</STAGEDIR> The Duke of Milan</LINE>
<LINE>And his more braver daughter could control thee,</LINE>
<LINE>If now 'twere fit to do't. At the first sight</LINE>
<LINE>They have changed eyes. Delicate Ariel,</LINE>
<LINE>I'll set thee free for this.</LINE>
<STAGEDIR>To FERDINAND</STAGEDIR>
<LINE>A word, good sir;</LINE>
<LINE>I fear you have done yourself some wrong: a word.</LINE>
</SPEECH>
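
To see how such a fragment relates to the declaration of the speech element, the sequence of child elements can be tested against the content model by hand. This is a simplified sketch using Python's standard library, which has no DTD validator of its own; the regular expression is a manual transcription of the model, the function name matches_speech_model is invented for the example, and the sample is an abridged copy of the speech shown in this section.

```python
import re
import xml.etree.ElementTree as ET

# Content model of speech: (speaker+, (line | stagedir | subhead)+)
# rewritten as a regular expression over space-terminated child names.
SPEECH_MODEL = re.compile(r"(speaker )+((line|stagedir|subhead) )+")


def matches_speech_model(speech_xml: str) -> bool:
    """Check the sequence of direct child elements against SPEECH_MODEL."""
    speech = ET.fromstring(speech_xml)
    # Names are lowercased only to bridge the lowercase DTD and the
    # uppercase sample; XML 1.0 as published is case-sensitive.
    children = "".join(child.tag.lower() + " " for child in speech)
    return SPEECH_MODEL.fullmatch(children) is not None


VALID = """<SPEECH>
<SPEAKER>PROSPERO</SPEAKER>
<LINE><STAGEDIR>Aside</STAGEDIR> The Duke of Milan</LINE>
<STAGEDIR>To FERDINAND</STAGEDIR>
<LINE>A word, good sir;</LINE>
</SPEECH>"""

# Hypothetical invalid variant: the required SPEAKER is missing.
INVALID = "<SPEECH><LINE>A word, good sir;</LINE></SPEECH>"

print(matches_speech_model(VALID))
print(matches_speech_model(INVALID))
```

The check looks only at the direct children of SPEECH; the STAGEDIR nested inside the first LINE is governed by the declaration of line, not of speech.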
 
 
 

Entity Declarations

 
 

Entities can be declared in a DTD as follows:

<!ENTITY me "Dmitry Kirsanov, 
                St.Petersburg, Russia">

In the document, such an entity is used in the same way as the mnemonic character entities of HTML:

This document was created by &me; 
                             on Apr 21, 1997
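
The expansion of such an entity can be observed with any XML parser. Below is a minimal sketch using Python's standard xml.etree.ElementTree module; the root element name note is an assumption made for the example, and the entity is declared in the internal part of the DTD so that the document is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "note" is an illustrative root element name.
DOCUMENT = """<!DOCTYPE note [
  <!ENTITY me "Dmitry Kirsanov, St.Petersburg, Russia">
]>
<note>This document was created by &me; on Apr 21, 1997</note>"""

root = ET.fromstring(DOCUMENT)

# The parser has already replaced &me; with its declared text.
print(root.text)
```

Changing the declaration in one place changes every occurrence of &me; in the document, which is what makes entities easy to maintain.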

Another syntax is used to define entities that refer to external files or documents.  For example:

<!ENTITY mypage SYSTEM
   "http://www.symbol.ru/dk/index.xml">
<!ENTITY xml-logo SYSTEM
   "http://www.ucc.ie/xml/xml.gif" NDATA gif>

In the second declaration, gif is the name of a notation (similar to a data type), which must be declared somewhere in the DTD along with information on where an XML processor can find helper software capable of handling data in this notation.

Now the &mypage; and &xml-logo; entities can be used in documents that use this DTD.  However, the XML specification does not prescribe exactly how an XML application should behave on encountering such an entity.  For example, it may incorporate the entity into the text of the current document, or it may present it as a link that the user can activate.
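
An application that wants to decide for itself how to treat each entity first needs to know what was declared. As a hedged sketch, Python's standard xml.parsers.expat module can report every entity declaration it reads; the function name list_entities, the root element page, and the notation's system identifier "gifviewer" are placeholders invented for this example.

```python
import xml.parsers.expat

# Hypothetical document: the declarations sit in the internal part of the
# DTD so that the example is self-contained. "gifviewer" stands in for
# whatever helper application the notation would really point to.
DOCUMENT = """<!DOCTYPE page [
  <!NOTATION gif SYSTEM "gifviewer">
  <!ENTITY me "Dmitry Kirsanov, St.Petersburg, Russia">
  <!ENTITY mypage SYSTEM "http://www.symbol.ru/dk/index.xml">
  <!ENTITY xml-logo SYSTEM "http://www.ucc.ie/xml/xml.gif" NDATA gif>
]>
<page/>"""


def list_entities(document: str) -> dict:
    """Map each declared entity name to (kind, replacement text or address)."""
    entities = {}

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        if value is not None:
            kind = "internal"
        elif notation is not None:
            kind = "external unparsed (" + notation + ")"
        else:
            kind = "external parsed"
        entities[name] = (kind, value if value is not None else system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.Parse(document, True)
    return entities


for name, (kind, target) in list_entities(DOCUMENT).items():
    print(name, kind, target)
```

Nothing is fetched from the network here: the parser only records the declarations, leaving it to the application to inline an external entity, show it as a link, or hand it to a helper.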

 

Produced by Dmitry Kirsanov
Copyright Sams.net Publishing and



Created: Jun. 15, 1997
Revised: Jun. 16, 1997

URL: http://www.webreference.com/dlab/books/html/38-3.html