Converting DTDs to XML Schemas (1/3) - exploring XML | WebReference

Converting DTDs to XML Schemas

In the last installment of this column, I described a neat tool, dtd2xsd, that converts Document Type Definitions to XML Schema. Let's put dtd2xsd to work, using the RSS 0.91 DTD as an example.

RSS 0.91

Netscape invented RSS to advertise news channels for its Web service My Netscape Network, one of the first personalized portals. Any site could write up a summary file in RSS and submit the URL to My Netscape Network, which let users construct their personal start page on that service from any of the registered news sources. Unfortunately, this service silently disappeared from Netscape's site, leaving behind a slightly confused RSS community.

The DTD can be found at the My Netscape site.

<!ELEMENT rss (channel)>
<!ATTLIST rss
          version     CDATA #REQUIRED> <!-- must be "0.91"> -->
<!ELEMENT channel (title | description | link | language | item+ | rating? | image? | textinput? | copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ELEMENT image (title | url | link | width? | height? | description?)*>
<!ELEMENT url (#PCDATA)>
<!ELEMENT item (title | link | description)*>
<!ELEMENT textinput (title | description | name | link)*>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rating (#PCDATA)>
<!ELEMENT language (#PCDATA)>
<!ELEMENT width (#PCDATA)>
<!ELEMENT height (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT pubDate (#PCDATA)>
<!ELEMENT lastBuildDate (#PCDATA)>
<!ELEMENT docs (#PCDATA)>
<!ELEMENT managingEditor (#PCDATA)>
<!ELEMENT webMaster (#PCDATA)>
<!ELEMENT hour (#PCDATA)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT skipHours (hour+)>
<!ELEMENT skipDays (day+)>

In a nutshell, an RSS 0.91 channel definition consists of a title, a description, a Web link, a language and a number of news items. Optional elements such as a PICS rating, an image, and author and date information can be added. I have omitted from the listing above the long list of entity definitions for HTML characters, such as &nbsp; for a non-breaking space.
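To make that structure concrete, here is a minimal sketch in Python that holds a small sample channel and checks it for the core elements named above. The sample titles and URLs, the `CORE_ELEMENTS` tuple and the `missing_core_elements` helper are my own assumptions for illustration; none of them come from the RSS 0.91 specification or from dtd2xsd.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample channel; titles and URLs are made up for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example News</title>
    <description>Headlines from an example site</description>
    <link>http://www.example.com/</link>
    <language>en-us</language>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/news/1</link>
      <description>A short summary of the first story.</description>
    </item>
  </channel>
</rss>
"""

# The elements the text describes as the core of a channel.
CORE_ELEMENTS = ("title", "description", "link", "language", "item")


def missing_core_elements(rss_text):
    """Return the core channel elements that are absent from rss_text."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    if channel is None:
        return list(CORE_ELEMENTS)
    # Collect the tag names of the channel's direct children.
    present = {child.tag for child in channel}
    return [name for name in CORE_ELEMENTS if name not in present]


if __name__ == "__main__":
    print(missing_core_elements(SAMPLE_RSS))
```

A check like this is stricter than the DTD itself: the channel content model is a repeated choice, `( ... )*`, so a validating parser would accept the listed children in any order and in any number.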

The Conversion Tool

The DTD to XML Schema Conversion Tool takes a DTD and translates it into its equivalent XML schema definition. Usage of the tool is as follows:

perl dtd2xsd.pl [-alias] [-prefix p] [-ns n] [file]
  -alias
    enables special aliases (default off)
  -prefix p
    specify namespace prefix
  -ns http://www.w3.org/namespace/
    specify namespace URI
  -simpletype pattern base
    treat parameter entities whose name match this pattern
    as simple datatypes derived from this base type
  -attrgroup pattern
    treat parameter entities whose name match this pattern
    as attribute groups
  -modelgroup pattern
    treat parameter entities whose name match this pattern
    as model groups

So let's call the tool like this:

perl dtd2xsd.pl -prefix rss -ns http://purl.org/rss/0.91 rss091.dtd
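To give a feel for the kind of mapping such a conversion involves, here is a rough sketch in Python that turns text-only declarations like `<!ELEMENT title (#PCDATA)>` into simple schema element declarations. This is not how dtd2xsd.pl works internally, and its real output may well look different; the function name `pcdata_elements_to_xsd`, the `xs` prefix and the choice of `xs:string` are assumptions made for this illustration.

```python
import re

# Matches declarations such as: <!ELEMENT title (#PCDATA)>
PCDATA_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+\(\s*#PCDATA\s*\)\s*>")


def pcdata_elements_to_xsd(dtd_text, xsd_prefix="xs"):
    """Map text-only element declarations to simple schema declarations.

    Illustrative only: not dtd2xsd.pl's algorithm or its actual output.
    Declarations with child elements, such as (hour+), are skipped.
    """
    lines = []
    for name in PCDATA_DECL.findall(dtd_text):
        lines.append(
            '<{p}:element name="{n}" type="{p}:string"/>'.format(
                p=xsd_prefix, n=name
            )
        )
    return lines


if __name__ == "__main__":
    dtd = """
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT link (#PCDATA)>
    <!ELEMENT skipHours (hour+)>
    """
    for line in pcdata_elements_to_xsd(dtd):
        print(line)
```

The sketch handles only the simplest case. Content models with choices, repetition and optional children, such as the one for `channel`, need real parsing, which is exactly the work the tool takes off our hands.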

Let's look at the result of the conversion.

Produced by Michael Claßen

Created: Jul 18, 2001
Revised: Jul 18, 2001