dtddoc step 3: Element and attribute descriptions (2/3) - exploring XML | WebReference

dtddoc step 3: Element and attribute descriptions

Adding dtddoc comments to DTDs

To generate a set of HTML documentation pages, we need the following information to be passed on via comments in the DTD:

  1. One text block for describing the DTD as a whole
  2. A one-line short description for every element in the DTD
  3. One or more paragraphs for the long description for every element in the DTD
  4. A one-line description for every attribute of an element
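
The four items above map onto a small data structure. The following is a minimal sketch of one way to hold them in Java. The names DtdDocModel, ElementDoc and element() are hypothetical; they are not part of dtddoc or of any parser package used in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical container for the documentation gathered from one DTD. */
final class DtdDocModel {
    /** Item 1: the text block describing the DTD as a whole. */
    String dtdDescription = "";

    /** Element documentation keyed by element name, in order of first use. */
    final Map<String, ElementDoc> elements = new LinkedHashMap<String, ElementDoc>();

    /** Returns the entry for an element, creating it on first use. */
    ElementDoc element(String name) {
        ElementDoc doc = elements.get(name);
        if (doc == null) {
            doc = new ElementDoc(name);
            elements.put(name, doc);
        }
        return doc;
    }

    /** Documentation for a single element. */
    static final class ElementDoc {
        final String name;
        /** Item 2: the one-line short description. */
        String shortDescription = "";
        /** Item 3: the long description, one entry per paragraph. */
        final List<String> longDescription = new ArrayList<String>();
        /** Item 4: one-line descriptions keyed by attribute name. */
        final Map<String, String> attributes = new LinkedHashMap<String, String>();

        ElementDoc(String name) {
            this.name = name;
        }
    }
}
```

Keeping attribute descriptions inside the element entry mirrors the comment syntax, in which an attribute is always described together with the element it belongs to.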

Java comments are enclosed by /* ... */, and javadoc reads comments that start with an extra asterisk /** ... */. We take a similar approach by defining dtddoc comments to begin with an XML tag followed by its description:

<!-- <element> One-line description of the element.
               [Long description paragraph(s)] -->
<!-- <element attributename[="..."]> One-line description of the attribute. -->
<!-- <> One-line description for the whole DTD.
               [Long description paragraph(s)] -->
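
To make the three comment forms concrete, here is a small sketch that writes them out from their parts. The class DtdDocCommentWriter and its methods are hypothetical helpers, not part of any package used in this article. The element and attribute names come from RSS 0.91; the description texts are made up for illustration.

```java
/** Hypothetical helper that renders the three dtddoc comment forms. */
final class DtdDocCommentWriter {
    /** Renders an element comment; the long description may be null or empty. */
    static String elementComment(String element, String shortDesc, String longDesc) {
        StringBuilder sb = new StringBuilder("<!-- <");
        sb.append(element).append("> ").append(shortDesc);
        if (longDesc != null && longDesc.length() > 0) {
            // The long description starts on the line after the short one.
            sb.append('\n').append("     ").append(longDesc);
        }
        return sb.append(" -->").toString();
    }

    /** Renders an attribute comment; attributes only get a one-line description. */
    static String attributeComment(String element, String attribute, String shortDesc) {
        return "<!-- <" + element + " " + attribute + "> " + shortDesc + " -->";
    }

    /** Renders the comment for the DTD as a whole, marked by the empty tag. */
    static String dtdComment(String shortDesc, String longDesc) {
        return elementComment("", shortDesc, longDesc);
    }

    public static void main(String[] args) {
        System.out.println(dtdComment("Sample DTD.", "A longer description of the DTD."));
        System.out.println(elementComment("channel", "Describes the feed.", null));
        System.out.println(attributeComment("rss", "version", "The RSS version."));
    }
}
```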

We start our implementation of dtddoc by annotating a DTD with the above comments and then using the various packages to extract them again programmatically. As an example of what an actual annotated DTD might look like, take a look at this annotated sample, which we created from Netscape's original RSS 0.91 DTD and their original specification.

In Java we have to traverse the DTD data structure and parse the text out of the DTDComment objects. Parsing text in Java is awkward, at least before version 1.4, which added regular expression matching. Since not everyone has the latest version of Java, we use a third-party regular expression package, ORO from the Apache Jakarta project:

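import java.util.*;
import java.io.*;
import com.wutka.dtd.*;
import org.apache.oro.text.regex.*;

public class DTDDoc {
    public static void main(String[] args) throws IOException {
        // Parse the DTD file named on the command line
        DTDParser parser = new DTDParser(new FileReader(args[0]));
        DTD dtd = parser.parse();
        // Hand the text of every comment in the DTD to parseComment()
        Enumeration e = dtd.getItemsByType(DTDComment.class).elements();
        while (e.hasMoreElements()) {
            parseComment(((DTDComment) e.nextElement()).getText());
        }
    }

    private static void parseComment(String comment) {
        PatternCompiler pc = new Perl5Compiler();
        PatternMatcher pm = new Perl5Matcher();
        Pattern pattern;
        try {
            // Groups: 1 = element name, 3 = attribute name,
            // 4 = one-line description, 5 = long description.
            // Default compile options are assumed here.
            pattern = pc.compile("<(\\w*)(\\s+(\\w+)=\"[^\"]*\")?>\\s*(.*)\n([\\w\\W]*)");
        } catch (MalformedPatternException e) {
            System.err.println("Bad pattern: " + e.getMessage());
            return;
        }
        PatternMatcherInput input = new PatternMatcherInput(comment);
        if (pm.contains(input, pattern)) {
            MatchResult result = pm.getMatch();
            // Print result: group 0 is the whole match, so start at 1
            for (int i = 1; i < result.groups(); i++) {
                System.out.println(i + ": " + result.group(i));
            }
        }
    }
}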

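The above code fragment parses a DTD and feeds just the comments to the parseComment() function. Comments that match the given pattern are dtddoc comments, and their parts (element name, attribute name, one-line description and long description) are conveniently available as groups of the result object.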
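
For readers on Java 1.4 or later, the same extraction can be sketched with the built-in java.util.regex package instead of ORO. This is only a sketch under a few assumptions: the class name DtdDocCommentMatcher and the shape of the returned array are hypothetical, and the pattern is relaxed slightly compared to the one above so that the ="..." part of an attribute and the long description are both optional.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical java.util.regex counterpart to parseComment(). */
final class DtdDocCommentMatcher {
    // Group 1: element name (empty for the DTD-level "<>" comment)
    // Group 2: attribute name, if any
    // Group 3: one-line description
    // Group 4: long description, if any
    private static final Pattern DTDDOC = Pattern.compile(
        "<(\\w*)(?:\\s+(\\w+)(?:=\"[^\"]*\")?)?>\\s*(.*)(?:\\r?\\n([\\w\\W]*))?");

    /**
     * Returns {element, attribute, shortDescription, longDescription},
     * or null if the text is not a dtddoc comment. The attribute entry
     * is null for element and DTD-level comments.
     */
    static String[] parse(String comment) {
        Matcher m = DTDDOC.matcher(comment);
        if (!m.find()) {
            return null;
        }
        String longDesc = m.group(4) == null ? "" : m.group(4).trim();
        return new String[] { m.group(1), m.group(2), m.group(3).trim(), longDesc };
    }

    public static void main(String[] args) {
        String[] parts = parse(" <channel> Describes the feed.\n     More detail here. ");
        System.out.println(parts[0] + " | " + parts[2] + " | " + parts[3]);
    }
}
```

Like contains() in ORO, find() looks for the pattern anywhere in the comment text, so leading whitespace inside the comment does no harm.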

Let's see how PHP and Perl handle this.

Produced by Michael Claßen

URL: http://www.webreference.com/xml/column67/2.html
Created: Oct 28, 2002
Revised: Oct 28, 2002