Generating Web content with Cocoon (1/2) - exploring XML
The Apache project is best known for the Web server that bears its name. Over the years it has also become home to many other interesting software projects, mainly in the Java and XML space. Cocoon is one of them.
Cocoon is a Java Web application for generating dynamic content using XML. It can be installed on any Java servlet engine and comes with a wide variety of components for generating, transforming and outputting data with XML. Cocoon 2 was recently released as a complete rewrite of its predecessor, with improved flexibility and scalability.
The central concept in Cocoon is the pipeline: a number of components plugged together in series, each processing the data it receives and passing the result along to the next. Unix users know this concept well, as Unix comes with many small utilities that can be linked with the famous pipe symbol "|" to manipulate character data:
ls -1 | grep "\.bak" | wc -l
The ls program understands file systems, grep finds patterns in arbitrary text, and wc counts lines of text. Together, this little series of programs counts the number of backup files in the current directory. The glue between these tools is character data that the operating system transparently passes between them.
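The same idea can be sketched in a few lines of Python, with lazy generators playing the role of the pipe. This is only an illustration of the pipeline concept, not of anything in Cocoon; the function names and the sample file names are made up for the example.

```python
import re
from typing import Iterable, Iterator


def list_names(names: Iterable[str]) -> Iterator[str]:
    """Stage 1, a stand-in for `ls -1`: emit one name per line."""
    for name in names:
        yield name


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Stage 2: pass along only the lines that match the regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def count_lines(lines: Iterable[str]) -> int:
    """Stage 3, a stand-in for `wc -l`: count the lines that arrive."""
    return sum(1 for _ in lines)


# Hypothetical directory contents, used instead of a real file system.
sample_names = ["notes.txt", "notes.bak", "report.bak", "bakery.txt"]

# Each stage consumes the output of the previous one, item by item.
backup_count = count_lines(grep(r"\.bak", list_names(sample_names)))
print(backup_count)  # prints 2
```

No stage knows anything about its neighbours; each one only agrees on the data format that flows between them, which is plain text lines here and XML in Cocoon.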
The Cocoon developers set out to create a similar system for generating content on the Web by piping XML through a configurable set of tools. The first version of the software passed full DOM documents between components, which limited scalability with regard to both the size of documents that could be processed and the amount of parallelism in the pipeline. Furthermore, the pipeline was defined through processing instructions within the documents, making reuse in different contexts difficult.
Version 2 eliminates these problems by using SAX instead of DOM and connecting the processing components through SAX events. This way, XML documents of arbitrary size can be processed, and the components can work in parallel on the same document. The configuration of the pipeline has moved out of the data documents and into a separate sitemap file.
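To make the idea of components connected through SAX events more concrete, here is a minimal sketch using the SAX classes from Python's standard library rather than Cocoon's own Java components. The RenameElementFilter class, the run_pipeline function and the element names are assumptions made up for this example. The filter sees each event as it streams past and forwards it to the next stage, so no complete document tree is ever built.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameElementFilter(XMLFilterBase):
    """Hypothetical middle stage: renames one element as events stream by."""

    def __init__(self, parent, old_name, new_name):
        super().__init__(parent)
        self._old = old_name
        self._new = new_name

    def startElement(self, name, attrs):
        # Forward the (possibly renamed) event to the next stage.
        super().startElement(self._new if name == self._old else name, attrs)

    def endElement(self, name):
        super().endElement(self._new if name == self._old else name)


def run_pipeline(xml_text: str) -> str:
    """Parser -> rename filter -> serializer, connected by SAX events."""
    parser = xml.sax.make_parser()                      # produces the events
    stage = RenameElementFilter(parser, "para", "p")    # transforms them
    out = io.StringIO()
    stage.setContentHandler(XMLGenerator(out, encoding="utf-8"))  # writes XML
    stage.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


result = run_pipeline("<doc><para>hello</para></doc>")
print(result)
```

Further filters could be stacked by passing one filter as the parent of the next, which is roughly how a longer pipeline would be assembled in this sketch.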
Which components can a pipeline be built from? Cocoon comes with many configurable components for generating, transforming and serializing data with XML. Some generators are:
- FileReader: Reads a file from disk.
- DirectoryGenerator: Returns the file names in a directory, for generating directory listings.
- DatabaseGenerator: Accesses relational databases.
A generator creates a series of SAX events that can be processed in subsequent stages of the pipeline. Readers are a special case of generators, in that they return non-XML data and are usually used as a one-stage pipeline, like the FileReader for returning static data to the Web client.
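What "creating a series of SAX events" means can be sketched with a toy directory generator, again written against Python's standard SAX interfaces and not Cocoon's Java API. The function name and the element names "directory" and "file" are assumptions for illustration and are not meant to match the output format of Cocoon's DirectoryGenerator.

```python
import io
import os
import tempfile
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def generate_directory_events(path: str, handler: ContentHandler) -> None:
    """Emit SAX events describing the entries of a directory.

    There is no XML text to parse here: the generator calls the handler
    directly, and the handler can be any later stage of a pipeline.
    """
    dir_name = os.path.basename(os.path.abspath(path))
    handler.startDocument()
    handler.startElement("directory", AttributesImpl({"name": dir_name}))
    for entry in sorted(os.listdir(path)):
        handler.startElement("file", AttributesImpl({"name": entry}))
        handler.endElement("file")
    handler.endElement("directory")
    handler.endDocument()


# Demo on a throwaway directory; a serializer is attached directly to the
# generator, giving the shortest possible two-stage pipeline.
with tempfile.TemporaryDirectory() as demo_dir:
    for file_name in ("notes.txt", "notes.bak"):
        open(os.path.join(demo_dir, file_name), "w").close()
    out = io.StringIO()
    generate_directory_events(demo_dir, XMLGenerator(out, encoding="utf-8"))
    listing = out.getvalue()

print(listing)
```

Because the generator only talks to a content handler, the serializer in the demo could be swapped for any other component that accepts the same events without changing the generator.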
Next are transformers and serializers.
Produced by Michael Claßen
Created: Mar 18, 2002
Revised: Mar 18, 2002