XSLT as a Language
What are the most significant characteristics of XSLT as a language, which distinguish it from other languages? In this section I shall pick three of the most striking features: the fact that it is written in XML syntax, the fact that it is a language free of side-effects, and the fact that processing is described as a set of independent pattern-matching rules.
Use of XML Syntax
As we've seen, the use of SGML syntax for stylesheets was proposed as long ago as 1994, and it seems that this idea gradually became the accepted wisdom. It's difficult to trace exactly what the overriding arguments were, and when you find yourself writing something like:
   <xsl:variable name="y">
      <xsl:call-template name="f">
         <xsl:with-param name="x" select="$x"/>
      </xsl:call-template>
   </xsl:variable>
to express what in other languages would be written as «y = f(x);», then you may find yourself wondering how such a decision came to be made.
In fact, it could have been worse: in the very early drafts, the syntax for writing what are now XPath expressions was also expressed in XML, so instead of writing select="book/author/first-name" you had to write something along the lines of:
   <path>
      <element type="book"/>
      <element type="author"/>
      <element type="first-name"/>
   </path>
The most obvious arguments for expressing XSLT stylesheets in XML are perhaps:
• There is already an XML parser in the browser, so reusing it keeps the footprint small.
• Everyone had got fed up with the syntactic inconsistencies between HTML/XML and CSS, and didn't want the same thing to happen again.
• The syntax of DSSSL was widely seen as a barrier to its adoption; better to have a syntax that was already familiar in the target community.
• Many existing popular templating languages are expressed as an outline of the output document with embedded instructions, so this is a familiar concept.
• All the lexical apparatus is reusable, for example Unicode support, character and entity references, whitespace handling, and namespaces.
• It's occasionally useful to have a stylesheet as the input or output of a transformation (witness the Microsoft XSL converter), so it's a benefit if a stylesheet can read and write other stylesheets.
• The inconvenience of having to type lots of angle brackets can easily be overcome by providing visual development tools.
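As a sketch of what "a stylesheet that writes other stylesheets" looks like in practice, XSLT 1.0 provides <xsl:namespace-alias>, which lets a stylesheet create elements in the XSLT namespace as output rather than executing them as instructions. The input vocabulary (<renames> and <rename from="..." to="..."/>) and the alias namespace URI below are hypothetical, chosen only for illustration; the sketch generates a new stylesheet containing one renaming template rule per entry.

```xml
<!-- Hypothetical input:
       <renames>
         <rename from="para" to="p"/>
         <rename from="emph" to="i"/>
       </renames>
     Output: a new stylesheet with one template rule per <rename> entry. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:out="http://example.com/hypothetical-alias">

  <!-- Elements written with the out: prefix are emitted in the XSLT namespace -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>

  <xsl:template match="/renames">
    <out:stylesheet version="1.0">
      <xsl:for-each select="rename">
        <!-- {@from} and {@to} are attribute value templates, evaluated while
             generating, so the output contains the literal element names -->
        <out:template match="{@from}">
          <out:element name="{@to}">
            <out:apply-templates/>
          </out:element>
        </out:template>
      </xsl:for-each>
    </out:stylesheet>
  </xsl:template>

</xsl:stylesheet>
```

Because the generating stylesheet and the generated one share the same XML syntax, no separate code-generation machinery is needed: the output is simply another result tree.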
Like it or not, the XML-based syntax is now an intrinsic feature of the language, with both benefits and drawbacks. It does require a lot of typing, but in the end the number of keystrokes has very little bearing on the ease or difficulty of solving particular transformation problems.
No Side-effects
The idea that XSL should be a declarative language free of side-effects appears repeatedly in the early statements about the goals and design principles of the language, but no-one ever seems to explain why: what would be the user benefit?
A function or procedure in a programming language is said to have side-effects if it makes changes to its environment: for example, if it updates a global variable that another function or procedure can read, writes messages to a log file, or prompts the user. If functions have side-effects, it becomes important to call them the right number of times and in the correct order. Functions that have no side-effects (sometimes called pure functions) can be called any number of times and in any order. It doesn't matter how many times you evaluate the area of a triangle: you will always get the same answer. But if the function that calculates the area has a side-effect, such as changing the size of the triangle, or if you don't know whether it has side-effects or not, then it becomes important to call it exactly once.
It is possible to find hints at the reason why this was considered desirable in the statements that the language should be equally suitable for batch or interactive use, and that it should be capable of progressive rendering. One concern is that when you download a large XML document, you won't be able to see anything on your screen until the last byte has been received from the server. Equally, if a small change were made to the XML document, it would be nice to be able to determine the change needed to the screen display without recalculating the whole thing from scratch. If a language has side-effects, then the order of execution of its statements has to be defined, or the final result becomes unpredictable. Without side-effects, the statements can be executed in any order, which means it is possible, in principle, to process the parts of a stylesheet selectively and independently.
Whether XSLT has actually achieved these goals is somewhat debatable. Certainly, determining which parts of the output document are affected by a small change to one part of the input document is not easy, given the flexibility of the expressions and patterns that are now permitted in the language. Equally, all existing XSLT processors require the whole document to be loaded into memory. However, it would be a mistake to expect too much too soon. When E. F. Codd published the relational model of data in 1970, he claimed that a declarative language was desirable because it could be optimized, which was not possible with the navigational data access languages in use at the time. In fact it took another fifteen years before relational optimization techniques (and, to be fair, the price of hardware) reached the point where large relational databases were commercially viable. But in the end he was proved right, and the hope is that the same principle will eventually deliver similar benefits in the area of transformation and styling languages.
What being side-effect free means in practice is that you cannot update the value of a variable. This restriction is something you may find very frustrating at first, and a big price to pay for these rather remote benefits. But as you get the feel of the language and learn to think about using it the way it was designed to be used, rather than the way you are familiar with from other languages, you will find you stop thinking about this as a restriction. In fact, one of the benefits is that it eliminates a whole class of bugs from your code! I shall come back to this subject in Chapter 8, where I outline some of the common design patterns for XSLT stylesheets, and in particular, describe how to use recursive code to handle situations where in the past you would probably have used updateable variables to keep track of the current state.
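A minimal sketch of the recursive technique, assuming a hypothetical <order> document whose <item> elements carry price and qty attributes. XPath 1.0's sum() function can only add up node values, not computed products, so a procedural programmer would reach for a loop with a running-total variable. Here the running total is instead passed as a parameter to the next recursive call; the template name "total" and parameter names are illustrative, not from any library.

```xml
<!-- Hypothetical input:
       <order>
         <item price="2.50" qty="4"/>
         <item price="1.25" qty="2"/>
       </order> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/order">
    <xsl:call-template name="total">
      <xsl:with-param name="items" select="item"/>
    </xsl:call-template>
  </xsl:template>

  <!-- A procedural loop would update a running-total variable.
       Here the partial total is passed to the next call instead,
       so no variable is ever modified after it is bound. -->
  <xsl:template name="total">
    <xsl:param name="items"/>
    <xsl:param name="so-far" select="0"/>
    <xsl:choose>
      <xsl:when test="$items">
        <!-- Add the first item's price * qty, then recurse on the rest -->
        <xsl:call-template name="total">
          <xsl:with-param name="items" select="$items[position() &gt; 1]"/>
          <xsl:with-param name="so-far"
              select="$so-far + $items[1]/@price * $items[1]/@qty"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No items left: the accumulated value is the answer -->
        <xsl:value-of select="$so-far"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input shown in the comment, the output is 12.5. Each call sees a fixed set of values, which is exactly the property that makes the order of evaluation irrelevant.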
Rule-based
The dominant feature of a typical XSLT stylesheet is that it consists of a sequence of template rules, each of which describes how a particular element type or other construct should be processed. The rules are not arranged in any particular order; they don't have to match the order of the input or the order of the output, and in fact there are very few clues as to what ordering or nesting of elements the stylesheet author expects to encounter in the source document. It is this that makes XSLT a declarative language: you say what output should be produced when particular patterns occur in the input, as distinct from a procedural program where you have to say what tasks to perform in what order.
This rule-based structure is very like CSS, but with the major difference that both the patterns (the description of which nodes a rule applies to) and the actions (the description of what happens when the rule is matched) are much richer in functionality.
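To give a flavour of the richer patterns, here is a sketch that borrows the element names from the poem example that follows. A predicate in the match pattern restricts the first rule to the even-numbered lines of each stanza, so the indentation that the example achieves with <xsl:if> is expressed here as a separate rule. This is an illustrative alternative, not the stylesheet used in the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Pattern with a predicate: matches only even-numbered <line>
       children of a <stanza>; default priority 0.5 -->
  <xsl:template match="stanza/line[position() mod 2 = 0]">
    <xsl:text>&#xa0;&#xa0;</xsl:text>
    <xsl:value-of select="."/><br/>
  </xsl:template>

  <!-- Plain name pattern: default priority 0, so it handles
       only the lines the rule above does not match -->
  <xsl:template match="line">
    <xsl:value-of select="."/><br/>
  </xsl:template>

</xsl:stylesheet>
```

The two rules do not refer to each other: conflict resolution by priority, not the order in which they are written, decides which one applies to each line.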
Example: Displaying a Poem
Let's see how we can use the rule-based approach to format a poem. Again, we haven't introduced all the concepts yet, so I won't try to explain every detail of how this works, but it's useful to see what the template rules actually look like in practice.
Let's take this XML source as our poem. The source file can be found on the web site for this book at http://www.wrox.com, under the name poem.xml, and the stylesheet is there as poem.xsl.
   <poem>
      <author>Rupert Brooke</author>
      <date>1912</date>
      <title>Song</title>
      <stanza>
         <line>And suddenly the wind comes soft,</line>
         <line>And Spring is here again;</line>
         <line>And the hawthorn quickens with buds of green</line>
         <line>And my heart with buds of pain.</line>
      </stanza>
      <stanza>
         <line>My heart all Winter lay so numb,</line>
         <line>The earth so dead and frore,</line>
         <line>That I never thought the Spring would come again</line>
         <line>Or my heart wake any more.</line>
      </stanza>
      <stanza>
         <line>But Winter's broken and earth has woken,</line>
         <line>And the small birds cry again;</line>
         <line>And the hawthorn hedge puts forth its buds,</line>
         <line>And my heart puts forth its pain.</line>
      </stanza>
   </poem>
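We'll write a stylesheet that displays this document in the browser as an HTML page: the title and author as headings, each stanza as a paragraph with its even-numbered lines indented, and the date in italics at the end.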
It starts with the standard header:
   <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0">
Now we'll write one template rule for each element type in the source document. The rule for the <poem> element creates the skeleton of the HTML output, defining the order of the elements in the output (which doesn't have to be the same as the input order). The <xsl:value-of> instruction inserts the value of the selected element at this point in the output. The <xsl:apply-templates> instructions cause the selected child elements to be processed, each using its own template rule.
   <xsl:template match="poem">
      <html>
      <head>
         <title><xsl:value-of select="title"/></title>
      </head>
      <body>
         <xsl:apply-templates select="title"/>
         <xsl:apply-templates select="author"/>
         <xsl:apply-templates select="stanza"/>
         <xsl:apply-templates select="date"/>
      </body>
      </html>
   </xsl:template>
The template rules for the <title>, <author>, and <date> elements are very simple: they take the content of the element (denoted by «select="."») and surround it with appropriate HTML tags to define its display style:
   <xsl:template match="title">
      <div><h1><xsl:value-of select="."/></h1></div>
   </xsl:template>
   <xsl:template match="author">
      <div><h2>By <xsl:value-of select="."/></h2></div>
   </xsl:template>
   <xsl:template match="date">
      <p><i><xsl:value-of select="."/></i></p>
   </xsl:template>
The template rule for the <stanza> element puts each stanza into an HTML paragraph, and then invokes processing of the lines within the stanza, as defined by the template rule for lines:
   <xsl:template match="stanza">
      <p><xsl:apply-templates select="line"/></p>
   </xsl:template>
The rule for <line> elements is a little more complex: if the position of the line within the stanza is an even number, it precedes the line with two non-breaking-space characters (written &#xa0;). The <xsl:if> instruction tests a boolean condition, which in this case calls the position() function to determine the relative position of the current line. It then outputs the contents of the line, followed by an empty HTML <br> element to end the line.
   <xsl:template match="line">
      <xsl:if test="position() mod 2 = 0">&#xa0;&#xa0;</xsl:if>
      <xsl:value-of select="."/><br/>
   </xsl:template>
And to finish off, we close the <xsl:stylesheet> element:
   </xsl:stylesheet>
Although template rules are a characteristic feature of the XSLT language, we'll see that this is not the only way of writing a stylesheet. In Chapter 8, I will describe four different design patterns for XSLT stylesheets, only one of which makes extensive use of template rules. In fact, the Hello World stylesheet I presented earlier in this chapter doesn't make any real use of template rules: it fits into the design pattern I call fill-in-the-blanks, because the stylesheet essentially contains the fixed part of the output with embedded instructions saying where to get the data to put in the variable parts.
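For contrast, here is a sketch of the poem display written in the fill-in-the-blanks style, using XSLT 1.0's simplified syntax in which a literal result element (here <html>, carrying an xsl:version attribute) serves as the whole stylesheet. It is intended to produce essentially the same HTML page as the template-rule version above, but the order and nesting of the source elements are now spelled out in one place rather than spread across independent rules.

```xml
<!-- Simplified stylesheet: the whole file is the output skeleton,
     and paths are evaluated relative to the document root -->
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <head>
    <title><xsl:value-of select="poem/title"/></title>
  </head>
  <body>
    <div><h1><xsl:value-of select="poem/title"/></h1></div>
    <div><h2>By <xsl:value-of select="poem/author"/></h2></div>
    <!-- Iterate explicitly instead of relying on template rules -->
    <xsl:for-each select="poem/stanza">
      <p>
        <xsl:for-each select="line">
          <!-- position() is relative to the lines of this stanza -->
          <xsl:if test="position() mod 2 = 0">&#xa0;&#xa0;</xsl:if>
          <xsl:value-of select="."/><br/>
        </xsl:for-each>
      </p>
    </xsl:for-each>
    <p><i><xsl:value-of select="poem/date"/></i></p>
  </body>
</html>
```

This style works well when the structure of the source is fixed and known in advance; template rules pay off when the same element types can turn up in many different contexts.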
Created: Jan. 05, 2001
Revised: Jan. 05, 2001