The Place of XSLT in the XML Family
XSLT is published by the World Wide Web Consortium (W3C) and fits into the XML family of standards, most of which are also developed by W3C. In this section I will try to explain the sometimes-confusing relationship of XSLT to other related standards and specifications.
XSLT and XSL
XSLT started life as part of a bigger language called XSL (eXtensible Stylesheet Language). As the name implies, XSL was (and is) intended to define the formatting and presentation of XML documents for display on screen, on paper, or in the spoken word. As the development of XSL proceeded, it became clear that this was usually a two-stage process: first a structural transformation, in which elements are selected, grouped and reordered, and then a formatting process in which the resulting elements are rendered as ink on paper, or pixels on the screen. It was recognized that these two stages were quite independent, so XSL was split into two parts: XSLT for defining transformations, and "the rest" for the formatting stage. That formatting part is still officially called XSL, though some people prefer to call it XSL-FO (XSL Formatting Objects).
XSL Formatting is nothing more than another XML vocabulary, in which the objects described are areas of the printed page and their properties. Since this is just another XML vocabulary, XSLT needs no special capabilities to generate it as output. XSL Formatting is outside the scope of this book. It's a big subject (the draft specification currently available is far longer than XSLT), the standard is not yet stable, and the only products that implement it are at a very early stage of development. What's more, you're far less likely to need it than to need XSLT. XSL Formatting provides wonderful facilities to achieve high-quality typographical output of your documents. However, for most people translating them into HTML for presentation by a standard browser is quite good enough, and that can be achieved using XSLT alone, or if necessary, by using XSLT in conjunction with Cascading Style Sheets (CSS or CSS2), which I shall return to shortly.
The XSL Formatting specifications, which at the time of writing are still evolving, can be found at http://www.w3.org/TR/xsl.
XSLT and XPath
Halfway through the development of XSLT, it was recognized that there was a significant overlap between the expression syntax in XSLT for selecting parts of a document, and the XPointer language being developed for linking from one document to another. To avoid having two separate but overlapping expression languages, the two committees decided to join forces and define a single language, XPath, which would serve both purposes. XPath version 1.0 was published on the same day as XSLT, 16 November 1999.
XPath acts as a sublanguage within an XSLT stylesheet. An XPath expression may be used for numerical calculations or string manipulations, or for testing Boolean conditions, but its most characteristic use (and the one that gives it its name) is to identify parts of the input document to be processed. For example, the following instruction outputs the average price of all the books in the input document:
<xsl:value-of select="sum(//book/@price) div count(//book)"/>
Here the <xsl:value-of> element is an instruction defined in the XSLT standard, which causes a value to be written to the output document. The select attribute contains an XPath expression, which calculates the value to be written: specifically, the total of the price attributes on all the <book> elements, divided by the number of <book> elements.
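To make the arithmetic in that XPath expression concrete, here is a rough Python stand-in using the standard library's xml.etree.ElementTree module. It is not XSLT, and the sample catalog and the function name average_price are hypothetical. It mirrors the two halves of the expression: sum(//book/@price) adds the price attributes of every <book> element at any depth, while count(//book) counts every <book> element, whether or not it has a price. It is a simplification: XPath 1.0 returns NaN both for an empty document and for a non-numeric price, whereas this sketch only reproduces the empty case.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical input: <book> elements at different depths, each with a price.
CATALOG = """<catalog>
  <book title="A" price="10.00"/>
  <shelf>
    <book title="B" price="20.00"/>
  </shelf>
  <book title="C" price="30.00"/>
</catalog>"""


def average_price(xml_text: str) -> float:
    """Rough equivalent of the XPath 1.0 expression
    sum(//book/@price) div count(//book)."""
    root = ET.fromstring(xml_text)
    # Like //book: every <book> element in the tree, at any depth.
    books = list(root.iter("book"))
    if not books:
        # XPath 1.0 gives 0 div 0 = NaN here; plain Python division would raise.
        return math.nan
    # Like sum(//book/@price): only books that carry a price attribute contribute.
    # (Unlike XPath, float() raises ValueError on a non-numeric price.)
    total = sum(float(b.get("price")) for b in books if b.get("price") is not None)
    # Like count(//book): every book is counted, priced or not.
    return total / len(books)


print(average_price(CATALOG))  # 20.0
```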
The separation of XPath from XSLT works reasonably well, but there are places where the split seems awkward, and there are many cases where it's difficult to know which document to read to find the answer to a particular question. For example, an XPath expression can contain a reference to a variable, but creating the variable and giving it an initial value is the job of XSLT. Another example: XPath expressions can call functions, and there is a range of standard functions defined. Those whose effect is completely freestanding, such as string-length(), are defined in the XPath specification, whereas additional functions whose behavior relies on XSLT definitions, such as key(), are defined in the XSLT specification.
Because the split is awkward, I've written this book as if XSLT+XPath were a single language. For example, all the standard functions are described together in Chapter 7. In the reference sections, I've tried to indicate where each function or other construct is defined in the original standards, but the working assumption is that you are using both languages together and you don't need to know where one stops and the other one takes over. The only downside of this approach is that if you want to use XPath on its own, for example to define document hyperlinks, then the book isn't really structured to help you.
XSLT and Internet Explorer 5
Very soon after the first draft proposals for XSL were published, back in 1998, Microsoft shipped a partial implementation as a technology preview for use with IE4. This was subsequently replaced with a rather different implementation when IE5 came out. This second implementation, known as MSXSL, remained in the field essentially unchanged until very recently, and is what many people mean when they refer to XSL. Unfortunately, though, Microsoft jumped the gun, and the XSLT standard changed and grew, so that when the XSLT Recommendation version 1.0 was finally published on 16 November 1999, it bore very little resemblance to the initial Microsoft product.
A Recommendation is the most definitive of documents produced by the W3C. It's not technically a standard, because standards can only be published by government-approved standards organizations. But I will often refer to it loosely as "the standard" in this book.
Many of the differences, such as changes of keywords, are very superficial, but some run much deeper: for example, changes in the way the equals operator is defined.
So the Microsoft IE5 dialect of XSL is also outside the scope of this book. Please don't assume that anything in this book is relevant to the original Microsoft XSL: even where the syntax appears similar to XSLT, the meaning of the construct may be completely different.
You can find information about the original IE5 dialect of XSL in the Wrox book XML IE5 Programmer's Reference, ISBN 1-861001-57-6.
Microsoft has fully backed the development of the new XSLT standard, and on 26 January 2000 they released their first attempt at implementing it. It's a partial implementation, packaged as part of a set of XML tools called MSXML, but it is enough to run quite a few of the examples in this book, and the parts they have implemented conform quite closely to the XSLT specifications. A further update to this product (MSXML3) was released on 15 March 2000, bringing the language even closer to the standard. They've announced that they intend to move quickly towards a full implementation, so by the time you read this, the Microsoft product may comply fully with the W3C standard: check their web site for the latest details.
Microsoft has also released a converter to upgrade stylesheets from the old XSL dialect to the new. However, this isn't the end of the story, because, of course, there are millions of copies of IE5 installed that only support the old version. If you want to develop a web site that delivers XML to the browser and relies on the browser interpreting its XSLT stylesheet, you've currently got your work cut out to make sure all your users can handle it.
If you are using Microsoft technology on the server, there is an ISAPI extension called XSLISAPI that allows you to do the transformation in the browser where it's supported, and on the server otherwise. Until the browser situation stabilizes, however, server-side transformation of XML to HTML, driven from ASP pages or from Java servlets, is really the only practical option for a serious project.
There's more information about products from Microsoft and other vendors in Chapter 10, but do be aware that it will become out of date very rapidly.
XSLT and XML
XSLT is essentially a tool for transforming XML documents. At the start of this chapter we discussed the reasons why this is important, but now we need to look a little more precisely at the relationship between the two. There are two particular aspects of XML that XSLT interacts with very closely: one is XML Namespaces; the other is the XML Information Set. These are discussed in the following sections.
XML Namespaces
XSLT is designed on the basis that XML namespaces are an essential part of the XML standard. So when the XSLT standard refers to an XML document, it really means an XML document that also conforms to the XML Namespaces specification, which can be found at http://www.w3.org/TR/REC-xml-names.
For a full explanation of XML Namespaces, see Chapter 7 of the Wrox Press book Professional XML, ISBN 1-861003-11-0.
Namespaces play an important role in XSLT. Their purpose is to allow you to mix tags from two different vocabularies in the same XML document. For example, in one vocabulary <table> might mean a two-dimensional array of data values, while in another vocabulary <table> refers to a piece of furniture. Here's a quick reminder of how they work:
- Namespaces are identified by a Uniform Resource Identifier (URI). This can take a number of forms. One form is the familiar URL, for example http://www.wrox.com/namespace. Another form, not fully standardized but being used in some XML vocabularies (see for example http://www.biztalk.org), is a URN, for example urn:java:com.icl.saxon. The detailed form of the URI doesn't matter, but it is a good idea to choose one that will be unique. One good way of achieving this is to use the URL of your own web site. But don't let this confuse you into thinking that there must be something on the web site for the URL to point to. The namespace URI is simply a string that you have chosen to be different from other people's namespace URIs: it doesn't need to point to anything.
- Since namespace URIs are often rather long and use special characters such as «/», they are not used in full as part of the element and attribute names. Instead, each namespace used in a document can be given a short nickname, and this nickname is used as a prefix of the element and attribute names. It doesn't matter what prefix you choose, because the real name of the element or attribute is determined only by its namespace URI and its local name (the part of the name after the prefix). For example, all my examples use the prefix xsl to refer to the namespace URI http://www.w3.org/1999/XSL/Transform, but you could equally well use the prefix xslt, so long as you use it consistently.
- For element names, you can also declare a default namespace URI, which is to be associated with unprefixed element names. The default namespace URI, however, does not apply to unprefixed attribute names.
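One way to see that the prefix is only a nickname is to let a namespace-aware parser report the names it actually stores. This minimal sketch uses Python's xml.etree.ElementTree (an assumption; any namespace-aware parser would do), which reports each name as "{namespace-URI}local-name". The two hypothetical fragments differ only in using xsl or xslt as the prefix, and they yield identical expanded names. The parser also never tries to fetch anything from the namespace URI.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Two hypothetical fragments that differ only in the prefix bound to XSLT_NS.
DOC_A = (f'<xsl:stylesheet xmlns:xsl="{XSLT_NS}" version="1.0">'
         '<xsl:template match="/"/></xsl:stylesheet>')
DOC_B = (f'<xslt:stylesheet xmlns:xslt="{XSLT_NS}" version="1.0">'
         '<xslt:template match="/"/></xslt:stylesheet>')


def expanded_names(xml_text: str) -> list:
    """Return the names of all elements in document order.

    ElementTree reports each name as '{namespace-URI}local-name', so the
    prefix written in the source text is no longer visible.
    """
    return [el.tag for el in ET.fromstring(xml_text).iter()]


print(expanded_names(DOC_A))
# ['{http://www.w3.org/1999/XSL/Transform}stylesheet',
#  '{http://www.w3.org/1999/XSL/Transform}template']
print(expanded_names(DOC_A) == expanded_names(DOC_B))  # True
```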
A namespace prefix is declared using a special pseudo-attribute within any element tag, with the form:
xmlns:prefix = "namespace-URI"
This declares a namespace prefix, which can be used for the name of that element, for its attributes, and for any element or attribute name contained in that element. The default namespace, which is used for elements having no prefix (but not for attributes), is similarly declared using a pseudo-attribute:
xmlns = "namespace-URI"
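The scoping rules for these declarations can be checked the same way. In this sketch, again using Python's xml.etree.ElementTree, the URIs http://example.com/books and http://example.com/pricing are hypothetical. The default namespace declared on <catalog> applies to the unprefixed <book> element nested inside it, but not to the unprefixed title attribute, whereas the prefixed p:currency attribute does pick up its namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: plain identifying strings, never dereferenced.
BOOKS_NS = "http://example.com/books"
PRICING_NS = "http://example.com/pricing"

DOC = f"""<catalog xmlns="{BOOKS_NS}" xmlns:p="{PRICING_NS}">
  <book title="XSLT" p:currency="USD"/>
</catalog>"""

root = ET.fromstring(DOC)
book = root.find(f"{{{BOOKS_NS}}}book")  # look up the child by its expanded name

print(root.tag)             # {http://example.com/books}catalog
print(book.tag)             # {http://example.com/books}book  (default namespace is inherited)
print(sorted(book.attrib))  # ['title', '{http://example.com/pricing}currency']
```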
XSLT can't be used to process an XML document unless it conforms to the XML Namespaces recommendation. In practice this isn't a big problem, because most people are treating XML Namespaces as if it were an inherent part of the XML standard, rather than a bolt-on optional extra. It does have certain implications, though. In particular, serious use of Namespaces is virtually incompatible with serious use of Document Type Definitions, because DTDs don't recognize the special significance of prefixes in element names; so a consequence of backing Namespaces is that XSLT provides very little support for DTDs, choosing instead to wait until the replacement facility, XML Schemas, eventually emerges.
The XML Information Set
XSLT is designed to work on the information carried by an XML document, not on the raw document itself. This means that, as an XSLT programmer, you are given a tree view of the source document in which some aspects are visible and others are not. For example, you can see the attribute names and values, but you can't see whether the attribute was written in single or double quotes, you can't see what order the attributes were in, and you can't tell whether or not they were written on the same line.
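As a rough illustration of this information-level view, the sketch below parses two hypothetical spellings of the same element with Python's xml.etree.ElementTree. ElementTree is not the XPath tree model, so treat it only as an analogy. The parser discards the quote style and the line break inside the tag, and comparing the attributes as a mapping ignores their order, leaving just the element name and its attribute names and values.

```python
import xml.etree.ElementTree as ET

# Two hypothetical spellings of the same element: different quote style,
# different attribute order, and a line break inside the start tag.
V1 = '<book title="XSLT" price="34.99"/>'
V2 = """<book price='34.99'
      title='XSLT'/>"""


def information(xml_text: str):
    """Return the element name and its attributes as a mapping."""
    el = ET.fromstring(xml_text)
    # dict equality ignores insertion order, so attribute order is not compared.
    return el.tag, dict(el.attrib)


print(information(V1) == information(V2))  # True
```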
One messy detail is that there have been many attempts to define exactly what constitutes the essential information content of a well-formed XML document, as distinct from its accidental punctuation. All attempts so far have come up with slightly different answers. The most recent, and the most definitive, attempt to provide a common vocabulary for the content of XML documents is the XML Information Set definition, which may be found at http://www.w3.org/TR/xml-infoset.
Unfortunately this came too late to make all the standards consistent. For example, some treat comments as significant, others not; some treat the choice of namespace prefixes as significant, others take them as irrelevant. I shall describe in Chapter 2 exactly how XSLT (or more accurately, XPath) defines the Tree Model of XML, and how it differs in finer points of detail from some of the other definitions such as the Document Object Model or DOM.
XSL and CSS
Why are there two stylesheet languages, XSL (i.e. XSLT plus XSL Formatting Objects) as well as Cascading Style Sheets (CSS and CSS2)?
It's only fair to say that in an ideal world there would be a single language in this role, and that the reason there are two is that no-one has been able to invent something that achieved the simplicity and economy of CSS for doing simple things, combined with the power of XSL for doing more complex things.
CSS (by which I include CSS2, which greatly extends the degree to which you can control the final appearance of the page) is mainly used for rendering HTML, but it can also be used for rendering XML directly, by defining the display characteristics of each XML element. However, it has serious limitations. It cannot reorder the elements in the source document, it cannot add text or images, it cannot decide which elements should be displayed and which omitted, it cannot calculate totals or averages or sequence numbers. In other words, it can only be used when the structure of the source document is already very close to the final display form.
Having said this, CSS is simple to write, and it is very economical in machine resources. It doesn't reorder the document, so it doesn't need to build a tree representation of the document in memory, and it can start displaying the document as soon as the first text is received over the network. Perhaps most important of all, CSS is very simple for HTML authors to write, without any programming skills. In comparison, XSLT is far more powerful, but it also consumes a lot more memory and processor power, as well as training budget.
It's often appropriate to use both tools together. Use XSLT to create a representation of the document that is close to its final form, in that it contains the right text in the right order, and then use CSS to add the finishing touches, by selecting font sizes, colors, and so on. Typically you would do the XSLT processing on the server, and the CSS processing on the client (in the browser), so another advantage of this approach is that you reduce the amount of data sent down the line, which should improve response time for your users as well as postponing the next expensive bandwidth increase.
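The division of labor described above can be sketched as follows. The structural stage would normally be an XSLT stylesheet run on the server; purely to keep the example self-contained and testable, it is written here in Python with xml.etree.ElementTree. It does the things that CSS cannot: it omits out-of-stock books, sorts the rest by title, numbers them and computes a total. The catalog, the instock attribute, the class names and the stylesheet name books.css are all hypothetical. The output HTML only links to that file, leaving fonts and colors to CSS in the browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE = """<catalog>
  <book title="Zoology" price="12.50" instock="yes"/>
  <book title="Astronomy" price="20.00" instock="yes"/>
  <book title="Cooking" price="15.00" instock="no"/>
</catalog>"""


def to_html(xml_text: str, css_href: str = "books.css") -> str:
    """Structural stage: produce HTML with the right text in the right order.

    Fonts, colors and so on are left to the (hypothetical) CSS file that
    the output links to, for the browser to apply.
    """
    root = ET.fromstring(xml_text)
    # Decide which elements are shown and which are omitted.
    books = [b for b in root.iter("book") if b.get("instock") == "yes"]
    # Reorder them.
    books.sort(key=lambda b: b.get("title"))

    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "link", rel="stylesheet", href=css_href)
    body = ET.SubElement(html, "body")
    items = ET.SubElement(body, "ul")
    # Add sequence numbers.
    for n, b in enumerate(books, start=1):
        li = ET.SubElement(items, "li", {"class": "book"})
        li.text = f"{n}. {b.get('title')}"
    # Calculate a total.
    total = sum(float(b.get("price")) for b in books)
    summary = ET.SubElement(body, "p", {"class": "total"})
    summary.text = f"Total: {total:.2f}"
    return ET.tostring(html, encoding="unicode", method="html")


print(to_html(SOURCE))
```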
Created: Jan. 05, 2001
Revised: Jan. 05, 2001