XSLT 2.0 Web Development: Elements of a Web Site. Pt. 2 | 4

3.9. Master document

In the previous chapter (2.1.2.1), we found that any web site consisting of more than one page must have a master document providing shared content and a site directory. In this section, we’ll look at some practical examples of constructs in a typical web site’s master document.

You may find the sample master document described here (see Example 3.2 for a complete listing) somewhat eclectic. This eclecticism, however, stems from the real-world practice of XML web sites. In fact, the master document is more of a database than a document (1.2). The layout of components in this database is rarely important, as they are not processed sequentially but accessed in arbitrary order. For lots of ideas on how to access and use the master document content from the stylesheet, see Chapter 5.

A master document represents a new document type, with its root element type different from that of a page document, and most other element types usable only in a master document. However, if you don’t use DTDs (2.2.4) or XSDL, this distinction has little practical value, and you can use one schema to validate all of your XML (both page documents and the master document). Such a schema written in Schematron is shown in Example 3.3 (see also 5.1.3 for advanced Schematron checks).

3.9.1. Site structure

The role of the master document is that of a hub that all other documents refer to when they need to determine the wider context of the web site or establish mutual links. Whenever the stylesheet needs some information that is not supplied by the currently processed document, it will consult the master document to find either that information or a link to it.

Therefore, the most important part of a master document is the site directory — a collection of information about all pages of the site and their organization. This directory is used for building the site’s navigation as well as for resolving abbreviated internal links (3.5.3).

Besides pages, other components of the site may also be mentioned in the master document, such as all Flash animations you have or all images of a specific kind used on the site. Units of orthogonal content must be listed in the master document as well (3.9.1.3) so that pages can reference and incorporate them. Finally, sources of dynamic content must be registered for the stylesheet to know what to insert into static page templates (3.9.1.4).

3.9.1.1. Menu structure

A flat list of all pages is not sufficient for building a usable site. We also need to represent the structure of the site’s menu and the correspondence between menu items and pages.

A simple site’s menu may be little more than a linear list of links to each of its pages. However, most sites require more complex menu structures. Hierarchical menus are common, with some top-level items encompassing multiple subpages and/or nested submenus. Such a structure is straightforward to express in XML.

Some sites may have more than one menu. For example, there may be a menu of topics (content sections) and another independent menu of tools (pages that help navigate the site, such as search and site map). Such orthogonal menu hierarchies can be stored in independent XML subtrees within the master document.

3.9.1.2. Menu items and pages

What do we need to store in the master document for each menu item? To build a clickable menu element, we must know at least its label (the visible text displayed in the menu) and the page that it is linked to. A label may contain inline markup and should therefore be stored in a child element. As for the link, it is natural to use the general linking attributes with abbreviated addresses that we’ve developed for in-flow links on site pages (3.5.1).

Items vs. pages. A menu item is not the same as a page of the site. Some pages may not be available through the menu, while others may be linked from more than one menu item. Therefore, the page itself must be represented by a separate element that the menu item element will link to.

However, that does not mean that these page elements must be stored in a different part of the master document. You can still categorize all your pages under the branches of the menu tree: Even if a page is not linked from the menu, usually you can find a branch where it logically belongs (unless it is orthogonal content, 3.9.1.3). The stylesheet will thus be able to read the menu structure both hierarchically (when looking for menu items) and sequentially (when looking for pages).

Here’s a possible representation of a menu item:

<item link="products">  
  <label>Products</label>
  <page id="products" title="Our products" 
        src="products/"/>
  <page id="software" title="Our software" 
        src="products/software/"/>
  <page id="hardware" title="Our hardware" 
        src="products/hardware"/>
</item>

In addition to a label and one or more pages, an item may also contain other item children. A complete menu description would thus consist of a hierarchy of items under one parent, e.g., menu. Note that in each page element, the id attribute provides a unique identifier not only of that element but of the page itself. It is these identifiers that are used as abbreviated addresses (3.5.3) in internal links.
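As a rough illustration of the two ways this structure can be read (hierarchically for menu items, sequentially for pages), here is a minimal Python sketch using the standard xml.etree.ElementTree module. It is not the book's XSLT (Chapter 5 has the stylesheet code). The menu wrapper and the nested Hardware item are hypothetical sample data, and the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample menu: the hardware page is placed in a nested item
# to show a two-level hierarchy.
MENU = """
<menu>
  <item link="products">
    <label>Products</label>
    <page id="products" title="Our products" src="products/"/>
    <page id="software" title="Our software" src="products/software/"/>
    <item link="hardware">
      <label>Hardware</label>
      <page id="hardware" title="Our hardware" src="products/hardware"/>
    </item>
  </item>
</menu>
"""


def menu_outline(parent, depth=0):
    """Hierarchical reading: yield (depth, label text) for each item, recursively."""
    for item in parent.findall("item"):
        # findtext() returns only the label's leading text, ignoring inline markup.
        yield depth, item.findtext("label")
        yield from menu_outline(item, depth + 1)


def all_pages(menu):
    """Sequential reading: ids of all page elements in document order, at any depth."""
    return [page.get("id") for page in menu.iter("page")]


menu = ET.fromstring(MENU)
for depth, label in menu_outline(menu):
    print("  " * depth + label)
print(all_pages(menu))
```

The outline prints each label indented by its depth, while all_pages() ignores the nesting and returns every page id in document order.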

How unabbreviation works. When resolving a link, the stylesheet translates the page identifier into the location of that page taken from the src attribute. However, that attribute’s value is also somewhat “abbreviated” in that it omits irrelevant technical information such as the filename extension and the default filename (usually index.html) in a directory. These omitted parts are easy to restore by applying simple rules, so the three page elements in the above example would yield these page locations:

/products/index.html
/products/software/index.html
/products/hardware.html

Note that a location ending with a “/” is considered a directory and has index.html appended; other locations only receive the “.html” extension.

Accessing the source. There is one more reason to store page pathnames without extensions. When locations are resolved for the purpose of accessing the source XML documents rather than creating an HTML link, the same src values are transformed into *.xml file locations (assuming the directory structure of the site source is similar to that of the transformed site, 3.9.3). For stylesheet code examples to access this menu structure, see Chapter 5 (5.1.1, 5.7).
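The unabbreviation rules, together with the reuse of the same src value for locating the source file, can be sketched in Python as follows. The helper names are hypothetical, the lookup simply scans page elements for a matching id, and index.xml as the default source file in a directory is an assumption mirroring index.html.

```python
import xml.etree.ElementTree as ET

MASTER = """
<item link="products">
  <label>Products</label>
  <page id="products" title="Our products" src="products/"/>
  <page id="software" title="Our software" src="products/software/"/>
  <page id="hardware" title="Our hardware" src="products/hardware"/>
</item>
"""


def expand(src, ext):
    """Restore omitted parts: a trailing "/" means a directory with a default file."""
    if src.endswith("/"):
        return f"{src}index.{ext}"
    return f"{src}.{ext}"


def page_src(master, page_id):
    """Translate a page identifier into its abbreviated src value."""
    for page in master.iter("page"):
        if page.get("id") == page_id:
            return page.get("src")
    raise KeyError(f"no page with id {page_id!r}")


def link_for(master, page_id):
    # HTML link, relative to the site root
    return "/" + expand(page_src(master, page_id), "html")


def source_for(master, page_id):
    # Source XML file, relative to the root of the source tree
    # (assumed to mirror the structure of the transformed site)
    return expand(page_src(master, page_id), "xml")


master = ET.fromstring(MASTER)
for pid in ("products", "software", "hardware"):
    print(link_for(master, pid), source_for(master, pid))
```

The first column reproduces the three page locations listed earlier; the second gives the corresponding source files (products/index.xml, products/software/index.xml, products/hardware.xml).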

Storing page metadata. Sometimes, a more complex layout for the page elements may be necessary. For example, if your bilingual site provides two language versions of each page, a page element could hold both metadata that is common to all language versions of the page (e.g., the page’s identifier and source location) and language-specific metadata (e.g., title):

<page id="software" src="products/software/">
  <translation lang="en">Our software</translation>
  <translation lang="fr">Nos logiciels</translation>
</page>
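Picking the language-specific title while keeping the shared metadata in one place might look like this in Python. Falling back to English when a translation is missing is a hypothetical policy chosen for the sketch, not something the master document format prescribes.

```python
import xml.etree.ElementTree as ET

PAGE = """
<page id="software" src="products/software/">
  <translation lang="en">Our software</translation>
  <translation lang="fr">Nos logiciels</translation>
</page>
"""


def title(page, lang, fallback="en"):
    """Return the page title for lang, or the fallback language's title."""
    titles = {t.get("lang"): t.text for t in page.findall("translation")}
    return titles.get(lang, titles.get(fallback))


page = ET.fromstring(PAGE)
print(page.get("src"))      # shared by all language versions
print(title(page, "fr"))    # language-specific title
print(title(page, "de"))    # no German version, so the English title is used
```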

Some of the metadata (3.1.1) may also be moved from page documents into the master document for convenient access. For example, if you want to control which pages of the site are to be seen by search engine spiders and which are hidden from them, you could add a corresponding value to each page’s source document. However, since this information will be pulled from all pages of the site simultaneously, it is more convenient to add a spider control attribute to the page element in the master document. This way, the stylesheet will be able to produce a site-wide robots.txt file for external spiders and/or a configuration update for a local search engine spider without accessing all page documents.
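A Python sketch of the robots.txt case, assuming a hypothetical spider="no" attribute on page elements (the attribute name and value are illustrative only). Note that only the master document is read; no page document is opened.

```python
import xml.etree.ElementTree as ET

# Hypothetical master document fragment: spider="no" marks pages to hide.
MASTER = """
<menu>
  <item link="products">
    <label>Products</label>
    <page id="products" title="Our products" src="products/"/>
    <page id="software" title="Our software" src="products/software/" spider="no"/>
    <page id="hardware" title="Our hardware" src="products/hardware" spider="no"/>
  </item>
</menu>
"""


def expand(src):
    """Unabbreviate a src value into a site-root-relative HTML location."""
    return f"/{src}index.html" if src.endswith("/") else f"/{src}.html"


def robots_txt(master):
    """Build robots.txt rules from the master document alone."""
    lines = ["User-agent: *"]
    for page in master.iter("page"):
        if page.get("spider") == "no":
            lines.append(f"Disallow: {expand(page.get('src'))}")
    return "\n".join(lines) + "\n"


print(robots_txt(ET.fromstring(MASTER)), end="")
```

Each Disallow line names one page location; disallowing a directory path instead would also hide everything below it.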

3.9.1.3. Orthogonal content

Along with all pages, a master document should also list all the units of orthogonal content that your site will use (2.1.2.2). However, unlike pages, orthogonal content references cannot be categorized under the menu hierarchy (that is why this content is orthogonal, after all). You’ll need to create a separate construct to associate orthogonal content identifiers with corresponding (abbreviated) source locations — for example,

<blocks>
  <block id="news" src="news/latest"/>
  <block id="subscribe" src="scripts/subscribe"/>
  <block id="donate" src="scripts/donate"/>
</blocks>

Now if the stylesheet processing a page document encounters a block that has no content of its own but references some orthogonal content unit — for example, by specifying idref="news" — the document at news/latest.xml will be retrieved and inserted into the current document, formatted as appropriate for an orthogonal content block.
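The lookup itself is simple. In the Python sketch below (the helper name is hypothetical, and index.xml is assumed as the default source file for a directory), an empty block carrying an idref is mapped to the source file of the orthogonal unit it references, while a block with content of its own is left alone.

```python
import xml.etree.ElementTree as ET

BLOCKS = """
<blocks>
  <block id="news" src="news/latest"/>
  <block id="subscribe" src="scripts/subscribe"/>
  <block id="donate" src="scripts/donate"/>
</blocks>
"""


def orthogonal_source(blocks, block):
    """Return the source file for an empty <block idref="..."/>, else None."""
    idref = block.get("idref")
    has_own_content = len(block) > 0 or (block.text or "").strip() != ""
    if idref is None or has_own_content:
        return None  # nothing to pull in from elsewhere
    for unit in blocks.iter("block"):
        if unit.get("id") == idref:
            src = unit.get("src")
            return f"{src}index.xml" if src.endswith("/") else f"{src}.xml"
    raise KeyError(f"unknown orthogonal block {idref!r}")


blocks = ET.fromstring(BLOCKS)
print(orthogonal_source(blocks, ET.fromstring('<block idref="news"/>')))
```

For idref="news" this yields news/latest.xml, the document that would then be inserted and formatted as an orthogonal block.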

It is important that the id and src attributes of a master document’s block element have the same names and semantics as the attributes of page elements (3.9.1.2). We will use this when writing stylesheet code to unabbreviate links or search through all pages of the site (Chapter 5), since every page must be registered as either a page in the menu or a source of an orthogonal block (or both).
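Because page and block share the id and src attributes, one loop can enumerate every source document of the site. A Python sketch, with a hypothetical site wrapper that combines a menu and a blocks list (here the news page is registered both ways):

```python
import xml.etree.ElementTree as ET

# Hypothetical master document with both a menu and a blocks list.
MASTER = """
<site>
  <menu>
    <item link="news">
      <label>News</label>
      <page id="news-page" src="news/"/>
    </item>
  </menu>
  <blocks>
    <block id="news" src="news/latest"/>
    <block id="last-news" src="news/" select="last"/>
  </blocks>
</site>
"""


def all_sources(master):
    """List every source document once, in document order.

    The same attribute names on page and block let one test cover both.
    """
    seen = []
    for elem in master.iter():
        if elem.tag in ("page", "block") and elem.get("src") is not None:
            src = elem.get("src")
            path = f"{src}index.xml" if src.endswith("/") else f"{src}.xml"
            if path not in seen:
                seen.append(path)
    return seen


print(all_sources(ET.fromstring(MASTER)))
```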

Extracting orthogonal content. In the last example, each orthogonal block was stored in its own file — but this is not always the best approach. You may want to reuse parts of regular pages as orthogonal content.

For instance, the news page of a site is often a list of news items in reverse chronological order. You may want to automatically extract the most recent news item and display it in an orthogonal content block on other pages of the site. Another example is a “featured product” blurb extracted from that product’s own page and reused on the front page of the site.

For these situations, what we need is a way to specify what part of the original page document is to be reused as orthogonal content on other pages. Since this part will most likely also be a block, we only need to indicate the id of the block we are interested in. Thus, if the most recent news block on the news page always has id="last", we could write in the master document:

<block id="last-news" src="news/" select="last"/>

Now any page can place a copy of the latest news item by referencing the corresponding orthogonal block by its identifier, last-news. For example, your page document might contain

<block idref="last-news"/>

Likewise, the featured product blurb could be extracted from the block with id="blurb" on that product’s page:

<block id="feature" src="products/foobar" select="blurb"/>

Here, the featured product is identified by the path to the corresponding document (products/foobar.xml). When you want to feature a different product, all you need to do is change this value so it points to another product’s page (assuming each product page has exactly one block with id="blurb"; see also 5.1.3.7). After that, all pages that use

<block idref="feature"/>

will (after you rerun the transformation) display the blurb for the new product.

Logically, without the select attribute, a master document’s block will reference the entire content of the document pointed to by the src attribute. Your Schematron schema could also check that the referenced elements actually exist in the referenced documents (see 5.3.3.1 for how to code this).
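The select logic, including the existence check a Schematron rule might perform, can be sketched in Python. A dictionary stands in for the files on disk; the sample news page and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for source files on disk, keyed by path.
SOURCES = {
    "news/index.xml": """
<page>
  <block id="last"><p>Version 2.0 released</p></block>
  <block id="older"><p>Version 1.0 released</p></block>
</page>
""",
}


def expand(src):
    """Unabbreviate src into a source file path (index.xml assumed for directories)."""
    return f"{src}index.xml" if src.endswith("/") else f"{src}.xml"


def extract(unit, sources=SOURCES):
    """Return the content a master document <block> refers to.

    Without select, the whole referenced document is used. With select,
    exactly one block carrying that id must exist in it (the same
    condition a Schematron rule could check).
    """
    doc = ET.fromstring(sources[expand(unit.get("src"))])
    select = unit.get("select")
    if select is None:
        return doc
    matches = [b for b in doc.iter("block") if b.get("id") == select]
    if len(matches) != 1:
        raise ValueError(f"expected one block with id={select!r}, found {len(matches)}")
    return matches[0]


unit = ET.fromstring('<block id="last-news" src="news/" select="last"/>')
print(extract(unit).findtext("p"))
```

Changing which product is featured then amounts to editing one src value in the master document; the extraction code stays the same.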

No perfection in this world. It would be even more natural to use XPath expressions for extracting orthogonal blocks. Then we could use not only the id attribute value but any XPath test for identifying the block we need. For instance, for the first block on the page, we would write

<block id="news" src="news/" xpath="(//block)[1]"/>

Selecting the last block that has a section inside would be as simple as

<block id="lastsection" src="dir/page" 
       xpath="(//block[section])[last()]"/>
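Such selectors are easy to get subtly wrong: a positional predicate applies to each parent's children separately, so //block[1] returns every block that is the first block child of its parent, and selecting the first block of the whole page requires (//block)[1]. Python's xml.etree.ElementTree evaluates its [n] and [last()] predicates in the same per-parent way (it has no parenthesized paths, so document order is taken from iter() instead), which makes the difference easy to see on a hypothetical two-column page:

```python
import xml.etree.ElementTree as ET

# Hypothetical page with blocks arranged in two columns.
PAGE = """
<page>
  <column><block id="news"/><block id="links"/></column>
  <column><block id="ads"/></column>
</page>
"""

page = ET.fromstring(PAGE)

# Like XPath //block[1]: blocks that are the first block child of their own parent.
print([b.get("id") for b in page.findall(".//block[1]")])
# Like XPath (//block)[1]: the first block in document order.
print(next(page.iter("block")).get("id"))
# Like XPath //block[last()]: the last block child of each parent.
print([b.get("id") for b in page.findall(".//block[last()]")])
# Like XPath (//block)[last()]: the last block in document order.
print(list(page.iter("block"))[-1].get("id"))
```

The per-parent forms return two blocks each (news and ads, then links and ads), while the document-order forms return exactly one (news, then ads).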

There’s only one problem with this kind of selector: In XSLT, you can’t take a string and treat it as an XPath expression, and from the XSLT processor’s viewpoint, whatever the master document (or any other document) stores in its attributes is always just a string.

Saxon offers the saxon:evaluate() extension function (4.4.2.1) that might save the idea, but its implementation is quite limited, not to mention nonportable to other XSLT processors. Much better is the dyn:evaluate() function[16] from EXSLT (4.4.1), which is currently supported by several processors but not by Saxon.
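Outside XSLT, a host language with a path engine does not have this restriction. As a rough analogue only (Python, not XSLT, and not a stand-in for saxon:evaluate() or dyn:evaluate()), ElementTree will parse a path string read from an attribute at run time. Its path language is a small XPath subset with relative paths only, so the stored value below is a hypothetical, simplified form of the section selector shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical master entry. The xpath value uses ElementTree's restricted,
# relative path syntax rather than full XPath.
UNIT = '<block id="lastsection" src="dir/page" xpath=".//block[section]"/>'

# Hypothetical content of dir/page.xml.
PAGE = """
<page>
  <block id="intro"><p>Welcome</p></block>
  <block id="specs"><section><head>Specifications</head></section></block>
</page>
"""

unit = ET.fromstring(UNIT)
page = ET.fromstring(PAGE)

# The attribute value is only a string, but findall() parses it as a path at run time.
selected = page.findall(unit.get("xpath"))
print([b.get("id") for b in selected])
```

This selects every block containing a section. ElementTree applies [last()] to all same-named siblings rather than to the already filtered ones, and it has no parenthesized paths, so the "last block with a section" selector cannot be reproduced faithfully this way.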

Created: March 27, 2003
Revised: May 24, 2004

URL: http://webreference.com/programming/xsltweb2/1