WebReference.com - Part 3 of Chapter 1: Professional XML Schemas, from Wrox Press Ltd (3/5)
Professional XML Schemas
How the XML Schema Recommendation Specifies Validity
The XML Schema Recommendation does not prescribe how an XML Schema-aware processor should validate a document, so before we look at validation it is worth taking a moment to understand how the Recommendation determines validity. The XML Schema Recommendation is written in terms of an abstract model (rather like the DOM Recommendation). This model is expressed in terms of information items, as defined in the XML Information Set.
The purpose of the XML Information Set (or infoset) is to provide a consistent set of definitions that can be used in other specifications that refer to information held within a well-formed XML document.
Any well-formed XML document has an information set (as long as it also conforms to the XML Namespaces Recommendation). This in turn means that an XML Schema and all instance documents must be well-formed in order for them to be processed by a parser. After all, a document that is not well-formed does not have an information set.
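As a minimal sketch of this precondition, the following Python fragment uses the standard library's ElementTree parser to test whether a string is well-formed, and so whether it can have an information set at all. The helper name has_information_set is an assumption made for illustration; it is not part of any specification or library.

```python
import xml.etree.ElementTree as ET


def has_information_set(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as well-formed,
    namespace-conformant XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


well_formed = "<note><to>Reader</to></note>"
not_well_formed = "<note><to>Reader</note>"  # <to> is never closed
unbound_prefix = "<a:note/>"                 # prefix a: is never declared

for sample in (well_formed, not_well_formed, unbound_prefix):
    print(sample, has_information_set(sample))
```

The third sample is included because the parser used here is namespace-aware: a document that uses an undeclared prefix is rejected, which mirrors the requirement that the document also conform to the XML Namespaces Recommendation.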
The infoset presents an XML document's information set as a modified tree. We should be clear, however, that the XML Schema Recommendation does not require that an XML Schema-aware processor's interfaces make the infoset available as a tree structure; the document may just as well be accessed through an event-based approach (such as that implemented in SAX processors) or a query-based interface. Nevertheless, the term information set can be treated as analogous to the term tree.
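To illustrate that the same information can be reached through different interfaces, here is a small sketch using two Python standard library modules: xml.etree.ElementTree for tree-style access and xml.sax for event-style access. The sample document and the NameCollector class are assumptions made for the example.

```python
import xml.sax
import xml.etree.ElementTree as ET

DOC = "<order><item>pen</item><item>ink</item></order>"

# Tree-style access: the whole document is built in memory, then walked.
root = ET.fromstring(DOC)
tree_names = [el.tag for el in root.iter()]


# Event-style access: the handler is called back as the parser reads.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Record each element name in the order its start tag is seen.
        self.names.append(name)


collector = NameCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
event_names = collector.names

print(tree_names)
print(event_names)
```

Both approaches report the elements in the same document order, even though only the first one ever holds a tree in memory.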
An XML document's information set consists of a number of information items, each of which can be treated as analogous to a node on the tree. An information item is an abstract representation of some part of a document, and each information item has a set of associated properties. At minimum, a well-formed XML document will have a document information item. The Infoset Recommendation defines 11 types of information item in all; here are the ones that we are most concerned with:
The document information item is the unique element in which all other markup is nested within a well-formed XML document. In the case of an XML Schema document, the document information item would correspond to the
An element information item exists for every element that appears in an XML document.
An attribute information item exists for each attribute, whether specified or defaulted, of each element in the document.
A character information item exists for each data character in the document, whether it appears literally, as a character reference, or within a CDATA section. Each character is a logically separate information item, although many processing applications chunk characters into larger groups.
For each element, a namespace information item exists for each namespace that is in scope for that element.
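The kinds of item listed above can be made concrete with a rough sketch that walks a small document using Python's xml.dom.minidom and tallies document, element, attribute, and character items. DOM nodes are only an approximation of information items (text is chunked into nodes, for instance), so the function below counts the characters inside each text or CDATA node. The sample document and the count_items helper are assumptions for illustration, and namespace information items are not counted.

```python
from collections import Counter
from xml.dom import minidom, Node

DOC = '<memo priority="high"><to>Al</to><![CDATA[<b>]]></memo>'


def count_items(node, counts):
    """Recursively tally a rough equivalent of the infoset items
    reachable from `node` (hypothetical helper)."""
    if node.nodeType == Node.DOCUMENT_NODE:
        counts["document"] += 1
    elif node.nodeType == Node.ELEMENT_NODE:
        counts["element"] += 1
        counts["attribute"] += node.attributes.length
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        # One character information item per data character.
        counts["character"] += len(node.data)
    for child in node.childNodes:
        count_items(child, counts)
    return counts


counts = count_items(minidom.parseString(DOC), Counter())
print(dict(counts))
```

Here the two characters of "Al" and the three characters inside the CDATA section are all counted as character items, reflecting the point that each character is logically a separate item regardless of how it was written.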
By talking in terms of an abstract tree representation, the schema specification can then ensure that each information item in an instance document respects the constraints imposed by the corresponding information item in the schema. This is known as local schema-validity.
There is a second level of schema validity, which represents the overall validation outcome for each item. This is where the local schema-validity of an information item is combined with the results of the schema-validity assessments performed upon its descendants, if it has any. So, a parent element's overall outcome depends on the schema-validity assessments of its child information items.
Therefore, the XML Schema Recommendation does not have to worry about how the validating processor is implemented. As long as each information item is locally schema-valid, and the assessments of its child information items also succeed, an instance document will be valid. At each stage, augmentations (in the form of properties) may be added to the information items in the information set to record the outcome and help the processor achieve its task.
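The relationship between local validity and the overall outcome can be sketched as a toy model. This is a deliberate simplification and not how the Recommendation defines assessment: the Item class, its boolean fields, and the assess function are all hypothetical, and the real outcome properties are richer than a single true/false flag. The sketch only shows the shape of the idea: an item's overall outcome combines its own local check with the outcomes of its children, and the result is recorded on the item as an augmentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """Toy stand-in for an information item (hypothetical)."""
    name: str
    locally_valid: bool                 # result of the item's own check
    children: List["Item"] = field(default_factory=list)
    overall_valid: bool = False         # augmentation filled in by assess()


def assess(item: Item) -> bool:
    # Assess every child first so each one receives its augmentation,
    # then combine the child outcomes with the item's local validity.
    child_results = [assess(child) for child in item.children]
    item.overall_valid = item.locally_valid and all(child_results)
    return item.overall_valid


quantity = Item("quantity", locally_valid=True)
price = Item("price", locally_valid=False)   # e.g. a value of the wrong type
order = Item("order", locally_valid=True, children=[quantity, price])

assess(order)
for it in (order, quantity, price):
    print(it.name, it.locally_valid, it.overall_valid)
```

In this example the order item passes its own local check but its overall outcome is negative, because one of its children fails.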
So, each of the components that make up a schema is used to determine whether an element or attribute in an instance document is valid. In addition, a processor may check augmentations (such as default values) placed upon those elements, attributes, and their descendants.
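As a loose illustration of a default-value augmentation, the sketch below fills in a missing attribute from a hand-written table of defaults. Nothing here reads a real schema: the DEFAULTS table, the currency attribute, and the apply_defaults function are assumptions invented for the example, and a real processor would take such values from the schema's declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults table: element name -> {attribute name: default value}
DEFAULTS = {"item": {"currency": "USD"}}


def apply_defaults(root):
    """Add any missing defaulted attributes and report what was added."""
    added = []
    for el in root.iter():
        for name, value in DEFAULTS.get(el.tag, {}).items():
            if name not in el.attrib:
                el.set(name, value)
                added.append((el.tag, name))
    return added


root = ET.fromstring('<order><item currency="EUR"/><item/></order>')
added = apply_defaults(root)
currencies = [el.get("currency") for el in root.iter("item")]

print(added)
print(currencies)
```

The explicitly specified value on the first item is left alone, while the second item gains the defaulted value, so later processing can treat both attributes uniformly.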
Created: October 25, 2001
Revised: October 25, 2001