In this chapter, we've seen some guidelines for the creation of
XML structures to hold data from existing relational databases.
We've seen that this isn't an exact science, and that many of the
decisions we make while creating XML structures depend entirely
on the kinds of information we wish to represent in our documents.
If there's one point in particular we should come away with from
this chapter, it's that we need to try to represent relationships
in our XML documents with containment as much as possible. XML is
designed around the concept of containment – the DOM and XSLT
treat XML documents as trees, while SAX and SAX-based parsers treat
them as a sequence of branch begin and end events and leaf events.
The more pointing relationships we use, the more complicated the
navigation of our documents will be, and the more of a performance
hit our processor will take, especially if we are using SAX or
a SAX-based parser.
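To make the difference concrete, here is a minimal sketch that reads the same data from two hypothetical documents, one using containment and one using pointing. The element and attribute names (SalesData, Invoice, LineItem, LineItemIDREFS, and so on) are assumptions made up for illustration, and the code uses Python's standard xml.etree.ElementTree module rather than any particular processor discussed in this chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using containment: line items live inside their invoice.
CONTAINED = """
<SalesData>
  <Invoice InvoiceID="inv1">
    <LineItem quantity="2"/>
    <LineItem quantity="5"/>
  </Invoice>
</SalesData>
"""

# Hypothetical document using pointing: the invoice refers to line items by ID.
POINTED = """
<SalesData>
  <Invoice InvoiceID="inv1" LineItemIDREFS="li1 li2"/>
  <LineItem LineItemID="li1" quantity="2"/>
  <LineItem LineItemID="li2" quantity="5"/>
</SalesData>
"""

def quantities_contained(xml_text):
    """With containment, the children are found directly under the parent."""
    root = ET.fromstring(xml_text)
    invoice = root.find("Invoice")
    return [int(item.get("quantity")) for item in invoice.findall("LineItem")]

def quantities_pointed(xml_text):
    """With pointing, we must first index the targets by ID, then follow each reference."""
    root = ET.fromstring(xml_text)
    invoice = root.find("Invoice")
    by_id = {item.get("LineItemID"): item for item in root.findall("LineItem")}
    refs = invoice.get("LineItemIDREFS").split()
    return [int(by_id[ref].get("quantity")) for ref in refs]
```

Both functions return the same quantities, but the pointing version needs an extra lookup table. With a streaming parser there is no tree to search, so that table would have to be built up from events as they arrive.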
We must bear in mind as we create these structures that there are usually
many XML structures that may be used to represent the same relational
database data. The techniques described in this chapter should allow
us to optimize our documents for rapid processing and minimum document
size. Using the techniques discussed in this chapter, and the next, we
should be able to easily move information between our relational database
and XML documents.
Here are the eleven rules we have defined for the development of XML
structures from relational database structures:
Rule 1: Choose the Data to Include.
Based on the business requirement the XML document will be fulfilling, decide
which tables and columns from our relational database will need to be included in
our documents.
Rule 2: Create a Root Element.
Create a root element for the document. Add the root element to our DTD,
and declare any attributes of that element that are required to hold
additional semantic information (such as routing information). Root
element names should describe their content.
Rule 3: Model the Content Tables.
Create an element in the DTD for each content table we have chosen
to model. Declare these elements as EMPTY for now.
Rule 4: Modeling Nonforeign Key Columns.
Create an attribute for each column we have chosen to include in our XML
document (except foreign key columns). These attributes should appear
in the !ATTLIST declaration of the element corresponding to the table
in which they appear. Declare each of these attributes as CDATA, and
declare it as #IMPLIED if the original column allowed NULLs, or as
#REQUIRED if it did not.
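As a rough illustration of Rule 4, the following sketch builds an !ATTLIST declaration from a list of column descriptions. The function name attlist_for_table and the (column name, allows NULL) input format are assumptions made for this example; in practice, the column metadata would come from our database schema.

```python
def attlist_for_table(element_name, columns):
    """Build an !ATTLIST declaration for one table's element.

    columns is a list of (column_name, allows_null) pairs, with foreign
    key columns already left out (they are handled by later rules).
    """
    lines = [f"<!ATTLIST {element_name}"]
    for name, allows_null in columns:
        # Nullable columns become optional attributes; the rest are required.
        default = "#IMPLIED" if allows_null else "#REQUIRED"
        lines.append(f"    {name} CDATA {default}")
    return "\n".join(lines) + ">"
```

For a hypothetical Invoice table, attlist_for_table("Invoice", [("invoiceDate", False), ("shipDate", True)]) would declare invoiceDate as #REQUIRED and shipDate as #IMPLIED.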
Rule 5: Add ID Attributes to the Elements.
Add an ID attribute to each of the elements we have created in our XML
structure (with the exception of the root element). Use the element name
followed by ID for the name of the new attribute, watching as always
for name collisions. Declare the attribute as type ID, and #REQUIRED.
Rule 6: Representing Lookup Tables.
For each foreign key that we have chosen to include in our XML
structures that references a lookup table:
- Create an attribute on the element representing the table in which the foreign key is found.
- Give the attribute the same name as the table referenced by the foreign key, and make it #REQUIRED if the foreign key does not allow NULLs, or #IMPLIED otherwise.
- Declare the attribute as an enumerated type. The allowable values should be some human-readable form of the description column for all rows in the lookup table; each value must be a valid XML name token, so it cannot contain spaces.
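The steps of Rule 6 can be sketched as follows. Because enumerated values in a DTD must be name tokens, descriptions such as "US Postal Service" have to be squeezed into a single token first. The helper names to_name_token and enumerated_attribute are hypothetical, as are the ShipMethod table and its rows, and the conversion shown here (keeping only ASCII letters and digits) is one simple choice among many.

```python
import re

def to_name_token(description):
    """Turn a lookup description into a single token, e.g. 'US Postal Service' -> 'USPostalService'."""
    words = re.findall(r"[A-Za-z0-9]+", description)
    # Capitalize the first letter of each word but leave the rest untouched.
    return "".join(word[:1].upper() + word[1:] for word in words)

def enumerated_attribute(lookup_table, descriptions, allows_null):
    """Build the attribute definition that goes inside the referencing element's !ATTLIST."""
    values = " | ".join(to_name_token(d) for d in descriptions)
    default = "#IMPLIED" if allows_null else "#REQUIRED"
    return f"{lookup_table} ({values}) {default}"
```

Note that this sketch does not guard against two descriptions collapsing to the same token, or against a description with no letters or digits at all; both would need to be checked before the DTD is generated.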
Rule 7: Adding Element Content to Root Elements.
Add a child element or elements to the allowable content of the root element for each table
that models the type of information we want to represent in our document.
Rule 8: Adding Relationships through Containment.
For each relationship we have defined, if the relationship is one-to-one or
one-to-many in the direction it is being navigated, and no other relationship
leads to the child within the selected subset, then add the child element as
element content of the parent element with the appropriate cardinality.
Rule 9: Adding Relationships using IDREF/IDREFS.
Identify each relationship that is many-to-one in the direction we have
defined it, or whose child is the child in more than one relationship we
have defined. For each of these relationships, add an IDREF or IDREFS
attribute to the element on the parent side of the relationship, which
points to the ID of the element on the child side of the relationship.
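Rules 8 and 9 together amount to a decision procedure, which the following sketch tries to capture. The function name classify_relationships and the (parent, child, cardinality) tuple format are assumptions made for this example, and the table names in the usage note are hypothetical. Cardinalities other than the three named in the rules are rejected rather than guessed at.

```python
from collections import Counter

ALLOWED = ("one-to-one", "one-to-many", "many-to-one")

def classify_relationships(relationships):
    """Decide, per relationship, between containment (Rule 8) and pointing (Rule 9).

    relationships is a list of (parent, child, cardinality) tuples, where
    cardinality is given in the direction the relationship is navigated.
    """
    # Count how many relationships lead to each child element.
    child_counts = Counter(child for _, child, _ in relationships)
    result = {}
    for parent, child, cardinality in relationships:
        if cardinality not in ALLOWED:
            raise ValueError(f"unsupported cardinality: {cardinality}")
        contained = cardinality != "many-to-one" and child_counts[child] == 1
        result[(parent, child)] = "containment" if contained else "pointing"
    return result
```

For example, given Invoice to LineItem (one-to-many), LineItem to Part (many-to-one), and both Customer to Address and Invoice to Address (one-to-one), only the Invoice to LineItem relationship is modeled by containment; Address is the child of two relationships, so both of those must use pointing.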
Rule 10: Add Missing Elements.
For any element that is only pointed to in the structure created so far, add
that element as allowable element content of the root element. Set the cardinality
suffix of the element being added to *.
Rule 11: Remove Unwanted ID Attributes.
Remove ID attributes that are not referenced by IDREF or IDREFS attributes
elsewhere in the XML structures.
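Rule 11 is a simple filtering step, sketched below. The function name prune_id_attributes and its inputs are assumptions for illustration: a mapping from each element to the ID attribute added under Rule 5, and the set of elements that some IDREF or IDREFS attribute points to after Rule 9 has been applied.

```python
def prune_id_attributes(id_attributes, pointing_targets):
    """Keep only the ID attributes whose elements are actually pointed to.

    id_attributes maps element name -> ID attribute name (from Rule 5).
    pointing_targets is the set of element names referenced by an IDREF
    or IDREFS attribute somewhere in the structure (from Rule 9).
    """
    return {
        element: attribute
        for element, attribute in id_attributes.items()
        if element in pointing_targets
    }
```

With hypothetical elements Invoice, LineItem, and Part, where only Part is pointed to, the result keeps PartID and drops the other two ID attributes.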