Professional XML Databases | 4 | WebReference


Professional XML Databases

Scoping the XML Document

The first rule when designing an XML structure to hold relational information is to decide what the scope of the document is. The scope refers to the data and relationships that we want to reproduce when creating our XML document - after all, when exposing the database content, we may not need all of the data that the database stores.

If we think about executing a query against a database, we may only require a subset of the information that it holds. For example, an e-commerce site stores data with relationships that model everything the customer has bought in the past, as well as current orders being processed. If we were writing a CRM application, we would not necessarily need to retrieve all of a customer's past purchases - only the orders that had been placed recently.

In short, the scope of the document that we are creating is driven by business requirements - what the data is going to be used for, and how it is going to be used - and these business requirements may vary widely.

For example, our business requirement could be to send our accounting office a summary of the monthly invoice totals, along with a customer-by-customer breakdown so that billing may be performed. In this case, we may want to send only a certain subset of the information to our accounting office (the shaded tables):


An alternative business requirement might be to transmit an XML copy of an invoice to a customer each time a new invoice is submitted, in which case the subset of the information we would be transmitting might look like this:


Additionally, we might want to control the specific columns that are transmitted. For example, say our customer wanted to query a product they had ordered; they have their invoice number to identify their purchase, but they aren't necessarily going to care about the invoice tracking number that our application uses internally. The extra number may in fact confuse them more.

By identifying the specific set of tables and columns that are going to be transmitted, we can start to get a feel for how the XML document needs to be laid out. If we happen to have access to a logical data diagram of the database, such as an ERwin model, it can also be very helpful when constructing our XML.

Rule 1: Choose the Data to Include.

Based on the business requirement the XML document will be fulfilling, decide which tables and columns from our relational database will need to be included in our documents. For the purposes of our example, we'll assume that all the information in our structure is relevant to the process (with the exception of the system-generated keys, which we can discard).
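
As a minimal sketch of this rule, the scope can be written down as an explicit list of columns to expose, so that anything not listed (such as a system-generated key) never reaches the document. The row, the column names, and the row_to_element helper below are hypothetical, invented for illustration rather than taken from the example database, and attributes are used only to keep the sketch short:

```python
import xml.etree.ElementTree as ET

# Hypothetical row as it might come back from the database; the column
# names are illustrative assumptions, not the example database's schema.
invoice_row = {
    "InvoiceKey": 1017,             # system-generated key, internal only
    "InvoiceNumber": "INV-2001-042",
    "TrackingNumber": "TRK-99812",  # internal tracking number
    "OrderDate": "2001-04-30",
}

# The scope decision for this document: only these columns are exposed.
INVOICE_SCOPE = ("InvoiceNumber", "OrderDate")


def row_to_element(tag, row, scope):
    """Build an element holding only the in-scope columns as attributes."""
    element = ET.Element(tag)
    for column in scope:
        element.set(column, str(row[column]))
    return element


invoice = row_to_element("Invoice", invoice_row, INVOICE_SCOPE)
print(ET.tostring(invoice, encoding="unicode"))
```

Keeping the scope in one named constant means a different business requirement can be served by swapping in a different column list, without touching the code that builds the XML.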

Creating the Root Element

Once we've clarified the scope of the document that we need to create, we need to create the root element within which our XML representation of the data will be nested.

For our example, we'll create a root element called <SalesData> to hold the other elements we will create:

<SalesData>
  ...other elements go here
</SalesData>

It's also possible that we may want to add some information to our XML document that isn't part of our relational database. This information might be used to indicate transmittal, routing, or behavioral information. For example, we might want to add a source attribute, so that the consuming process can decide which custom handler needs to be run to parse the document being passed. If we choose to add information about the document like this, it makes the most sense to add it as attributes of the root element we create. As we'll see in Chapter 18, many of the emergent XML servers (such as BizTalk) provide just such a mechanism, known as the envelope.
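
To illustrate how a consuming process might use such an attribute, here is a rough sketch that reads the root element and picks a handler from it. The attribute name Source, its values, and the handler functions are all assumptions made up for this example; a real consumer would define its own:

```python
import xml.etree.ElementTree as ET


def handle_web_store(root):
    # Hypothetical handler for documents produced by a web store.
    return "web store handler processed <%s>" % root.tag


def handle_warehouse(root):
    # Hypothetical handler for documents produced by a warehouse system.
    return "warehouse handler processed <%s>" % root.tag


# Map each value of the (assumed) Source attribute to its handler.
HANDLERS = {
    "WebStore": handle_web_store,
    "Warehouse": handle_warehouse,
}


def dispatch(xml_text):
    """Parse the document and choose a handler from the root's Source attribute."""
    root = ET.fromstring(xml_text)
    source = root.get("Source")
    if source not in HANDLERS:
        raise ValueError("no handler registered for source %r" % source)
    return HANDLERS[source](root)


print(dispatch('<SalesData Source="WebStore"></SalesData>'))
```

Because the routing information sits on the root element, the consumer can decide what to do before it looks at any of the nested data.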

For our example, we'll add an attribute to our root element, to govern what the consuming processor should do with the document when it is received. Specifically, we'll add a Status attribute. This attribute will let the processor know whether the information in the document is new, an update to existing data, or a courtesy copy.

So far then, we have the following structure:

<!ATTLIST SalesData
  Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>

Rule 2: Create a Root Element

Create a root element for the document. Add the root element to our DTD, and declare any attributes of that element that are required to hold additional semantic information (such as routing information). Root element names should describe their content.
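
As a small sketch of how a producing program might create this root element, the code below builds <SalesData> and checks the Status value against the three values enumerated in the ATTLIST declaration above. The make_root helper is a hypothetical name, and the check is written by hand because Python's xml.etree.ElementTree does not validate against a DTD:

```python
import xml.etree.ElementTree as ET

# The three values enumerated for Status in the ATTLIST declaration.
ALLOWED_STATUS = ("NewVersion", "UpdatedVersion", "CourtesyCopy")


def make_root(status):
    """Create the <SalesData> root, rejecting any Status outside the enumeration."""
    if status not in ALLOWED_STATUS:
        raise ValueError("Status must be one of %s" % ", ".join(ALLOWED_STATUS))
    return ET.Element("SalesData", {"Status": status})


root = make_root("NewVersion")
print(ET.tostring(root, encoding="unicode"))
```

Other elements would then be appended as children of this root as the rest of the structure is defined.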

Created: April 30, 2001
Revised: April 30, 2001