Professional XML Databases | 5
Model the Tables
Having defined our root element, the next step is to model the tables that we've chosen to include in our XML document. As we saw in the last chapter, tables map directly to elements in XML.
Loosely speaking, these tables should either be:
- Content tables, which, for our purposes, simply contain a set of records (for example, all the customer addresses for a certain company).
- Lookup tables, which contain a list of ID-description pairs used to further classify the information in a row of a content table: the content table stores the ID, and the lookup table supplies the description for it. Tables such as ShipMethod in our example are lookup tables.
There is another type of table - a relating table - whose sole purpose is to express a many-to-many relationship between two other tables. For our purposes, we shall model a table like this as a content table.
At this stage we will only be modeling content tables. Lookup tables will actually be modeled as enumerated attributes later in the process.
For each content table that we've chosen to include from our relational database, we will need to create an element in our DTD. Applying this rule to our example, we'll add the <Invoice>, <Customer>, <Part>, <MonthlyTotal>, and other elements to our DTD:
<!ELEMENT SalesData EMPTY>
<!ATTLIST SalesData
  Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
<!ELEMENT Invoice EMPTY>
<!ELEMENT Customer EMPTY>
<!ELEMENT Part EMPTY>
<!ELEMENT MonthlyTotal EMPTY>
<!ELEMENT MonthlyCustomerTotal EMPTY>
<!ELEMENT MonthlyPartTotal EMPTY>
<!ELEMENT LineItem EMPTY>
For the moment, we will just add the element definitions to the DTD. We'll come back to ensure that they are reflected in the necessary element content models (including that of the root element) when we model the relationships between the tables.
Note that we didn't model the ShipMethod table, because it's a lookup table. We'll handle this table in Rule 6.
Rule 3: Model the Content Tables.
Create an element in the DTD for each content table we have chosen to model. Declare these elements as EMPTY for now.
Model the Nonforeign Key Columns
Using this rule, we'll create attributes on the elements we have already defined to hold the column values from our database. In a DTD, these attributes should appear in the !ATTLIST declaration of the element corresponding to the table in which the column appears.
If a column is a foreign key joining to another table, don't include it in this rule - we'll handle foreign key columns later in the process, when we model the relationships between the elements we have created.
Declare each attribute created this way as having the type CDATA. If the column is defined in your database as not allowing NULL values, then make the corresponding attribute #REQUIRED; otherwise, make the corresponding attribute #IMPLIED.
We have four choices for an attribute's default declaration. #FIXED means the DTD provides the value, and the document may not supply a different one. #REQUIRED means the attribute must appear in the document. #IMPLIED means that it may or may not appear in the document. Finally, a default value with none of these keywords means that the processor must substitute that value for the attribute if it is not provided in the document. #IMPLIED is the only choice that legitimately lets an attribute have no value at all.
If we choose to store table column values as the content of elements, rather than attributes, we can take the same approach: create an element for each data point, and add it to the content model of the element for the table in which the column appears. Use no suffix if the column does not allow nulls, or the optional suffix (?) if it does. Be aware that if we take this approach, we'll need to be on the lookout for name collisions between identically named columns in different tables, because an element name can be declared only once in the whole DTD. This is not an issue when using attributes, which are declared separately for each element.
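As a minimal sketch, the four choices look like this in an !ATTLIST declaration. The <Example> element and its attribute names and values are hypothetical, chosen only to illustrate the syntax; they are not part of our sales DTD.

```xml
<!-- Hypothetical element, used only to show the four default declarations. -->
<!-- Version:  #FIXED, the DTD supplies "1.0" and a document may not change it. -->
<!-- Id:       #REQUIRED, every Example element must carry this attribute. -->
<!-- Note:     #IMPLIED, the attribute may be omitted and then has no value. -->
<!-- Currency: plain default, the processor substitutes "USD" when it is omitted. -->
<!ELEMENT Example EMPTY>
<!ATTLIST Example
  Version  CDATA #FIXED "1.0"
  Id       CDATA #REQUIRED
  Note     CDATA #IMPLIED
  Currency CDATA "USD">
```

For a column that allows NULLs, only the #IMPLIED form lets a document leave the value out without the processor filling one in.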
|Does the column allow NULLs?|Elements|Attributes|
|Allows NULLs|Use the ? suffix|Declare as #IMPLIED|
|Doesn't allow NULLs|Use no suffix|Declare as #REQUIRED|
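To make the element-based alternative concrete, here is a rough sketch of how the Customer table might look if its columns were stored as element content. It assumes, purely for illustration, that the Address column allows NULLs; in our actual example every Customer column is treated as required. The Customer prefix on two of the child element names is likewise an assumption, shown as one possible way to avoid a clash with the Name column of the Part table.

```xml
<!-- Sketch only: Customer columns as child elements rather than attributes. -->
<!-- CustomerAddress carries the ? suffix on the assumption that the column allows NULLs. -->
<!ELEMENT Customer (CustomerName, CustomerAddress?, City, State, PostalCode)>
<!ELEMENT CustomerName (#PCDATA)>
<!ELEMENT CustomerAddress (#PCDATA)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT State (#PCDATA)>
<!ELEMENT PostalCode (#PCDATA)>
```

Each child element is declared once for the whole DTD, which is why a second table with a Name column would need either a differently named element or a shared declaration.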
For our example, remember that we want to keep all the nonforeign key columns, with the exception of the system-generated primary keys:
<!ELEMENT SalesData EMPTY>
<!ATTLIST SalesData
  Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
<!ELEMENT Invoice EMPTY>
<!ATTLIST Invoice
  InvoiceNumber CDATA #REQUIRED
  TrackingNumber CDATA #REQUIRED
  OrderDate CDATA #REQUIRED
  ShipDate CDATA #REQUIRED>
<!ELEMENT Customer EMPTY>
<!ATTLIST Customer
  Name CDATA #REQUIRED
  Address CDATA #REQUIRED
  City CDATA #REQUIRED
  State CDATA #REQUIRED
  PostalCode CDATA #REQUIRED>
<!ELEMENT Part EMPTY>
<!ATTLIST Part
  PartNumber CDATA #REQUIRED
  Name CDATA #REQUIRED
  Color CDATA #REQUIRED
  Size CDATA #REQUIRED>
<!ELEMENT MonthlyTotal EMPTY>
<!ATTLIST MonthlyTotal
  Month CDATA #REQUIRED
  Year CDATA #REQUIRED
  VolumeShipped CDATA #REQUIRED
  PriceShipped CDATA #REQUIRED>
<!ELEMENT MonthlyCustomerTotal EMPTY>
<!ATTLIST MonthlyCustomerTotal
  VolumeShipped CDATA #REQUIRED
  PriceShipped CDATA #REQUIRED>
<!ELEMENT MonthlyPartTotal EMPTY>
<!ATTLIST MonthlyPartTotal
  VolumeShipped CDATA #REQUIRED
  PriceShipped CDATA #REQUIRED>
<!ELEMENT LineItem EMPTY>
<!ATTLIST LineItem
  Quantity CDATA #REQUIRED
  Price CDATA #REQUIRED>
Note that we left off Month and Year on the <MonthlyCustomerTotal> and <MonthlyPartTotal> structures, since these will be dictated by the <MonthlyTotal> element associated with these elements.
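As a quick check of the declarations so far, the following sketch shows a small document that should validate against the <Invoice> declaration. Because the content models have not been connected yet, it treats <Invoice> as the document element, and all of the attribute values are made-up samples.

```xml
<?xml version="1.0"?>
<!DOCTYPE Invoice [
  <!ELEMENT Invoice EMPTY>
  <!ATTLIST Invoice
    InvoiceNumber CDATA #REQUIRED
    TrackingNumber CDATA #REQUIRED
    OrderDate CDATA #REQUIRED
    ShipDate CDATA #REQUIRED>
]>
<!-- All attribute values below are hypothetical sample data. -->
<Invoice InvoiceNumber="1001" TrackingNumber="TRK-0001" OrderDate="2001-05-01" ShipDate="2001-05-03"/>
```

Since every attribute is declared #REQUIRED, a validating parser should reject the document if any one of the four is left off.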
Rule 4: Model the Nonforeign Key Columns.
Create an attribute for each column we have chosen to include in our XML document (except foreign key columns). These attributes should appear in the !ATTLIST declaration of the element corresponding to the table in which they appear. Declare each of these attributes as CDATA, and declare it as #IMPLIED or #REQUIRED depending on whether the original column allowed nulls or not.
Created: May 09, 2001
Revised: May 09, 2001