Using XML with Legacy Business Applications: Chapter 1
Two Implementations of the Architecture: Java and C++
In this book I present two different implementations of our architecture. Why? Basically, for reasons of platform independence and portability. However, I’m also doing it out of a pragmatic recognition that there isn’t just one dominant programming language, and I want to be able to speak to both camps.
We can use an analogy to home building in order to understand the two implementations of a single software architecture. The analogy isn’t exact, but there are enough high-level similarities between building houses and building software to serve our purpose.
When an architect designs a house, he or she first establishes the number and types of rooms that the client wants as well as the general characteristics. Then the architect develops a detailed floor plan with approximate room dimensions, locations of windows and fixtures, and so on. At this point the architect may not have decided yet on specific materials. The actual house could be constructed out of wood, brick, or even recycled tires; the materials will be selected during a second phase of home design. The first phase of home design corresponds to software architecture. Just as we could build two houses with the same floor plans but with different materials, we can implement a software architecture in two different languages.
So, this book presents implementations in Java and C++. The Java implementation uses the Sun Microsystems and Apache Software Foundation XML libraries, and the C++ implementation uses Microsoft’s MSXML library.
Why Java and C++? Well, most applications these days are written in either Java
or C++. Most environments in which those applications are run use standard Java
or C++ libraries. “But,” you say, “my legacy application is
written in COBOL!” This gets to the bottom line of cheap, simple and easy.
Reading (parsing), writing (serializing), and validating XML documents are not
easy tasks. This is especially true in the case of validating against schemas
written in the W3C XML Schema language. It takes a lot of complicated, hard-to-write,
and hard-to-debug code to do such things. We don’t want to write that
code. We want to use libraries that someone else has written. And because we’re
concerned about portability and platform independence we would like to use standard
libraries rather than proprietary approaches. These standard libraries are primarily
object-oriented, which again makes them a good fit for Java and C++. In Chapter
12 I’ll talk a bit more about some other alternatives, including some
for procedural languages such as COBOL. However, our most important nonfunctional
requirements point us to standard, object-oriented application programming interface
(API) libraries.
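To give a feel for what "using libraries that someone else has written" buys us, here is a minimal Java sketch of schema validation. It assumes the javax.xml.validation package that ships with current JDKs, which is not necessarily the library set used later in the book. The class name, the tiny order schema, and the sample documents are all hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheckSketch {
    // Hypothetical schema: an <order> element holding one integer <quantity>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order'><xs:complexType><xs:sequence>"
      + "<xs:element name='quantity' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true if the XML text is valid against XSD, false otherwise.
    static boolean isValid(String xml) throws SAXException, IOException {
        // Compile the schema; a broken schema would surface here as a SAXException.
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no custom error handler, validation errors are thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<order><quantity>12</quantity></order>"));     // true
        System.out.println(isValid("<order><quantity>twelve</quantity></order>")); // false
    }
}
```

All of the hard work (parsing the schema, walking the document, checking datatypes) happens inside the library; the application code is a handful of calls.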
Even with the standard API libraries there are implementation choices other
than Java and C++. For example, Perl, Python, and Visual Basic all have fairly
sophisticated XML support. Visual Basic even uses the same API library, MSXML,
that we’ll use for C++. The techniques discussed in the next two chapters
on basic conversions involving CSV files and XML could be implemented in these
languages. However, the more sophisticated conversions and utilities built later
in the book are complex enough that these languages would not be appropriate.
In addition, very few real applications are written in these languages. So,
we’ll keep our focus fairly limited and be concerned primarily with the
Java and C++ implementations. However, the basic techniques will still be relevant
if you want to cobble together some simple utilities using Perl or whatever.
As I’ll discuss shortly, the basic operations using the standard APIs
are presented in pseudocode that is language independent. Even though I won’t
offer the specific Perl or Visual Basic syntax, you could still port the techniques
fairly directly to these languages from the pseudocode design.
If you are an end user, you really don’t care much whether the program is written in Java, C++, Perl, COBOL, ALGOL, or fuggly old IBM 360 Assembler. In this case, the types of libraries and utilities you can install and run in your environment dictate your choice of implementation. Java can run pretty much anywhere, so you may want to install the relevant Java libraries and filter programs. If you can’t install things directly where your application is running, maybe you can hijack a PC networked to your application’s box and install the conversion libraries and utilities there. In that case, the WIN32 C++ implementation might be a bit easier since there are fewer pieces to install.
Before wrapping up this section there are a few other developer-oriented issues we need to discuss. We’ve already decided that we’re going to use a standard API library. Even with this restriction we still have a few choices. I have chosen to use the W3C’s Document Object Model (the DOM). But before discussing why I’m using the DOM, let’s look at what it is.
The Document Object Model
Like nearly all other basic XML technologies, the DOM is a creation of the W3C, which defines the DOM as:
a platform- and language-neutral interface that will allow programs and scripts
to dynamically access and update the content, structure and style of documents.
The document can be further processed and the results of that processing can be
incorporated back into the presented page. [W3C 2002]
What does this mean? Basically, the DOM specifies a set of program routines, or an API. It is not itself an object; rather, much as Java does with its interfaces, it specifies an interface to the object model, and that interface is composed of a number of different methods. Since it is just a specification, there isn’t any actual W3C-produced code. So, in general terms, why use the DOM? Again, the W3C says:
“Dynamic HTML” is a term used by some vendors to describe the combination of
HTML, style sheets and scripts that allows documents to be animated. The W3C
has received several submissions from members companies on the way in which
the object model of HTML documents should be exposed to scripts. These
submissions do not propose any new HTML tags or style sheet technology. The
W3C DOM WG is working hard to make sure interoperable and scripting-language
neutral solutions are agreed upon. [W3C 2002]
What does this really mean? It means they developed the DOM so that application programmers can have a standardized way to deal with documents. It is also notable that the DOM was developed first to deal with HTML documents and not XML documents. However, the XML world has embraced it wholeheartedly.
The hope was that if enough tool vendors (like Microsoft, Sun, Oracle, and so on) developed DOM implementations, application developers would have a fairly easy, standardized way to deal with documents programmatically. This has come to pass, and there are enough different implementations that porting HTML or XML code between implementations is a lot easier than it would have been if there were no DOM. The DOM also makes it fairly easy to talk about coding in a generic way; most of the XML processing logic discussed in the book’s utilities is applicable to both C++ and Java environments.
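As a rough illustration of that generic style, the following Java sketch builds a small document in memory using only the W3C-defined interfaces in org.w3c.dom. The single implementation-specific step is obtaining an empty Document, done here through JAXP's DocumentBuilderFactory as one possible route. The class name, element names, and values are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBuildSketch {
    // Builds <order customer="..."><quantity>n</quantity></order> in memory.
    static Document buildOrder(String customer, int quantity) throws Exception {
        // Implementation-specific: how we get hold of an empty Document.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();

        // From here on, only DOM-specified methods are used.
        Element order = doc.createElement("order");
        order.setAttribute("customer", customer);
        doc.appendChild(order);

        Element qty = doc.createElement("quantity");
        qty.appendChild(doc.createTextNode(Integer.toString(quantity)));
        order.appendChild(qty);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildOrder("ACME", 12);
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName());                 // order
        System.out.println(root.getAttribute("customer"));     // ACME
        // The first child of <order> is <quantity>; its first child is the text node.
        System.out.println(root.getFirstChild().getFirstChild().getNodeValue()); // 12
    }
}
```

The createElement, setAttribute, appendChild, and createTextNode calls come straight from the DOM specification, which is why the same sequence of steps carries over to other DOM libraries with mostly cosmetic changes.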
The W3C has defined several different “levels” of the DOM. Level 1 was released in October 1998, Level 2 was released in November 2000, and Level 3 was still a working draft at the time of this book’s preparation. Basically, each level has a few more features and a bit more functionality than the previous version. For the most part, we don’t need to be too worried about the DOM level we’re using. Both the C++ and Java libraries used in this book support Level 2, with a few additional features from Level 3.
Why Use the DOM?
We already agreed that we don’t want to reinvent the wheel. So, from a programming perspective, the DOM fits the bill. However, it’s not the only language-independent API we can use. There is one other predominant API, called SAX (Simple API for XML, not the musical instrument that John Coltrane and a certain past U.S. president played). SAX, like the DOM, specifies an interface that can be used to process XML documents in a somewhat platform-independent fashion. It doesn’t matter a lot in practical terms since it’s becoming customary for most XML libraries to support both, but the DOM is an “official” W3C Recommendation and SAX isn’t.
How are the DOM and SAX different, and why have I chosen the DOM? The DOM handles a complete document as an object in memory, specifically a tree. SAX, on the other hand, is an event-driven API. It calls a specific method (via callbacks) for each type of XML construct it encounters as it reads in an XML instance document. The most important difference between the two APIs is that the DOM makes it fairly easy to build or manipulate a document in memory. Some SAX implementations offer ways to build documents, but there isn’t a standard SAX approach. So, SAX is not as well suited in this respect. On the other hand, SAX is well suited to parsing very large XML documents that might cause performance problems (or even crashes) if we tried to read them into memory using the DOM. Since it’s much simpler to use just one API if possible, I’m using the DOM in this book. I’ll discuss this topic again in Chapter 12, but the choice of the DOM for this book’s utilities will become more understandable as the overall design approach progresses.
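The difference between the two styles can be sketched in Java by counting elements both ways. This is only an illustration, assuming the JAXP parsers bundled with the JDK; the class name, method names, and sample document are hypothetical. The DOM version loads the whole tree and then queries it, while the SAX version reacts to a callback for each start tag and never holds the document in memory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSaxSketch {
    // DOM style: parse the complete document into a tree, then ask the tree.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX style: the parser calls startElement for each start tag it reads.
    static int countWithSax(String xml, final String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance()
            .newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item>a</item><item>b</item><item>c</item></items>";
        System.out.println(countWithDom(xml, "item")); // 3
        System.out.println(countWithSax(xml, "item")); // 3
    }
}
```

For a read-only task like counting, SAX is arguably the leaner choice. Once we need to add, move, or rewrite nodes, though, the in-memory tree is what makes the job manageable.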
I should note, though, that while the current versions of the DOM make it fairly easy to build and manipulate XML documents in memory, through Level 2 the DOM doesn’t specify how XML (or HTML) documents are actually read or written. Curious but true. I would have thought those things to be fairly fundamental, but I guess the HTML heritage as well as different priorities in the W3C left those details to the implementers. The draft Level 3 requirements do finally deal with such things. The fact that actually reading or writing XML documents isn’t specified in Level 2 really isn’t too much of a problem since most XML libraries provide the functionality. However, because it isn’t specified there are often differences in the particular methods used. We’ll see this as we develop the C++ and Java code later in the book.
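To make the point concrete, here is a hedged Java sketch of one way to read and write a document. None of the loading or saving calls come from DOM Level 2: parsing goes through JAXP's DocumentBuilder, and serializing goes through an identity Transformer. Both are Java-platform APIs chosen for this illustration, and other libraries use different calls for the same jobs. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomReadWriteSketch {
    // Reading: a JAXP call, not a DOM Level 2 method.
    static Document read(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Writing: an identity transform copies the DOM tree to text.
    static String write(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = read("<order><quantity>12</quantity></order>");
        // Between read and write, only standard DOM methods touch the tree.
        doc.getDocumentElement().setAttribute("status", "new");
        System.out.println(write(doc));
    }
}
```

Only the middle step, the setAttribute call, is portable DOM code. The read and write methods are exactly the parts that would need rewriting for a different library.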
One final note on the DOM: While many implementations use the exact W3C method
and object names in their libraries, Microsoft’s MSXML library sometimes
slightly modifies the names specified by the DOM. It also allows many of the
DOM object properties (or attributes) to be accessed directly rather than just
through get methods. This means that the C++ code in this book may have to be
modified if you want to use it with a different C++ DOM library. (At least a
global search and replace will be required. Other changes will probably be required
as well, but those are beyond the scope of this book.)
Created: March 27, 2003
Revised: January 1, 2004
URL: http://webreference.com/programming/awxml2/
