Using XML with Legacy Business Applications: Chapter 1 | 6
CAUTION Don’t Change Horses in the Middle of a Stream!
I feel a bit of a nag when reminding people to pay attention to the basics. But it seems that we often forget them or make exceptions when dealing with new technology. After you have downloaded the latest tools and APIs, don’t update them until you’ve finished your project! With updates seeming to come every month or two in the XML world, there is a great temptation to apply them just as soon as they come out. However, I’ve had too many experiences on other projects over the years of starting to work in the morning and discovering that code that worked last night doesn’t work today, even though I didn’t change anything. After hours of debugging I found that new libraries had been installed and the problem wasn’t my code at all. Save the updates of the APIs until your code is stable and thoroughly tested. Then download the latest, rebuild, and retest.
Java Software
The Java code in this book was developed using the libraries listed below. It may compile and run under later versions, but it hasn’t been tested that way. Due to changes in support for the W3C XML Schema language, the versions listed are the minimum that I recommend.
- Java 2 Standard Edition Version 1.3.1_03 (http://java.sun.com/j2se/1.3): To compile the code you’ll need the full software development kit (SDK). If you’re only going to download and run the book utilities, you’ll just need the Java Runtime Environment (JRE).
- Java XML Pack, Spring 02 (http://java.sun.com/xml/downloads): If you’re only going to run the Java utilities, you don’t need the full kit. However, you will need the Java jar archive files from the kit.
- Xerces2 Java Parser 2.0.1 (http://xml.apache.org): This software is required for all Java XML development and runtime presented in this book. For the most part I use standard Java API for XML Processing (JAXP) features, but the routines to serialize a DOM document use specific Xerces2 routines. (Although the precise name of the parser is “Xerces2,” I’ll observe the common custom of referring to it as just “Xerces” unless I mean to refer specifically to this version of Xerces.)
- Xalan-Java 2.3.1 (http://xml.apache.org): This software is required for Java-based XSLT transformations. Other Java-based XSLT implementations may work but have not been tested.
Refer to my Web site for specific instructions about how to install the jar
files for your particular version of Java.
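To give a feel for what a Java-based XSLT transformation looks like through the standard JAXP transformation interfaces, here is a minimal sketch. It is not one of the book’s utilities: the class name XsltSketch, the sample Order document, and the stylesheet are invented for illustration, and the code runs against whatever XSLT processor your Java installation supplies, which may or may not be Xalan-Java 2.3.1.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of the book's code.
class XsltSketch
{
    // Applies the stylesheet text to the XML text and returns
    // the transformation result as a string.
    static String transform(String xml, String xslt)
        throws TransformerException
    {
        // JAXP picks the XSLT processor found on the class path.
        TransformerFactory factory =
            TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(
            new StreamSource(new StringReader(xslt)));

        StringWriter output = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)),
            new StreamResult(output));
        return output.toString();
    }

    public static void main(String[] args)
        throws TransformerException
    {
        // Made-up sample input and stylesheet.
        String xml = "<Order><Customer>Acme</Customer></Order>";
        String xslt =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"Order/Customer\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

        // Writes the customer name extracted by the stylesheet.
        System.out.println(transform(xml, xslt));
    }
}
```

Because the sketch touches only the javax.xml.transform interfaces, swapping in a different XSLT processor should not require code changes, although, as noted above, other implementations have not been tested with the book’s code.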
Note: The book’s Java code was developed using version 1.2 of Sun’s
JAXP. The Java XML Pack has an implementation of the JAXP 1.2 specification
that uses Xerces2 as the default parser. (Xerces2 conforms to the JAXP 1.1 specification.)
To be sure that you get the latest bug fixes I recommend downloading the most
recent version of Xerces2 from the Apache site.
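Since more than one parser can end up installed, it can help to confirm which one JAXP actually hands you. The following is a small sketch under my own assumptions, not code from the book: the class name ParserCheck and the sample Invoice document are hypothetical, and the class names it prints depend entirely on your installation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the book's code.
class ParserCheck
{
    // Returns the implementation classes that JAXP selected
    // for the factory and the document builder.
    static String describeParser() throws Exception
    {
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return "factory=" + factory.getClass().getName()
            + ", builder=" + builder.getClass().getName();
    }

    // Parses the XML text with the default JAXP parser and
    // returns the name of the root element.
    static String rootName(String xml) throws Exception
    {
        DocumentBuilder builder = DocumentBuilderFactory
            .newInstance().newDocumentBuilder();
        Document document = builder.parse(
            new InputSource(new StringReader(xml)));
        return document.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println(describeParser());
        System.out.println(rootName("<Invoice/>"));
    }
}
```

Running a check like this before and after installing new jar files is one inexpensive way to notice that the libraries underneath your code have changed.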
The Java code was developed and tested with Borland’s JBuilder Version 6 on Windows NT Workstation version 4. It was also tested on Windows 98.
C++ Software
As previously mentioned, the C++ implementation presented here is based on Microsoft’s MSXML library, listed below. The code was developed and tested with MSXML version 4.0. This is the minimum version required. The code may work with later versions but has not been tested. Changes will be required to make it run with different DOM implementations.
- MSXML 4.0 Microsoft XML Core Services (http://www.microsoft.com/downloads): This is required for both development and runtime. MSXML runs as a COM service.
- MSXSL.EXE Command Line Transformation Utility (no version number, last updated 9/10/2001; http://msdn.microsoft.com/downloads): This is required for runtime XSLT transformations from the command line.
The C++ code in this book was developed under Visual C++ 6.0 and tested on Windows NT Workstation 4.0 and Windows 98. (No, you don’t need .NET!)
MSXML, Win32, and COM as Legacy Technologies?
Incredible but true! When I conceived this project, MSXML 4.0 (with full support for the final W3C XML Schema Recommendation) was still pretty new. However, before I really got going, Microsoft came out with .NET. Not only does .NET offer a different DOM API than MSXML (though reportedly it still uses MSXML under the hood), it is a completely new and different framework. XML is only a small piece.
When this happened I became a bit concerned that working with the DOM via MSXML’s COM interfaces would be dealing with obsolete technology. However, on reflection I decided that it fit very well with the thrust of the book. I just didn’t expect to run into legacy applications and technology in this particular area! Considering the paradigm shift that .NET imposes and the impact of migrating from Visual Studio 6.0 to Visual Studio .NET, I think there may be an audience for this book for some time to come.
Bottom line: This book gives you a way to do XML from Win32 without having to migrate to .NET.
Alternatives to MSXML?
While struggling to get the COM-related stuff working and fully debugged, I began to question my decision to use MSXML instead of some other DOM implementation that worked with C++. But, given the realities of C++ applications on Win32, I still think that MSXML is the best choice for this book. Many development shops are all Microsoft and wouldn’t look at APIs from any other source. MSXML is fairly mature, installs easily, and is widely considered to offer some of the best support for W3C XML Schema. However, if you are comfortable with the licensing terms of the Apache C++ Xerces implementation, you can save yourself a lot of the COM headaches by using it instead of MSXML.
For Developers
This section describes the coding approach, conventions, and style I chose to use in this book. If you’re reading this book as a technical end user, you may want to skip ahead to the next section.
General Coding Approach and Conventions
The Java and C++ code presented in this book uses object-oriented techniques. The DOM and all its various parts are object-oriented, as are the Java file operations and the C++ file operations. (I have chosen to use the C++ classes instead of the old-style C libraries.) That said, most of what we programmers really care about is procedural in nature, that is, the code that lives and works inside methods. We care most about how to call methods that manipulate DOM objects. We don’t care as much about everything else.
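As a rough illustration of what “calling methods that manipulate DOM objects” means in practice, here is a short sketch that builds a two-element document using only standard DOM and JAXP calls. The class name DomCalls and the Order and Customer element names are made up for this example and do not come from the book’s utilities.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical example class, not part of the book's code.
class DomCalls
{
    // Builds <Order><Customer>name</Customer></Order> in memory.
    static Document buildOrder(String customerName)
        throws ParserConfigurationException
    {
        DocumentBuilder builder = DocumentBuilderFactory
            .newInstance().newDocumentBuilder();
        Document document = builder.newDocument();

        // The Document object is the factory for every node.
        Element order = document.createElement("Order");
        document.appendChild(order);

        Element customer = document.createElement("Customer");
        customer.appendChild(
            document.createTextNode(customerName));
        order.appendChild(customer);
        return document;
    }

    public static void main(String[] args)
        throws ParserConfigurationException
    {
        Document document = buildOrder("Acme");

        // Walk back down: root element, first child, its text.
        Node customer =
            document.getDocumentElement().getFirstChild();
        System.out.println(customer.getNodeName() + " = "
            + customer.getFirstChild().getNodeValue());
    }
}
```

The object-oriented scaffolding here is minimal; nearly all of the work is a procedural sequence of DOM method calls inside one method.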
I have not gone out of my way to make this code object-oriented. In the beginning it is fairly simple and not necessarily very heavily object-oriented. However, as the design progresses and matures through the book, I do use more object-oriented techniques when they promote reuse and extension. I’m enough of an old-school programmer to be a bit concerned about the performance implications of declaring and freeing a lot of objects dynamically. I generally tend to avoid doing so when I can. However, we’re valuing reusability and extensibility over performance in this design. So, there are cases when we do create a lot of objects dynamically at runtime rather than declaring them statically at compile time. If the code lives on and it turns out to be a dog at runtime, we can investigate more efficient designs. A modular, object-oriented approach also helps us in that regard.
I have not chosen to construct elaborate object models for the non-XML entities manipulated in these programs. In addition, there aren’t very many class diagrams or other Unified Modeling Language (UML) artifacts. We’re going to keep it simple and focus on the essentials.
That said, here are a few other notes and general rules on coding approach and style.
Clarity over cleverness: As any seasoned programmer knows, there are ways to write programs that are exceedingly clever. Such cleverness often yields short, efficient programs. However, in many cases it comes at the expense of clarity, since it can make it harder to understand what the program is really doing. In this book, where being clear means using a bit more code than the cleverest solution would need, I choose clarity. You can be clever in your own programs if you want, but in trying to get across some basic concepts I would rather be clear. We will follow the KISS principle: Keep it simple, stupid!
Error handling: Since these are basic utilities intended for light use and teaching concepts, I did not implement elaborate error handling. However, all parsing and DOM exceptions are caught and enough information is reported to enable someone reasonably experienced in XML to fix a problem.
DOM extensions: Some DOM implementations, notably Microsoft’s MSXML in our case, offer extensions to the methods, functions, and interfaces defined in the DOM. To try to make the code platform independent and to keep the Java and C++ implementations as alike as possible, I avoid using these extensions in the code. However, where the extensions offer alternatives to the approaches I take in the code, I often comment on the alternative in the text.
File organization: Each class is coded in its own source file. The name of the source file generally matches the name of the class. This is, of course, standard in Java but not in C++.
Header files: For C++, each class has a separate header file.
Naming conventions: Upper camel case is used in the XML examples and for class names in the programs. Lower camel case is used for method names and variables. These choices seem to follow the prevailing conventions. With few exceptions abbreviations are avoided. Variable names in programs are prefixed with an abbreviation indicating the data type. Although in Java and C++ the line between variables and classes is sometimes hard to determine, I won’t use this prefixing convention for classes.
Comments: The code is liberally commented. I have tried to err on the side of having too many comments rather than not enough. I hope you find them sufficient without being too tedious and obvious.
Formatting: Opening and closing brackets for code blocks appear on their own lines. Lines generally break around column 65. I have tried to make indentation and continuations consistent in the source files, using spaces and no tabs.
Pseudocode: The pseudocode presented here is based on Programming Design Language (PDL) [Caine and Gordon 1975]. This commonly used pseudocode detail design language is basically structured English with a few reserved words. I see no particular merits in it over other pseudocode languages, but I picked it to lend some consistency to the pseudocode. My usage extends PDL by including calls to specific DOM methods. Appendix B presents a summary of pseudocode conventions used in this book.
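Several of the rules above (error handling, standard DOM calls only, naming, and formatting) can be seen together in a short Java sketch. Treat it as my illustration of the rules rather than an excerpt from the book’s code: the class name ConventionSketch is hypothetical, and the type prefixes str, dbf, db, and doc are guesses at what a data type abbreviation might look like, not necessarily the ones used in the book.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical example class: upper camel case class name,
// lower camel case method name, type-prefixed variables.
class ConventionSketch
{
    // Parses the XML text and returns the root element name,
    // or a short message describing what went wrong.
    static String reportRootName(String strXml)
    {
        try
        {
            DocumentBuilderFactory dbfFactory =
                DocumentBuilderFactory.newInstance();
            DocumentBuilder dbBuilder =
                dbfFactory.newDocumentBuilder();
            Document docInput = dbBuilder.parse(
                new InputSource(new StringReader(strXml)));
            // Standard DOM calls only, no vendor extensions.
            return docInput.getDocumentElement().getNodeName();
        }
        catch (SAXParseException e)
        {
            // Report where the parser gave up.
            return "Parse error at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
        }
        catch (DOMException e)
        {
            return "DOM error, code " + e.code
                + ": " + e.getMessage();
        }
        catch (Exception e)
        {
            // Configuration and I/O problems end up here.
            return "Error: " + e.getMessage();
        }
    }

    public static void main(String[] args)
    {
        // A well-formed document, then one missing its end tag.
        System.out.println(reportRootName("<Invoice/>"));
        System.out.println(reportRootName("<Invoice>"));
    }
}
```

The error handling is deliberately light: each exception is caught and turned into a message carrying the line and column or the DOM error code, which is usually enough for someone experienced in XML to track down the problem.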
Created: March 27, 2003
Revised: January 1, 2004
URL: http://webreference.com/programming/awxml2/
