Remember the old joke about good, fast, and cheap, that you can pick only two out of the three? Well, with any luck we might be able to get you all three. Good is pretty relative (as well as unspecific), so I won’t go into a lot of detail here. The same applies to fast. So let’s talk about cheap and a few other typical requirements. If you’re an end user and you’re paying a relatively small price for this book instead of buying a full-featured Enterprise Application Integration (EAI) system like Mercator, we understand each other. If you find the word cheap a bit harsh, think of it this way: You can’t cost-justify purchasing a full-featured package to convert a few files on an infrequent basis. You can’t justify spending more on utilities than you did on the business application. Take your pick, but cheap is fine with me. If you’re a developer, you want to support XML without spending the next release’s entire new features budget. Return on investment is what it’s all about. The bottom line is the bottom line.
So, beyond cheap, what’s important? Simple usually goes hand in hand with cheap, as does easy. Beyond these biggies, there are several more that developers and information technology staff usually care about. I’ll assume that you do too.
However, we do need to draw the line somewhere and note some things that are of lesser importance. In this book I assume that you don’t care as much about the following requirements.
Summing it all up into just a few words, in this book I present a fairly simple, pragmatic approach that can be implemented relatively quickly and at fairly low cost.
At the heart of this approach are a number of independent programs that can be used alone or together to build a solution appropriate for the specific problem you need to solve. The programs we will develop in this book are:
In addition, we’ll rely on an XSLT transformation program to transform between different XML formats. XSLT is the W3C recommendation on Extensible Stylesheet Language Transformations. The good news is that such XSLT utilities are easily (and freely) available, and we’ll use an existing one rather than build one. I will present the essential techniques you need to know to code XSLT stylesheets that drive the transformations between different XML formats.
How does the solution measure up so far against our requirements? As you can easily see, it meets all the functional requirements for technical end users. Because I’ll walk you through the coding techniques used in each of the programs, it should help developers meet their requirements too.
Cheap, simple, and easy? You bet. As I said before, you can download some pretty decent XSLT transformation programs for free from the Web. You can download executable versions of the programs developed in this book from the book’s Web site. If you want to tweak them a bit or even use them as a basis for your own slightly different programs, you can also download the source code. The solution is very modular, since we use a number of independent programs. It is also maintainable because most of what you have to change to deal with different file organizations is done in XML files rather than in code. The XML to XML transformations are driven entirely by XSLT stylesheets, which are themselves nothing more than XML documents in the XSLT language. XSLT also offers a great deal of portability (as long as you don’t use nonstandard extensions!). Other aspects of portability and platform independence are determined more by the specific implementation, so I’ll discuss those later.
According to the seminal book on software architecture, written by Shaw and Garlan [1996], software architecture “involves the description of elements from which systems are built, interactions among those elements, patterns that guide their composition, and constraints on these patterns. In general, a particular system is defined in terms of a collection of components and interactions among those components.” What we’re most concerned with here is what Shaw and Garlan refer to as the architectural pattern or style.
Style is the general model of how the system is composed and how the different parts interact with each other. You’ve probably run across object-oriented, layered, and client-server styles of software architecture. We’ll use a style that should be familiar to you (especially if you do a lot of work from a UNIX shell prompt) but that you may not know by the name used here: pipe and filter.
The basic idea is that you have a number of different programs, or filters, that read input in one format, transform it, and produce output in a different format. The output of one filter is connected to the input of another via a pipe. The filters are independent of each other, with only the constraint that the output of one filter be useable as the input of the next. In our approach, the pipes are disk files or sets of files in directories, but they could just as easily be in-memory data structures, database tables, or something else. Figure 1.1 shows the basic pipe and filter style.
A modified pipe and filter style (Figure 1.2) allows the filters to accept input or parameters other than just the input files. This allows the filter programs to produce different types of output. In reality, most pipe and filter systems in use are of this modified style; ours is too.
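To make the two styles concrete, here is a minimal sketch in Python. It is only an illustration, not one of the programs developed in this book: the filter names, the stage file names, and the `run_pipeline` helper are all hypothetical. The pipes are disk files, as in our approach, and the second filter takes a parameter in addition to its input file, which is the modified style.

```python
import tempfile
from pathlib import Path


def uppercase_filter(src: Path, dst: Path) -> None:
    """Basic filter: reads one file and writes a transformed file."""
    dst.write_text(src.read_text().upper())


def prefix_filter(src: Path, dst: Path, prefix: str) -> None:
    """Modified filter: its output also depends on a parameter."""
    lines = src.read_text().splitlines()
    dst.write_text("\n".join(prefix + line for line in lines) + "\n")


def run_pipeline(first_input: Path, work_dir: Path, stages) -> Path:
    """Run each (filter, parameters) stage in order.

    The output file of one stage is the pipe that feeds the next stage.
    """
    current = first_input
    for number, (filter_func, parameters) in enumerate(stages, start=1):
        output = work_dir / f"stage{number}.txt"  # hypothetical naming scheme
        filter_func(current, output, **parameters)
        current = output
    return current


def demo() -> str:
    """Build a two-filter pipeline and return the final output text."""
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp)
        source = work_dir / "input.txt"
        source.write_text("widget\ngadget\n")
        stages = [
            (uppercase_filter, {}),
            (prefix_filter, {"prefix": "ITEM: "}),
        ]
        return run_pipeline(source, work_dir, stages).read_text()
```

Calling `demo()` returns the text `ITEM: WIDGET` and `ITEM: GADGET` on separate lines. The filters know nothing about each other; the only contract is that each one can read what the previous one wrote.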
Want to build a system that converts CSV to XML? Easy—just use the CSV to XML Converter program as the filter (Figure 1.3).
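As a rough idea of what such a filter does, here is a small Python sketch. It is not the CSV to XML Converter developed in this book; the `csv_to_xml` function and the default `Table` and `Row` element names are assumptions made for illustration, and the sketch assumes the first CSV row holds column headings that are legal XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str, root_name: str = "Table", row_name: str = "Row") -> str:
    """Convert CSV text with a header row into a simple XML document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_name)
    for record in reader:
        row = ET.SubElement(root, row_name)
        for column, value in record.items():
            # Assumption: every column heading is a legal XML element name.
            ET.SubElement(row, column).text = value
    return ET.tostring(root, encoding="unicode")
```

For the input `Name,Qty` followed by `Widget,3`, this returns `<Table><Row><Name>Widget</Name><Qty>3</Qty></Row></Table>`.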
You don’t like the XML that this produces? You want to produce an XML file that conforms to your customer’s invoice specification? No problem. Add the XSLT transformer to build a more general purpose application (Figure 1.4).
Now let’s say you have a customer (or maybe you’re a consultant and your client has a customer) who wants to send you orders using the X12 EDI standard. The version of Peachtree Accounting you’re using processes only CSV files, and you don’t want to buy an EDI system for just this one application. We can easily build a system to solve this problem too (Figure 1.5).

Figure 1.1 Basic Pipe and Filter Style

Figure 1.2 Modified Pipe and Filter Style

Figure 1.3 CSV to XML Application

Figure 1.4 CSV to XML to XML Application
By now it should be pretty obvious that you can take the building blocks here and make pretty much whatever pipe and filter system you might need. Once you think about it, the pipe and filter style is a very logical choice for the type of format conversion problems we’re dealing with in this book. Even if you’re a developer who wants to enable your product to work with XML, knowing that the XML your application reads or writes is probably going to live in this type of environment can help you make some better-informed design decisions.
Beyond our basic pipe and filter approach, it would be nice to have something that helps us build these specialized little pipe and filter applications. In Chapter 11 we’ll put a lot of the pieces together and go over just how such a system might look.
There are two opinions about using XSLT for transformations in which the source or target is a format other than XML. One camp says that XSLT can and should be used for everything. The other camp says that XSLT is wonderful for transforming an XML document into a different flavor of XML document but that there are better ways to deal with non-XML formats.
So, why am I in the latter camp? It’s an issue of using the right tool for the job. If you’re camping and you forget an essential tool, you can open a can of beans with a hatchet or maybe even cut down a tree with your Swiss Army knife. But those aren’t exactly the easiest ways to do things. It’s the same way with XSLT. XSLT has a fairly concise language for identifying items in source XML documents and creating corresponding items in target XML documents. However, when dealing with other formats the language is not so elegant. For example, to parse an input fixed record length file it is necessary to use things like substring functions. Parsing an input CSV file requires repeated use of a search function to find the delimiter and then substring functions to isolate individual fields. Creating target documents in non-XML formats requires similar operations.
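To give a feel for the search-and-substring pattern, here is a sketch written in Python rather than XSLT, so that the examples stay in one language. The helpers `substring_before` and `substring_after` are written here to mimic the XSLT 1.0 string functions of the same names; `split_record` and `fixed_fields` are hypothetical names. The sketch ignores quoted fields, which real CSV files often contain.

```python
def substring_before(text: str, delimiter: str) -> str:
    """Mimic XSLT's substring-before(): the text up to the first delimiter."""
    position = text.find(delimiter)
    return text[:position] if position >= 0 else ""


def substring_after(text: str, delimiter: str) -> str:
    """Mimic XSLT's substring-after(): the text after the first delimiter."""
    position = text.find(delimiter)
    return text[position + len(delimiter):] if position >= 0 else ""


def split_record(record: str, delimiter: str = ",") -> list:
    """Isolate delimited fields by repeated search and substring operations."""
    fields = []
    rest = record
    while delimiter in rest:
        fields.append(substring_before(rest, delimiter))
        rest = substring_after(rest, delimiter)
    fields.append(rest)  # whatever follows the last delimiter
    return fields


def fixed_fields(record: str, layout: list) -> list:
    """Cut a fixed-length record using 1-based (start, length) pairs,
    the same convention as XSLT's substring(string, start, length)."""
    return [record[start - 1:start - 1 + length] for start, length in layout]
```

Here `split_record("1001,Widget,3")` returns `['1001', 'Widget', '3']`, and `fixed_fields("1001WIDGET003", [(1, 4), (5, 6), (11, 3)])` returns `['1001', 'WIDGET', '003']`. A conventional language hides this bookkeeping in a library call; in XSLT 1.0 you typically spell it out in a recursive template.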
There are some drawbacks to using XSLT for everything. The most efficient design approach using just XSLT is to try to do everything with one XSLT stylesheet. In this approach the code that parses the input or formats the output is mixed up with the code that does the logical transformation between documents. By “logical” I mean correlating input fields to output fields, along with any manipulation or value conversions required. One example of a logical transformation is concatenating a person’s first and last names into a single customer name. This approach of trying to do everything with a single stylesheet doesn’t support modularity, reusability, understandability, or maintainability very well.
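The name example can be sketched as follows. In the approach described here this step belongs in an XSLT stylesheet; it is shown in Python only to keep the examples in one language. The element names `Person`, `FirstName`, `LastName`, `Customer`, and `CustomerName` are made up for illustration. Notice that the function contains no parsing or formatting of non-XML data, only the correlation of input fields to output fields.

```python
import xml.etree.ElementTree as ET


def to_customer(person_xml: str) -> str:
    """Logical transformation: combine two input fields into one output field."""
    person = ET.fromstring(person_xml)
    first = person.findtext("FirstName", default="")
    last = person.findtext("LastName", default="")
    customer = ET.Element("Customer")
    # Concatenate first and last names into a single customer name.
    ET.SubElement(customer, "CustomerName").text = f"{first} {last}".strip()
    return ET.tostring(customer, encoding="unicode")
```

Given `<Person><FirstName>Ada</FirstName><LastName>Lovelace</LastName></Person>`, this returns `<Customer><CustomerName>Ada Lovelace</CustomerName></Customer>`.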
A less efficient approach that is somewhat more modular and reusable is to use two XSLT transformations. One transforms the non-XML format to or from XML, and the other performs the logical transformation of XML to XML. This approach is somewhat better than a single XSLT stylesheet that does everything. However, it is still hampered by a limitation in the XSLT 1.0 language that restricts it to generating just one output file per transformation. As we’ll see in later chapters, several types of transformations require us to generate multiple XML documents from one non-XML input file. Some XSLT 1.0 processors include nonstandard functions that provide this functionality. There is discussion of supporting it in XSLT 1.1, but it isn’t there yet as a standard. In addition, for more complex operations like dealing with EDI acknowledgments and keeping track of the documents exchanged with different trading partners, it is much more appropriate to use a programming language like Java, C, or C++ than XSLT.
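Here is a minimal Python sketch of the one-input, many-outputs case that is awkward in standard XSLT 1.0. It assumes a hypothetical CSV layout with `OrderNumber`, `Item`, and `Quantity` columns, and it returns the documents as strings keyed by order number instead of writing files, just to keep the example short. It is not one of the book’s programs.

```python
import csv
import io
import xml.etree.ElementTree as ET


def split_orders(csv_text: str) -> dict:
    """Build one XML document per order number found in a single CSV input."""
    roots = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        order_number = record["OrderNumber"]
        if order_number not in roots:
            # First line for this order: start a new output document.
            roots[order_number] = ET.Element("Order", {"number": order_number})
        line = ET.SubElement(roots[order_number], "Line")
        ET.SubElement(line, "Item").text = record["Item"]
        ET.SubElement(line, "Quantity").text = record["Quantity"]
    return {
        order_number: ET.tostring(root, encoding="unicode")
        for order_number, root in roots.items()
    }
```

A CSV file whose rows carry order numbers `A1`, `A1`, and `B2` produces two documents, one with two `Line` elements and one with a single `Line` element.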
An advantage of using a more conventional programming language for handling non-XML files is that we can implement a general purpose solution rather than a series of XSLT stylesheets that are each useful for only one particular file organization. In our general purpose solution fairly simple parameter files handle the differences in file organization. These files are themselves coded as simple XML documents.
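The idea of a parameter file can be sketched like this. The `FileDescription` and `Field` vocabulary below is invented for illustration and is not the parameter file format used later in this book; the point is only that a new file organization means editing XML, not code.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter file describing one fixed-length record layout.
PARAMETERS = """
<FileDescription recordName="Invoice">
  <Field name="Number" start="1" length="4"/>
  <Field name="Customer" start="5" length="10"/>
  <Field name="Amount" start="15" length="7"/>
</FileDescription>
"""


def record_to_xml(record: str, parameters_xml: str) -> str:
    """Convert one fixed-length record to XML as directed by the parameters."""
    description = ET.fromstring(parameters_xml)
    element = ET.Element(description.get("recordName"))
    for field in description.findall("Field"):
        start = int(field.get("start")) - 1  # parameter file counts from 1
        length = int(field.get("length"))
        value = record[start:start + length].strip()
        ET.SubElement(element, field.get("name")).text = value
    return ET.tostring(element, encoding="unicode")
```

With these parameters, `record_to_xml("1001Acme Corp 0012.50", PARAMETERS)` returns `<Invoice><Number>1001</Number><Customer>Acme Corp</Customer><Amount>0012.50</Amount></Invoice>`. Supporting a different layout means supplying a different parameter document to the same function.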
In addition to these considerations, presenting the techniques for handling non-XML files with XML files is a good introduction to the type of XML programming that will be required to XML-enable legacy business applications. In other words, it’s a good teaching tool.
I should mention one final caveat regarding XSLT. There are, of course, wide variations among different XSLT processors, but as a group they tend to have poorer performance than commercial EAI or EDI conversion utilities. XSLT, and therefore the architecture, is probably not well suited to production situations that need to handle large amounts of data in a short amount of time. However, earlier in the chapter I noted that in this book I assume that cost and portability are more important considerations than performance. XSLT fits the bill.
Created: March 27, 2003
Revised: January 1, 2004
URL: http://webreference.com/programming/awxml2/