XML, the better HTML? (1/4) - exploring XML
Let's face it: HTML has passed the point where developing it further makes
sense. The W3C will finally declare version 4.0 the last in history, after
a series of questionable innovations, all the way from the <blink> tag and
frames, both of which started out as browser extensions, to more nonsense in
4.0. While HTML has brought a tidal wave of information sharing to the world
via the Web, it brought Web developers nothing but grief: bloated browsers with
incompatible implementations of slow and sometimes silly standards. But fear
not, XML comes to the rescue - or, then again, not?
So is XML better?
To be honest, this question has no real answer: XML is a meta language, meaning a language for defining other languages, while HTML is itself a more or less well-defined language. XML stands for eXtensible Markup Language, which is a bit of a misnomer; "extensible meta language" would describe it better. The easiest way to understand the difference is to note that XML by itself does not define any tags. It only describes a way of defining your own set of tags and attributes, hence the name extensible. HTML, in contrast, has a fixed set of tags, and their meaning is defined in the W3C specifications or by the implementation of a particular browser, whichever came first. So directly comparing XML and HTML means comparing apples and oranges.
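To make the "no predefined tags" point concrete, here is a minimal sketch in Python using the standard library's generic XML parser. The vocabulary (<library>, <book>, <title> and the year attribute) is invented purely for illustration; nothing in XML itself knows these names, yet the parser handles them without complaint.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <library>, <book>, <title> and the "year"
# attribute are made up for this sketch; XML itself defines none of them.
LIBRARY_XML = """
<library>
  <book year="1851"><title>Moby-Dick</title></book>
  <book year="1813"><title>Pride and Prejudice</title></book>
</library>
"""

# A generic XML parser only checks that the markup is well-formed.
# What the tags mean is entirely up to whoever defined them.
root = ET.fromstring(LIBRARY_XML)
books = [
    (book.findtext("title"), int(book.get("year")))
    for book in root.findall("book")
]

print(root.tag)
print(books)
```

The same parser would accept any other well-formed set of tags just as readily, which is the sense in which XML is a language for defining languages.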
So what is wrong with HTML?
Let's look at a specific example. The following fragment of HTML shows a listing of a shopping cart containing two products, as it might appear on any of your favorite shopping Web sites:
<HTML>
  <BODY>
    <H3>Your shopping cart contains:</H3>
    <TABLE BORDER="1">
      <TR BGCOLOR="#FF0000">
        <TH>Article</TH>
        <TH>Qty</TH>
        <TH>Price</TH>
      </TR>
      <TR>
        <TD>Pen</TD>
        <TD>3</TD>
        <TD>3.99</TD>
      </TR>
      <TR>
        <TD>Pencil</TD>
        <TD>2</TD>
        <TD>2.95</TD>
      </TR>
    </TABLE>
  </BODY>
</HTML>
There are many good things to be said about HTML:
- It is relatively simple, therefore quick and easy to learn.
- It is now pervasive in the Internet space, not least because of its simplicity.
- It can be viewed with minimal client requirements, namely a browser.
- It is well suited for describing the visual appearance of a human-readable document, including text and images.
But there are also some shortcomings:
- It mixes data structure, e.g. the articles in the shopping cart in our example above, with presentation instructions such as a one-pixel table border and a red row background.
- It does not identify the data elements, so the information that pens and pencils are articles, and that 3.99 and 2.95 are their respective prices, is lost.
- It uses a fixed set of well-defined tags and cannot be extended with user-defined ones. There are no tags for describing vector graphics in HTML, for example.
- It is good enough for humans, i.e. to be displayed in a browser, but not good enough for use by machines: Could you easily write a program to calculate the total order price for our shopping cart above?
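As a rough answer to that last question, here is a sketch of such a program in Python, using only the standard library. It rests on assumptions the HTML itself never states: that the second and third cells of each row are quantity and price, and that the price is per unit rather than per line. The names CART_HTML, CellCollector and cart_total are made up for this sketch.

```python
from decimal import Decimal
from html.parser import HTMLParser

# The shopping cart fragment from the example above.
CART_HTML = """
<HTML> <BODY> <H3>Your shopping cart contains:</H3>
<TABLE BORDER="1">
<TR BGCOLOR="#FF0000"> <TH>Article</TH> <TH>Qty</TH> <TH>Price</TH> </TR>
<TR> <TD>Pen</TD> <TD>3</TD> <TD>3.99</TD> </TR>
<TR> <TD>Pencil</TD> <TD>2</TD> <TD>2.95</TD> </TR>
</TABLE> </BODY> </HTML>
"""


class CellCollector(HTMLParser):
    """Collects the text of every <TD> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def cart_total(html):
    """Sum quantity * price over all rows that have exactly three <TD> cells."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    total = Decimal("0")
    for row in parser.rows:
        if len(row) != 3:
            continue  # the header row holds only <TH> cells, so it is empty here
        _article, qty, price = row
        # Assumption: column 2 is the quantity, column 3 the unit price.
        total += int(qty.strip()) * Decimal(price.strip())
    return total


print(cart_total(CART_HTML))  # prints 17.87
```

It works, but only by guessing the meaning of each cell from its position. Reorder the columns, add a currency symbol, or nest the table inside another one for layout, and the program quietly computes the wrong result or fails.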
Created: Nov. 20, 1999
Revised: Dec. 09, 1999