The Flesh and the Soul of Information. Abstractions for the Web | WebReference

  Abstractions for the Web

Most advances in the field of document abstractions and "separation ideology" have so far been connected with SGML.  This also holds for one of the widest and most diverse document exchange media, the World Wide Web.  Although SGML has become the core of many successful document management environments, it is no exaggeration to say that if there is a single document medium that most badly needs a high-level abstraction language, it is the Web.

Despite the obvious need, however, the first attempt to graft SGML ideology onto the Internet tree was mostly a failure.  In its early days, HTML offered purely structural markup, i.e. only one half of the equation; moreover, the structure into which you were supposed to box your documents was very limited and not extensible.  Add to this the scarcity of attempts to popularize the ideology of separation among the Web audience, and it comes as no surprise that both Web users and browser manufacturers regarded the language as a fancy plain-text equivalent of some annoyingly poor and old-fashioned word processor format.

Naturally, the development of HTML since then has followed from this level of understanding.  The result that we're all using nowadays is a peculiar mix of "old" structure tags and "new" presentation extensions, pioneered by browser companies and only eventually acknowledged by the W3C.  Thus the most basic (and presently incurable) drawback of HTML is not that it fails to ignore the presentation aspects of a document, but that it stores two inherently different types of information together, with no formal distinction between them.  As for the structural part, it is hardly suitable for serious markup challenges, as the inventory of available structural tags is very limited.
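
To make this mixing tangible, here is a minimal sketch in Python that tallies the structural and presentational tags found in a small HTML fragment.  The two tag sets and the sample fragment are illustrative assumptions of mine, not an official classification; the point is only that both kinds of tags sit side by side in the same stream with nothing in the language to tell them apart.

```python
from html.parser import HTMLParser

# Assumed, illustrative classification; HTML itself draws no such formal line.
STRUCTURAL = {"h1", "h2", "p", "ul", "li", "blockquote", "em", "strong"}
PRESENTATIONAL = {"font", "center", "b", "i", "u", "blink"}


class TagTally(HTMLParser):
    """Collects start tags into two lists according to the sets above."""

    def __init__(self):
        super().__init__()
        self.structural = []
        self.presentational = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in STRUCTURAL:
            self.structural.append(tag)
        elif tag in PRESENTATIONAL:
            self.presentational.append(tag)


def tally(html_text):
    """Return (structural_tags, presentational_tags) in document order."""
    parser = TagTally()
    parser.feed(html_text)
    parser.close()
    return parser.structural, parser.presentational


# Hypothetical fragment in the style of late-1990s pages.
SAMPLE = (
    '<h1><font color="red">News</font></h1>'
    "<center><p>Hello, <b>world</b></p></center>"
)

structural, presentational = tally(SAMPLE)
print("structural:", structural)
print("presentational:", presentational)
```

In this fragment the heading and paragraph say what the text is, while the font, centering, and bold tags say only how it should look, yet a processing program has to rely on an outside list such as the one above to separate the two.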

The dire need for a more consistent markup solution led to the introduction of XML, a W3C-standardized subset of SGML, simplified yet retaining most of its power and flexibility.  This "HTML of the future," as it is sometimes called, has already received wide industry support, although it has yet to be adopted by the mass user audience (refer to another HTML Unleashed chapter for more information).  This development is exciting in more than one way.

Although "named styles" in word processors have to some extent accustomed document creators to the advantages of storing presentation parameters separately, this hasn't yet become a subconscious imperative: as they say, word processors allow separation but do not enforce it.  Thus, with the adoption of XML, millions of Web authors may, for the first time ever, enter the world of true structural flexibility and presentation independence.  This may raise the average quality of Web documents, but it is also likely to pose serious adaptation problems for many users.

SGML's separation ideology works perfectly for well-defined types of documents whose DTDs were custom developed by trained professionals.  However, when the paradigm of structural markup hits the truly world-wide Web with its much more diversified and even erratic documents, this deceptively simple scheme must be made more concrete.  What needs to be explained to everyone is not only why XML is different from, and better than, HTML, but also how it is to be used in typical, everyday document-related tasks.  Going from HTML to XML isn't like your customary software upgrade; here, the entire pattern of document processing is going to change.
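
As a rough illustration of what that changed pattern might look like in an everyday task, the following Python sketch keeps a purely structural XML document apart from two interchangeable presentation mappings.  Everything here is hypothetical: the `recipe` vocabulary, the two style dictionaries, and the `render` helper are invented for the example and merely stand in for a real stylesheet language, which would be far richer.

```python
import xml.etree.ElementTree as ET
from html import escape

# Structure only: the element names describe what the content is.
DOCUMENT = """<recipe>
  <title>Pancakes</title>
  <ingredient>Flour</ingredient>
  <ingredient>Milk</ingredient>
</recipe>"""

# Presentation only: two hypothetical "stylesheets", each mapping an
# element name to an HTML template with one slot for the text.
PLAIN_STYLE = {
    "title": "<h1>{}</h1>",
    "ingredient": "<p>{}</p>",
}
FANCY_STYLE = {
    "title": '<h1 style="color: navy">{}</h1>',
    "ingredient": "<li>{}</li>",
}


def render(xml_text, style):
    """Render the children of the root element using the given style map."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        template = style.get(child.tag)
        if template is None:
            continue  # elements without a rule are skipped in this sketch
        lines.append(template.format(escape(child.text or "")))
    return "\n".join(lines)


print(render(DOCUMENT, PLAIN_STYLE))
print(render(DOCUMENT, FANCY_STYLE))
```

The same document yields two different renderings without a single change to its markup, which is the practical payoff of the separation; conversely, with no style mapping at all the document carries no appearance whatsoever.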

In my opinion, the problem for XML's creators at the moment is that they're pushing ahead with the structural side of the technology while the presentational machinery lags behind (although, fortunately, the lag is much smaller than in the case of CSS, which followed HTML only after several years).  While XML 1.0 is already a W3C Recommendation, the Extensible Stylesheet Language (XSL) at the time of this writing is not even a Draft, but only a submitted Proposal.  The unwelcome result that may ensue is not, of course, that XML will be corrupted with built-in visual extensions (as was the case with HTML), but that the first wave of enthusiastic users will face an additional difficulty in embracing the new paradigm of document creation: it is indeed difficult to conceive how a "separation" is supposed to work when you're able to express only one of the two concepts being separated.


Created: Apr. 19, 1998
Revised: Apr. 19, 1998