The Art & Science of Web Design | 8
Let's start with HTML as our basis for discussing structure. We've already seen where it came from: humble beginnings in early database systems and its evolution through SGML. And we've seen why its goals of simplicity and forgiveness made it so rapidly popular. But how can something so pervasive come from something so simple?
The answer lies in the basic building block of the Web: text. As far back as you look in the history of the Web, plain old text has been the lingua franca. I'm referring to the simple .txt files on your computer, like the READMEs that come with new software (also, as a matter of fact, the format of the HTML files we use to build our Web sites). But now, with all our modern applications and emphasis on graphics and visuals, isn't text outdated? For example:
- Text is visually limiting. Think about it: How many stunning presentations have you witnessed? And how many of them were done by someone standing in front of a video projector showing an ASCII-text document? Words may be the fundamental piece of communication, but visual design can't be discounted for its emotional impact. Plain text just doesn't cut it.
- Text is not engaging. Look beyond graphic design to multimedia: streaming audio, video, and the interactivity of other binary objects like Flash animation and Java. Text may do a fine job describing things, but at some point you are probably going to want to show what it is you're talking about. That's the point where you leave text behind.
- Text is not quite universal. Ever wonder what the acronym ASCII actually stands for? "American Standard Code for Information Interchange." That's right, our universal text format, shared by computers around the globe, is based on an American standard. Ever try to do Kanji in a text doc? Good luck.
So why the emphasis on text? Again, there are a few reasons:
- Text is (sort of) universal. As I just mentioned, ASCII may come from just one country, but the fact remains that virtually every computer system in the world is capable of understanding a .txt file in a pretty fundamental way. Some day, ASCII will be replaced with Unicode, a system for encoding tens of thousands of international characters into text files. But for now, at least we can exchange basic documents with virtually anyone in the world.
- Text is fast. The bytes you find in a text document are about as stripped down as possible. Compare a text file to a heavily formatted Microsoft Word document, and the size difference will be hefty. Compare a text file to a streaming video file, and you'll start to see orders of magnitude.
- Text is machine-readable. This is the key. The contents of a text file can be read into a computer, and they can easily be "understood" for the words that they are. Think about spell-checking a file in a word processor. How does the computer know which words to flag? Simple pattern matching on the values of the characters it finds in the document. Compare that to the computationally intensive task of, say, recognizing the words in an audio file. You could do it, but it would be a lot harder than just zipping through a text file.
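The spell-checking idea in the last point can be sketched in a few lines of Python. This is only an illustration of pattern matching against a word list, not how any particular word processor works; the tiny `KNOWN_WORDS` set and the `flag_unknown_words` function are hypothetical names made up for the example.

```python
import re

# Hypothetical, deliberately tiny word list; a real spell-checker
# would load a full dictionary instead.
KNOWN_WORDS = {"the", "story", "was", "about", "microsoft", "and", "bill", "gates"}


def flag_unknown_words(text, known_words=KNOWN_WORDS):
    """Return the words in `text` that are not in `known_words`, in order."""
    # Pull out runs of letters; punctuation and spaces are skipped.
    words = re.findall(r"[A-Za-z]+", text)
    # Compare case-insensitively against the word list.
    return [word for word in words if word.lower() not in known_words]


flagged = flag_unknown_words("The stroy was about Microsoft and Bill Gates.")
print(flagged)  # ['stroy']
```

Nothing comparable is this cheap for an audio file or a bitmap: the program would first have to recover the words before it could check them.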
Thus, the fact that HTML is derived from plain text means that it inherits all the computer-enabled benefits of ASCII. Computers can manipulate the text. We can create programs to do all sorts of wonderful things to our content: We can index it and search it, we can translate it into other languages, and we can copy and paste it. The possibilities are, quite literally, endless.
None of these things are possible when you leave text behind. In traditional print design, for example, it is not uncommon to take text from a layout program like QuarkXPress and drop it into a graphics application like Photoshop. By turning the text into a graphic, designers can manipulate it all they want to achieve the desired effect. They can stretch and rotate and embellish until a headline or drop cap is perfect, and then import it back into their documents. But what if we do this on the Web? The words in the headline, as a graphic, lose their meaning. The computer can no longer distinguish them as words; it sees only a graphic. The machine-readable benefits of text are gone.
With a foundation of plain text, HTML takes it a step further into structured text. If machine readability is an admirable goal, then structure applied to simple text is the proverbial Holy Grail. Think about it: If a computer can process a file, adding structure by means of tags can provide clues to what that text actually means. For example, take the following bit of text:
The story was about Microsoft and Bill Gates.
What can a computer do with the line above? Well, as we've seen, it can do any number of transformations. It can be spell-checked, searched, translated, converted to capital letters, or printed in green. But consider the following:
<p>The story was about <company website="http://microsoft.com" symbol="MSFT">Microsoft</company> and <person title="President" employer="Microsoft">Bill Gates</person>.</p>
Now consider how easy it would be to programmatically manipulate the text. Not only can I do all the things we could do to the previous example, but I can add even more value. I can look up the current stock price of the company mentioned. I can build a link to the company's home page on the Web. I can link to any biographical information I may have on Mr. Gates. I can search this text, and any other text we have, and aggregate all the officers of public companies. And the list goes on and on.
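As a rough sketch of that kind of manipulation, the tagged sentence can be read with Python's standard `xml.etree.ElementTree` module. The `<company>` and `<person>` tags and their attributes come from the example above; the variable names are invented for illustration, and the stock lookup itself is left out because it would need an outside service.

```python
import xml.etree.ElementTree as ET

# The tagged sentence from the text, written as well-formed markup.
MARKUP = (
    '<p>The story was about '
    '<company website="http://microsoft.com" symbol="MSFT">Microsoft</company>'
    ' and <person title="President" employer="Microsoft">Bill Gates</person>.</p>'
)

root = ET.fromstring(MARKUP)

# Collect every company with its ticker symbol and home page.
companies = [
    {"name": el.text, "symbol": el.get("symbol"), "website": el.get("website")}
    for el in root.iter("company")
]

# Collect every person with their title and employer.
people = [
    {"name": el.text, "title": el.get("title"), "employer": el.get("employer")}
    for el in root.iter("person")
]

# The plain sentence is still recoverable by dropping the tags.
plain_text = "".join(root.itertext())

print(companies)
print(people)
print(plain_text)
```

The same loop run over a whole archive of tagged stories would be one way to aggregate the officers of every company mentioned, which is the kind of query plain prose cannot answer directly.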
We've just added a very powerful feature to our text: something called metadata, or information about information. The metadata in the tags is not intended to be displayed as part of the sentence but rather as embellishment and annotation of the sentence. It is adding value. It is allowing us to reference parts of our content.
These are structural tags. They talk about the semantics of a document and add metadata so that we can manipulate our content. Others, purely presentational tags, offer none of these benefits. Think for a second about the difference between these two examples:
The story was about <b>Microsoft</b> and Bill Gates.
The story was about <company>Microsoft</company> and Bill Gates.
Which is more valuable? Obviously, the second allows us far more opportunity to disambiguate the content. The <b> tag may render the company's name in boldface type, but it tells us nothing about the content. The <company> tag, on the other hand, gives us a clear idea of what is being referenced, but says nothing about how our browser should display the word. Wouldn't it be great if we could get the best of both worlds, adding rich metadata while maintaining control of the visual presentation?
Luckily, that is exactly how HTML was designed.
Translating the Web with Babelfish
It can be tempting to bypass the limitations of HTML for the visually stunning impact of graphics. By imprisoning parts of your pages as graphics, you can achieve a variety of effects beyond the rather rudimentary capabilities of today's browsers. Headlines can come alive in any typeface you desire. Text can rotate and show off drop shadows, and on and on and on.
But is it really such a good idea?
For a perfectly clear example of the power of text, we can turn to the Alta Vista Search Engine. One of the interesting features the service offers is the capability to translate Web pages into other languages. Thus, if you find an interesting-looking page written in Spanish (and you don't happen to habla Español), you can let the Babelfish translator convert it to English.
That is, if the page is actually still text. The engine can't get to the words found in graphics, so all those fancy headlines are going to stay elusive. Bummer, considering that's often the most important content on the page. And those sites that create their content as a graphic or Flash animation? Well, you're completely out of luck.
The Alta Vista translation service, Babelfish, will convert Web pages between a number of different languages... if it can read them.
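To make the point concrete, here is a minimal sketch of what a text-based tool can actually read from a page, using Python's standard `html.parser` module. It is not how Babelfish is implemented; the `TextExtractor` class, the `readable_text` function, and the two sample pages are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the character data a text-based tool could read from a page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty runs of text found between tags.
        if data.strip():
            self.chunks.append(data.strip())


def readable_text(html):
    """Return the list of text runs found in `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical pages: the same headline, once as text and once as an image.
TEXT_PAGE = "<h1>Noticias de hoy</h1><p>Hola, mundo.</p>"
GRAPHIC_PAGE = '<h1><img src="headline.gif"></h1><p>Hola, mundo.</p>'

print(readable_text(TEXT_PAGE))     # ['Noticias de hoy', 'Hola, mundo.']
print(readable_text(GRAPHIC_PAGE))  # ['Hola, mundo.']
```

In the second page the headline never reaches the program as words, so there is nothing for a translator, a search index, or any other text tool to work with.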
Created: April 5, 2001
Revised: April 5, 2001