The Art & Science of Web Design | 4 | WebReference


The Art & Science of Web Design

The History of Electronic Text

Historically, when a printed manuscript was given to a copy editor for grammatical and formatting edits, the process would include something called "markup." In the case of, say, a turn-of-the-century newspaper, an editor would scribble codes in the margins of a particular story that described how it should look. Then the codes were interpreted by a typesetter (the person who was responsible for putting together the final page on the press). Headlines, for example, would be marked with a shorthand notation describing which typographical convention to use. Thus, the editor might write something like "TR36b/c" and point to the first line of text on a page, effectively telling the typesetter to set that line as a headline in Times Roman 36 point bold and centered.

Most publications, however, defined standards for each individual part of a story and page. That way, the editor wouldn't need to write the same typographic codes again and again. Instead, each page element could simply be specified by name. Not only did this save time, but it ensured consistency across a publication. A newspaper, for example, might have defined six different headline weights to correspond to a story's position on a page. The paper's editor could save time when doing the layout by tagging a story's first line of text with a standard notation like "HEAD3". A typesetter, encountering the notation, would look up the code on a sheet listing the style standards, and format the headline accordingly. This process is known as indirection, a concept that would eventually find its way into all aspects of publishing, as well as into disciplines like computer science.
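The same indirection survives in today's web style sheets: a page element is tagged with a name, and a separate sheet maps that name to a visual treatment, so the design can change without touching the content. A minimal sketch of the idea (the class name and type values here are invented for illustration, not taken from any real publication's standards):

```html
<!-- The style sheet plays the typesetter's lookup role: "head3" names the
     element; the rule decides how a "head3" actually looks. -->
<style>
  .head3 {
    font-family: "Times New Roman", serif;
    font-size: 24px;
    font-weight: bold;
  }
</style>

<!-- The editor's job: tag the line by name, not by appearance. -->
<p class="head3">Mayor Announces New Budget</p>
```

Redefining `.head3` in one place restyles every headline tagged with it, which is exactly the time-saving and consistency the newspaper's named codes provided.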

Early computer word-processing applications followed a similar evolution. Much like the copy editors scribbling formatting codes in the margins, these tools embedded specific markup in the text. The user could annotate text with instructions describing how it should be presented: bold, italic, big, or small.

While this may read as a bit of abstract history, it was ground-breaking to a handful of researchers, including Charles Goldfarb, in the late 1960s. They began to realize that embedding typographical conventions in word processors was shortsighted. Rather, they believed electronic text should be tagged with general markup, which would give meaning to page elements much like the markup codes traditionally shared between editors and typesetters. By separating the presentation of a document from its basic structural content, the electronic text was no longer locked into one static visual design.

Charles experimented with storing his electronic legal briefs in pieces, and labeling each piece of the brief based on what it was, rather than what it should look like. Now, instead of marking a chunk of text as being 36pt Times Roman, he could simply label it as "Title." The same could be done for every other chunk in the document: author, date published, abstract, and so forth. When thousands of briefs had been marked up with standard tags, you could start to do some amazing things, such as grouping summaries of briefs written by a particular lawyer, or collapsing a document down to a simple outline form. Then, when you were satisfied with the final brief, you could print the document by specifying a style sheet much like editors and typographers did decades ago. Each tag was assigned a particular formatting style, and the document was produced in a physical form. Updating, redesigning, and republishing was a breeze. Charles was no longer bored. Technology and publishing had intersected in a remarkably powerful way.
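A structurally tagged brief might have looked something like the following sketch. The tag names here are invented to illustrate the principle, not drawn from Goldfarb's actual files: every tag describes what a chunk *is*, and says nothing about how it should be rendered.

```html
<!-- Hypothetical structural markup for a legal brief.
     Nothing here mentions fonts, sizes, or layout. -->
<brief>
  <title>Smith v. Jones</title>
  <author>C. Goldfarb</author>
  <date-published>1969</date-published>
  <abstract>A summary of the argument goes here.</abstract>
  <body>The full text of the brief goes here.</body>
</brief>
```

Because the structure is explicit, software can search, outline, or restyle thousands of such documents by operating on the tags, without ever parsing the prose itself.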

Charles Goldfarb continued his work at IBM into the early 1970s with Edward Mosher and Ray Lorie. As they researched their integrated law office information systems, they developed a system of encoding information about a document's structure by using a set of tags. These tags followed the same basic philosophy of representing the meaning of individual elements, with the presentation then applied to structural elements rather than the individual words. The team started to abstract the idea. Rather than develop a standard set of tags, why not just set up the basic rules for tagging documents? Then every document could be tagged based on its own unique characteristics, but the searching, styling, and publishing of these documents could all be done with the same software, regardless of whether you were sending out legal briefs or pages of a newspaper. They dubbed their system the Generalized Markup Language, or GML (which, incidentally, also encoded the initials of the inventors for posterity).

And here's the interesting part: GML was developed so it could be shared by all electronic text. If there were a standard method for encoding content, the reasoning went, then any computer could read any document. The value of a system like this would grow exponentially.

The concept quickly spread from the confines of IBM. The publishing community realized that by truly standardizing the methodology of GML, publishing systems worldwide could be developed around the same core ideas. For years, researchers toiled over the best way to achieve these goals, and by the mid-1980s, the Standard Generalized Markup Language, or SGML, was finished. The resulting specification, known to the world as ISO 8879, is still in use today.

SGML successfully took the ideas incorporated into GML much further. Tags could go far beyond simple typographic formatting controls. They could be used to trigger elaborate programs that performed all sorts of advanced behaviors. For example, if the title of a book was tagged with a <book> tag, an SGML system could do much more than simply make the text italic. The book tag could trigger code in the publishing system to look up an ISBN, and then create a bibliographic reference including the author, publisher, and other information. SGML could also be used to generate compound documents, which are electronic documents that are pulled together automatically from a number of different sources. A document no longer needed to be a collection of paragraphs, but could include references to information in a database that could be formatted on the fly. Consider the statistics on the sports page of a newspaper, where raw data flows through formatting rules to generate the daily page automatically; or imagine a catalog that always prints the current prices and inventory data from a warehouse. Electronic publishing began to come of age.
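The key point is that the same tag can drive very different behaviors, because the markup records meaning and leaves interpretation to the publishing system. A hypothetical SGML-style fragment (tag names invented for illustration):

```html
<!-- The <book> tag says only "this is a book title." One system might
     italicize it; another might look up the ISBN and expand it into a
     full bibliographic citation. The markup itself doesn't say which. -->
<para>The opinion quotes <book>The Common Law</book> at length.</para>
```

Swapping the processing rules changes the output without touching the document, which is what makes compound, database-driven publications possible.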

As a standard, SGML was a remarkable accomplishment. Getting thousands of companies, organizations, and institutions to agree on a systematic way of encoding electronic documents was revolutionary. The problem, however, was that in order to be universally inclusive, SGML ended up being massively complicated. So complicated, in fact, that the only real users of the language were the largest constituents of the standards group: IBM, the Department of Defense, and other cultivators of massive electronic libraries. SGML was a long way away from the desktops of the emerging personal computers of the time.



Revised: March 20, 2001
Created: March 20, 2001