What's in a topic map? (2/2) - exploring XML
What's in a topic map?
A topic can be any "thing": a person, entity, concept, and so on. Check the index of your favorite book for an example. So far, we've used "W3C" and "XML" as examples of topics discussed in this column. Topics can be categorized according to their kind. In a topic map, any given topic is an instance of zero or more topic types. This corresponds to the categorization inherent in the use of multiple indexes in a book (index of XML standards, index of Web browser software, etc.). For instance, XML is a W3C standard, and W3C is a committee.
For convenient reference, a topic can have one or more names, such as "XML" and "Extensible Markup Language", although unnamed topics are theoretically possible. Different names for the same topic can arise from acronyms, different languages, or synonyms, which often stem from foreign-language influences, whether recent or ancient.
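The topic, topic type, and name concepts above can be sketched as a small data structure. This is a minimal illustration, not the XTM data model: the `Topic` class, its field names, and the topic ids are hypothetical, and treating the types "committee" and "W3C standard" as topics in their own right is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic with zero or more types and zero or more names."""
    topic_id: str                                   # hypothetical internal id
    types: set[str] = field(default_factory=set)    # ids of topics used as types
    names: list[str] = field(default_factory=list)  # empty list = unnamed topic


# The examples from the text, keyed by id for easy lookup.
topics = {
    t.topic_id: t
    for t in (
        Topic("committee", names=["committee"]),
        Topic("w3c-standard", names=["W3C standard"]),
        Topic("w3c", types={"committee"}, names=["W3C"]),
        Topic("xml", types={"w3c-standard"},
              names=["XML", "Extensible Markup Language"]),
    )
}


def find_by_name(name: str) -> list[Topic]:
    """Return every topic that carries the given name (case-insensitive)."""
    wanted = name.casefold()
    return [t for t in topics.values()
            if any(n.casefold() == wanted for n in t.names)]


def instances_of(type_id: str) -> list[Topic]:
    """Return every topic that is an instance of the given topic type."""
    return [t for t in topics.values() if type_id in t.types]


print([t.topic_id for t in find_by_name("extensible markup language")])
print([t.topic_id for t in instances_of("committee")])
```

Looking up either "XML" or "Extensible Markup Language" leads to the same topic, which is the point of allowing several names per topic.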
A topic may be linked to one or more information resources that are deemed to be relevant to the topic in some way. Such resources are called occurrences of the topic. An occurrence could be an article about the topic on a Web site, a picture or video depicting the topic, a simple mention of the topic in the context of something else, or a commentary on the topic (for example, a commentary on an XML standard).
Such occurrences are generally external to the topic map document itself, and they are "pointed at" using whatever locator mechanisms the system supports, for instance URIs in XML Topic Maps (XTM). Today, most systems for creating hand-crafted indexes (as opposed to full text indexes) use some form of embedded markup in the document to be indexed. One of the advantages of using topic maps is that the documents themselves do not have to be altered.
Occurrences have roles attached to them, like "article", "illustration", "mention", and "commentary" outlined above. Just like topics, roles can have types that codify their nature. For instance, the role "article" could be of type "document". In an environment where "document" is well-defined, further information could be derived from this fact, such as a creation or last-modified date that can be asked for.
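Occurrences, their roles, and role types might be represented as follows. This is a sketch under assumptions: the `Occurrence` class, the `ROLE_TYPES` table, and the `example.org` URIs are invented for illustration, and only the mapping of "article" to "document" comes from the text. The topic map stores nothing but a locator for each resource, so the resource itself stays untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    topic_id: str  # the topic this resource is relevant to
    role: str      # "article", "illustration", "mention", "commentary", ...
    locator: str   # URI of the external resource; the resource is not modified


# Role types. Only "article" -> "document" is from the text; the rest are assumed.
ROLE_TYPES = {
    "article": "document",
    "commentary": "document",
    "illustration": "image",
}

# Hypothetical occurrences pointing at made-up example.org resources.
occurrences = [
    Occurrence("xml", "article", "http://example.org/articles/xml-intro.html"),
    Occurrence("xml", "illustration", "http://example.org/img/xml-tree.png"),
    Occurrence("w3c", "mention", "http://example.org/articles/xml-intro.html"),
]


def occurrences_of(topic_id: str, role: Optional[str] = None) -> list[Occurrence]:
    """Return the occurrences of a topic, optionally filtered by role."""
    return [o for o in occurrences
            if o.topic_id == topic_id and (role is None or o.role == role)]


def occurrences_with_role_type(topic_id: str, role_type: str) -> list[Occurrence]:
    """Return the occurrences of a topic whose role is of the given role type."""
    return [o for o in occurrences_of(topic_id)
            if ROLE_TYPES.get(o.role) == role_type]


for occ in occurrences_with_role_type("xml", "document"):
    print(occ.role, occ.locator)
```

Note that the same resource appears twice, once as an article about XML and once as a mention of W3C, without any change to the resource.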
A topic association describes a relationship between two or more topics. The obvious example is the "Column77 was produced by Michael" association between a document and its author. Associations, like topics and occurrences, have types, such as "produced_by" and "included_in". Last but not least, every topic participating in an association plays a certain role in that association. Naturally, association roles can also be typed, and these role types are themselves topics.
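A typed association with role-playing members could be sketched like this. The association types "produced_by" and "included_in" come from the text; the `Association` class, the role names ("product", "producer", "part", "whole"), the topic ids, and the container topic "exploring-xml" are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Association:
    assoc_type: str                       # e.g. "produced_by"
    members: tuple[tuple[str, str], ...]  # (role, topic_id) pairs


associations = [
    # "Column77 was produced by Michael"
    Association("produced_by", (("product", "column77"), ("producer", "michael"))),
    # Hypothetical second association for the "included_in" type
    Association("included_in", (("part", "column77"), ("whole", "exploring-xml"))),
]


def players(assoc: Association, role: str) -> list[str]:
    """Return the ids of the topics playing the given role in an association."""
    return [topic_id for r, topic_id in assoc.members if r == role]


def associations_of(topic_id: str,
                    assoc_type: Optional[str] = None) -> list[Association]:
    """Return the associations a topic takes part in, optionally by type."""
    return [
        a for a in associations
        if any(t == topic_id for _, t in a.members)
        and (assoc_type is None or a.assoc_type == assoc_type)
    ]


for assoc in associations_of("column77", "produced_by"):
    print(players(assoc, "producer"))
```

Keeping the role next to each member is what lets a query tell the producer from the product, even though both are ordinary topics.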
From Print to Web
Up until now there has been no equivalent of the traditional back-of-book index in the world of electronic information. People have marked up keywords in their word processing documents and used these to generate indexes "automatically", but the resulting indexes have remained tied to single documents. The World Wide Web removes the distinction between individual documents, so indexes now have to span multiple documents. Indexes have to cover vast pools of information, calling for the ability to merge indexes and to create user-defined views of information. In this situation, old-fashioned indexing techniques are clearly inadequate.
The problem has been recognized for several decades in the realm of document processing, but the methodology used to address it - full text indexing - has only solved part of the problem, as anyone who has used search engines on the Internet knows only too well. Mechanical indexing cannot cope with the fact that the same subject may be referred to by multiple names ("synonyms"), nor with the fact that the same name may refer to multiple subjects ("homonyms"). Yet mechanical matching of names is basically how a web search engine works, so it is no surprise when you get thousands of irrelevant hits and still manage to miss the thing you are looking for!
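The synonym and homonym problem can be shown with a toy comparison. Everything in this sketch is an assumption made for illustration: the four sample documents, the "Java" homonym, the `topic_index` structure, and both search functions. The substring match stands in for mechanical indexing and is far simpler than a real search engine.

```python
# Made-up sample documents for illustration only.
documents = {
    "doc1": "XML is a W3C standard for structured documents.",
    "doc2": "The Extensible Markup Language grew out of SGML.",
    "doc3": "Java is an island in Indonesia.",
    "doc4": "Java is a programming language often used to parse markup.",
}


def full_text_search(term: str) -> list[str]:
    """Naive mechanical matching: return ids of documents containing the term."""
    return [doc_id for doc_id, text in documents.items()
            if term.lower() in text.lower()]


# A hand-crafted topic index: names and occurrences are assigned per subject.
topic_index = {
    "xml": {"names": ["XML", "Extensible Markup Language"],
            "occurrences": ["doc1", "doc2"]},
    "java-island": {"names": ["Java"], "occurrences": ["doc3"]},
    "java-language": {"names": ["Java"], "occurrences": ["doc4"]},
}


def topic_search(name: str) -> dict[str, list[str]]:
    """Return the occurrences of every topic carrying the name, per topic."""
    wanted = name.casefold()
    return {
        topic_id: topic["occurrences"]
        for topic_id, topic in topic_index.items()
        if any(n.casefold() == wanted for n in topic["names"])
    }


print(full_text_search("XML"))   # misses doc2, which only uses the synonym
print(full_text_search("Java"))  # mixes two unrelated subjects in one hit list
print(topic_search("XML"))       # both documents, found through the topic
print(topic_search("Java"))      # one entry per subject sharing the name
```

The topic-based lookup returns both XML documents regardless of which name they use, and it keeps the two subjects called "Java" apart so the reader can pick the one intended.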
Topic maps provide an approach that marries the best of several worlds, including those of traditional indexing, library science and knowledge representation, with advanced techniques of linking and addressing. The author has realized the need for a map to the XML topic; a later installment will deal with XTM and an attempt to apply it to this Web site.
Produced by Michael Claßen
Created: Mar 17, 2003
Revised: Mar 17, 2003