Explorer's Guide to the Semantic Web
1.1.1 Indexing and retrieving information
Everyone wrestles with how to find information. Libraries have card catalogs,
and now many have electronic indexes. Search engines are vital components of the
Web. Yet at some point, everyone has been frustrated and annoyed by how
hard it is to locate things, especially when you aren't sure what to ask
for. To find information, a Semantic Web approach would go beyond keyword
and alphabetical indexes and let users search by concepts and categories.
The “Web” part brings in a persistent theme: information is distributed,
spread throughout the Web, rather than concentrated in a few repositories.
Most systems that use concept identification to retrieve information maintain
their own concept hierarchies and attempt to identify those concepts in the
documents they index. Sometimes concepts in a document collection are identified
automatically, with varying success. To go further requires that documents be
able to declare their own vocabularies and sets of concepts and to identify
where they’re used.
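As a rough illustration of the difference between keyword and concept search, here is a minimal Python sketch. The concept hierarchy, the document ids, and the function names are all hypothetical, invented for this example; real systems use far richer vocabularies. Each document declares the concepts it uses, and a search for a broad concept also finds documents that declare narrower ones.

```python
# Hypothetical concept hierarchy: each concept maps to its broader concept.
BROADER = {
    "baseball": "sport",
    "cricket": "sport",
    "sport": "recreation",
}

# Hypothetical documents, each declaring the concepts it uses.
DOCUMENTS = {
    "doc1": {"baseball"},
    "doc2": {"cricket"},
    "doc3": {"gardening"},
}


def is_a(concept, target):
    """Return True if concept equals target or lies beneath it in the hierarchy."""
    while concept is not None:
        if concept == target:
            return True
        concept = BROADER.get(concept)  # climb one level; None at the top
    return False


def search_by_concept(target):
    """Return the sorted ids of documents that declare a concept under target."""
    return sorted(
        doc_id
        for doc_id, concepts in DOCUMENTS.items()
        if any(is_a(c, target) for c in concepts)
    )


print(search_by_concept("sport"))  # ['doc1', 'doc2']
```

A plain keyword search for "sport" would match none of these documents, because none of them contains that word; the hierarchy is what connects the query to them.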
1.1.2 Meta data
Card catalogs and electronic indexes contain data about the works that are cataloged
and indexed. Data about other data is often called meta data. For example,
the ISBN and the author’s name are meta data about a novel. The data types
describing the data in a database also fall into the category of meta data. It’s
even possible to have meta meta data (a statement about the origin of a piece
of meta data could be considered to be meta data about meta data, or meta meta
data).
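The layering of data, meta data, and meta meta data can be sketched in a few lines of Python. This is only an illustration under assumed names: the Statement class, the property names, and the values ("A. Writer", "publisher catalog") are hypothetical and do not come from any standard.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """A statement that says something (prop = value) about a subject."""
    subject: object
    prop: str
    value: str


novel = "the text of a novel"  # placeholder for the data itself

# Meta data: a statement about the novel.
author = Statement(subject=novel, prop="author", value="A. Writer")

# Meta meta data: a statement about where the author statement came from.
origin = Statement(subject=author, prop="source", value="publisher catalog")


def meta_level(statement):
    """Count how many layers of statements sit above the underlying data."""
    level = 1
    while isinstance(statement.subject, Statement):
        level += 1
        statement = statement.subject
    return level


print(meta_level(author), meta_level(origin))  # 1 2
```

The same structure serves at every level; what makes a statement "meta" is only what its subject is.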
In one sense, meta data is still data; the distinction lies in the intended
use of the data and in the subject of the meta data. It’s meta data that
will be used for searches and for discovery of information. Annotation can also
be thought of as meta data.
1.1.3 Annotation
In the world of physical documents (such as books), people write margin notes
and comments, they underline and highlight passages, they staple new items to
reports, and they add thoughts and ideas to those of the original authors. Markup
languages like XML should, you’d think, make it possible to add such annotations;
but today it’s hard to do this in a simple way that lets other people share
your annotations and lets you move your annotations to other applications and
computers. Wiki-style web sites attempt to let many people comment on
and modify web pages, but this process covers only a little of what people would
like to do.
Because annotations should be shareable, and because the meaning of different
types of annotations should be widely understood, support for extensive annotation
capabilities is often seen as part of the Semantic Web.
1.1.4 A huge interoperable database
Today it’s common to get data from a database over the Web. These databases
are generally separate and not easily used as merged data sources, and a great
deal of data exists outside of databases. This part of the Semantic Web vision
sees ways to unify the description and retrieval of stored data, allowing much
of the Web to be considered part of a large virtual database.
Consider a sports researcher looking for baseball data. There are various
online baseball databases: The Major League Baseball web site is but one of
many. But if our researcher wants to find performance statistics for Stan Musial,
whose career lasted from the 1940s to the 1960s, she can’t get data for
the whole period in a mutually compatible format. At least for baseball statistics,
there is some common agreement on the definitions of the most important statistics,
so that a batting average is always computed the same way—this is more
than can be said for most separate collections of data.
If the Web functioned as an interoperable database, the researcher could get
the data from all the important sites, and the researcher’s software would
be able to either display all the data together or automatically combine data
from, say, the Major League Baseball site and the Baseball Almanac.
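To make the idea of combining data from separate sites more concrete, here is a minimal Python sketch. Everything in it is assumed for illustration: the two sources, their field names, the player name, and the numbers are invented and are not real statistics. The only fact relied on is that a batting average is hits divided by at-bats. Each source's fields are mapped onto a shared vocabulary before the records are merged.

```python
# Two hypothetical sources that describe the same kind of record
# with different field names. All values are invented.
site_a = [{"player": "Example Player", "year": 1950, "AB": 500, "H": 150}]
site_b = [{"name": "Example Player", "season": 1951, "at_bats": 400, "hits": 100}]

# Hypothetical mappings from each source's field names to a shared vocabulary.
FIELD_MAPS = {
    "site_a": {"player": "player", "year": "year", "AB": "at_bats", "H": "hits"},
    "site_b": {"name": "player", "season": "year", "at_bats": "at_bats", "hits": "hits"},
}


def normalize(record, field_map):
    """Rename a record's fields to the shared vocabulary."""
    return {common: record[local] for local, common in field_map.items()}


def merge(sources):
    """Normalize every record, add a batting average, and sort by year."""
    rows = []
    for source_name, records in sources.items():
        for record in records:
            row = normalize(record, FIELD_MAPS[source_name])
            row["batting_average"] = round(row["hits"] / row["at_bats"], 3)
            rows.append(row)
    return sorted(rows, key=lambda r: r["year"])


merged = merge({"site_a": site_a, "site_b": site_b})
for row in merged:
    print(row)
```

In this sketch the field mappings are written by hand. The Semantic Web vision is that each site would publish enough description of its own data for software to work out such mappings itself.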
1.1.5 Machine retrieval of data
This part of the vision focuses on automatic acquisition of data. This means
that a piece of software, in pursuit of its assignment, determines what data it
needs and where and how to get it, and then goes out and gets the data. In
the baseball example from the previous section, our researcher today has to
find the right web pages, load them, and then figure out a way to extract the data
and organize it. This is hard to do and often takes a lot of time. Under the Semantic
Web, the data format and its manner of access would be described in a way that
would allow the researcher’s computer to get and use the data automatically.
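One way to picture a described data format is a small machine-readable description that travels with the data. The Python sketch below is hypothetical throughout: the description layout, the column names, and the sample payloads are assumptions made for this example, not part of any Semantic Web standard. A generic loader reads the description and then handles the payload without any source-specific code.

```python
import csv
import io
import json


def load(payload, description):
    """Parse payload according to its description; return rows with shared names."""
    if description["format"] == "csv":
        raw_rows = list(csv.DictReader(io.StringIO(payload)))
    elif description["format"] == "json":
        raw_rows = json.loads(payload)
    else:
        raise ValueError("unsupported format: " + description["format"])
    columns = description["columns"]  # maps local column names to shared names
    return [{columns[k]: int(v) for k, v in row.items()} for row in raw_rows]


# Hypothetical description and payload published by one source (invented values).
csv_description = {"format": "csv", "columns": {"yr": "year", "h": "hits"}}
csv_payload = "yr,h\n1950,150\n1951,100\n"

# A second hypothetical source uses JSON and different column names.
json_description = {"format": "json", "columns": {"season": "year", "hits": "hits"}}
json_payload = '[{"season": 1952, "hits": 120}]'

rows = load(csv_payload, csv_description) + load(json_payload, json_description)
print(rows)
```

The loader never needs to know which site the data came from. Everything it requires is in the description, which is the point of describing the data format and its manner of access.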
1.1.6 Services
A service is a behavior that provides a benefit. Examples include making reservations,
arranging schedules, providing prices, placing orders, and so forth. Think of
ordering, say, a perishable item like flowers or food. Once you’ve selected
a product to buy, you have to make sure that its delivery will fit into your schedule.
The price, buying conditions, delivery options, and your schedule can all be thought
of as services that must be activated and coordinated. In the “Semantic
Web as web services” view, all these services would publish machine-readable
data that would allow a computer to do all the activation and coordination for
you.
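The coordination step in the flower-ordering example can be sketched as a simple matching problem. In this hypothetical Python example, a delivery service and a personal calendar each publish time windows as (start hour, end hour) pairs; the data layout, the function names, and the hours are assumptions made for illustration. The software looks for the first delivery window that overlaps a free period.

```python
def overlap(a, b):
    """Return the overlapping part of two (start, end) windows, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def first_delivery_slot(delivery_windows, free_times):
    """Return the first overlap between a delivery window and a free period."""
    for window in delivery_windows:
        for free in free_times:
            slot = overlap(window, free)
            if slot is not None:
                return slot
    return None


# Hypothetical data published by a delivery service and by a personal calendar.
delivery_windows = [(9, 12), (14, 17)]
free_times = [(12, 15)]

print(first_delivery_slot(delivery_windows, free_times))  # (14, 15)
```

A real arrangement would also have to weigh price and buying conditions, but the principle is the same: once each service publishes its data in a form software can read, the matching can be automated.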
1.1.7 Discovery
To use services, you (and especially your software) must be able to find them,
discover what they do, and learn how to invoke them. This is the realm of discovery
of services. The most obvious approach would be to create directories of services
with standard access methods. The services would be described in standard terms,
and information about how to access them and what they offer would be
encoded in standard ways.
Consider an analogy with a physical library. Most libraries in the United
States use either the Dewey Decimal System or the Library of Congress method
to catalog their books. After using the card catalog or its electronic version,
a person becomes familiar with the classifications and learns how to find books
on the shelves. Here, the standard access methods are the familiar classification
system and the physical arrangement of books in the library.
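The directory approach can be sketched as a small registry in which every service is described with the same fields and looked up by category. This Python example is a hypothetical illustration only: the class names, the categories, and the endpoints (which use the reserved "example" domain) are invented and do not correspond to any real directory standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """A service described in standard terms (hypothetical fields)."""
    name: str
    category: str
    endpoint: str


class Directory:
    """A registry that files services under a shared set of categories."""

    def __init__(self):
        self._by_category = {}

    def register(self, service):
        """File a service description under its category."""
        self._by_category.setdefault(service.category, []).append(service)

    def find(self, category):
        """Return all services filed under category (empty list if none)."""
        return list(self._by_category.get(category, []))


directory = Directory()
directory.register(ServiceDescription("Flower Shop", "flower-delivery", "https://flowers.example/order"))
directory.register(ServiceDescription("Ticket Desk", "airline-tickets", "https://tickets.example/book"))

for service in directory.find("flower-delivery"):
    print(service.name, service.endpoint)
```

As with a library classification system, the directory is useful only because providers and users agree on the categories in advance.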
A more advanced approach would be to send out discovery requests based on
the services required, and for candidate services to describe themselves
in such a way that the would-be user could deduce their capabilities and start
a conversation to find any missing or uncertain information. Returning to the
library example, this would be like getting an experienced research librarian
to tell you which reference books to look at and how to understand the information
in them.
1.1.8 Intelligent agents
An agent is someone or something that acts on your behalf. A software
agent would act in a somewhat autonomous way, communicating with other software
agents (which might be specialized) to discover services, products, or information
for you. For instance, one of those specialized agents might know how to purchase
airline tickets and make reservations. Another agent might perform the required
services, passing the results back to your own agent, which would notify you of
the outcome. It’s clear that a network of interacting agents would have
to be able to describe its goals using established vocabularies, to discover services
and information resources, and to use many of the capabilities described in the
previous sections.
1.2 Two Semantic Web scenarios
To give you a feel for the way these areas might interact and how the Semantic
Web could provide great value, here are two scenarios that were developed during
the workshop “Research Challenges and Perspectives of the Semantic Web.” [1]
Both scenarios illustrate what might be called personal services. Of course,
similar scenarios could be constructed for many other areas, such as business-to-business
transactions. Note that the language is taken directly from the report without
corrections for grammatical and spelling errors.
Scenario 1: A research assistant
During her stay at Honolulu, Clara ran into several interesting people with
whom she exchanged vCards. When time to rest came in the evening, she had a
look at her digital assistant summarizing the events of the day and recalling
the events to come (and especially her keynote talk of the next day). The assistant
popped up a note with a link to a vCard that reads: “This guy’s
profile seems to match the position advertisement that Bill put on our intranet.
Can I notify Bill’s assistant?”
Clara hit the “explain!” button. “I used his company directory
for finding his DAML [2] enhanced vita: he’s got the required skills as a
statistician who led the data mining group of the database department at
Montana U. for the requirement of a researcher who worked on machine
learning.” Clara hit then the “evidence!” button. The assistant started
displaying “I checked his affiliation with university of Montana, he is
cited several times in their web pages: reasonably trusted; I checked his
publication records from publishers’ DAML sources and asked bill assistant
a rating of the journals: highly trusted. More details?” Clara had enough
and let her assistant inform Bill’s.
[1] This workshop was organized by the European Research Consortium for
Informatics and Mathematics (ERCIM) for the European Union Future and Emerging
Technologies program (EU-FET) and the US National Science Foundation (NSF).
It was held in Sophia-Antipolis, France, in October 2001.
[2] DARPA Agent Markup Language; see chapter 7.
Created: March 27, 2003
Revised: October 4, 2004
URL: http://webreference.com/internet/semantic/1