Explorer's Guide to the Semantic Web | 3

Scenario 2: Negotiating a date

In these scenarios, you can see quite a few Semantic Web areas in operation at the same time. Software agents (the digital assistants) are discovering metadata and information and processing it. Logical reasoning is not only used to make inferences; it's also explained to the human user. Assessments of trust and reliability are deduced through networks of interacting information. We see the discovery of web services. It all seems so plausible and so useful.

3 Presumably the International Semantic Web Conference.

1.2.1 Can the Semantic Web work this way?

What needs to be developed, what needs to be in place, for the Semantic Web to work as envisioned in the previous scenarios? The keys are the widespread interchange of data and ways to mark, indicate, or describe what that data is, how it’s structured, how it can be retrieved, and what it means. Each of these areas is a large undertaking in itself. But the Semantic Web will be a sociological development, too. Companies must cooperate where they might normally compete; academic research must be translated into practical systems; individuals must discover how they can contribute; and issues of for-profit versus free, of closed versus open systems, and of trust need to be worked out.

The task is much bigger than the building of the original World Wide Web. At that time, few people realized how many new capabilities the Web would unleash. Today, some of the basic infrastructure is already in place. There are organizations like the World Wide Web Consortium (W3C; www.w3c.org), whose purpose includes developing and advancing standards of importance to the Internet as a whole, including the Semantic Web. So the task is bigger, but the starting point is more advanced.

Can the visions be realized? Opinions vary—mine is that many of them will come to pass (some are already beginning to operate) and make a real difference in the lives of people who use the Web.

1.3 The Semantic Web’s foundation

The World Wide Web has certain design features that make it different from earlier hyperlink experiments. These features will play an important role in the design of the Semantic Web. The Web is not the whole Internet, and it would be possible to develop many capabilities of the Semantic Web using other means besides the World Wide Web. But because the Web is so widespread, and because its basic operations are relatively simple, most of the technologies being contemplated for the Semantic Web are based on the current Web, sometimes with extensions. However, web services (chapter 8) and agents (chapter 9) may step outside the architecture of the current Web, as you’ll see.4

The Web is designed around resources, standardized addressing of those resources (Uniform Resource Locators and Uniform Resource Identifiers), and a small, widely understood set of commands. It’s also designed to operate over very large and complex networks in a decentralized way. Let’s look at each of these design features.


4 I’m referring in part to the so-called REST (Representational State Transfer) architecture and the controversy over whether current SOAP-based web services that don’t use this model would be better suited to the Web if they did.

1.3.1 Resources

The Web addresses, retrieves, links to, and modifies resources. A resource is intended to represent any idea that can be referred to. Usually we think of these resources as being tangible packages of data (documents or pages), but the notion of a resource is more general in two ways. First, a resource can change over time and still be considered the same resource, addressed by the same Uniform Resource Identifier (URI). Thus, a series of drafts of a manuscript could be addressed by the same URI. Alternatively, a URI could denote one specific, unchanging version of the same document. The notion of resource is flexible enough to encompass both varying and fixed resources.

Strictly speaking, a resource itself is not retrieved, but only a representation of the resource. For some protocols, like File Transfer Protocol (FTP), the representation is normally a copy of a file. For others, like HTTP, the representation may or may not be a copy of a file. A resource can even be represented by different forms—a PDF file, an HTML page, a voice recording, and so on.
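The distinction between a resource and its representations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Resource` class, its `represent` method, and the example.org URI are hypothetical names invented here, and real HTTP content negotiation is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource identified by a URI, with representations keyed by media type."""
    uri: str
    representations: dict[str, bytes] = field(default_factory=dict)

    def represent(self, accept: list[str]) -> tuple[str, bytes]:
        """Return the first representation whose media type the client accepts."""
        for media_type in accept:
            if media_type in self.representations:
                return media_type, self.representations[media_type]
        raise LookupError(f"no acceptable representation for {self.uri}")


manuscript = Resource(
    uri="http://example.org/manuscript",  # hypothetical URI
    representations={
        "text/html": b"<p>Draft 1</p>",
        "text/plain": b"Draft 1",
    },
)

# The same URI yields different representations for different clients.
html_type, html_body = manuscript.represent(["text/html"])
text_type, text_body = manuscript.represent(["application/pdf", "text/plain"])

# The resource changes over time, but its URI stays the same.
manuscript.representations["text/plain"] = b"Draft 2"
_, later_body = manuscript.represent(["text/plain"])
```

The client never holds the resource itself, only whichever representation was selected for it at the time of the request.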

Second, and perhaps harder to grasp, a resource can be something that doesn’t yet exist, and that may never exist. A resource can be a concept or a reference to a real or fictitious person—something that can’t be addressed and transferred over a network, but that can be talked about, thought about. For the purposes of the Semantic Web, such a resource can be referred to or identified by a URI.5

1.3.2 Standardized addressing

All resources on the Web are referred to by URIs. The most familiar URIs are those that identify resources that can be located and retrieved; these are called URLs, for Uniform Resource Locators. URLs have a uniform structure that can specify protocols other than HTTP (such as FTP), and they are easy to type and copy. They can be inserted into hyperlinks so that any addressable information can be easily linked.
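As a small illustration of that uniform structure, the sketch below splits an HTTP address and an FTP address with Python's standard `urllib.parse.urlsplit`. The example.org addresses and the `describe` helper are made up for this example.

```python
from urllib.parse import urlsplit

# Hypothetical addresses; only the protocol (scheme) differs in kind.
examples = [
    "http://example.org/docs/guide.html",
    "ftp://ftp.example.org/pub/guide.txt",
]


def describe(uri: str) -> dict[str, str]:
    """Break a URI into the parts shared by every hierarchical URI."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


parsed = [describe(uri) for uri in examples]
for entry in parsed:
    print(entry)
```

Both addresses decompose into the same three parts, which is why one parser, and one hyperlink mechanism, can handle either protocol.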

5 For example, RFC 1737, “Functional Requirements for Uniform Resource Names” (a subset of URIs), says, “The purpose or function of a URN is to provide a globally unique, persistent identifier used for recognition, for access to characteristics of the resource or for access to the resource itself.” (Emphasis added.)

1.3.3 Small set of commands

HTTP (the protocol used to send messages back and forth over the Web) uses a small set of commands. These commands are universally understood by web servers, clients (like browsers), and intermediate components like caches, which can reduce network traffic by storing copies of documents that were previously sent. With this limited set of commands, there is no question about what is being requested of the server and network, and no visibility into how the server may choose to carry out the requests. This model doesn’t provide security or personal privacy for the information being sent or requested; but, since it is simple and well understood, the model lends itself to the provision of additional layers of security.6
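To make the role of a small, uniform command set concrete, here is a minimal sketch of an in-memory server and a cache placed in front of it. It is not how any real web server or proxy is implemented: `OriginServer`, `Cache`, and the simplified status codes are assumptions made for illustration, and real HTTP caching follows far more detailed rules.

```python
from typing import Callable

Handler = Callable[[str, str, str], tuple[int, str]]


class OriginServer:
    """Hypothetical server that understands only GET, PUT, and DELETE."""

    def __init__(self) -> None:
        self.documents = {"/index.html": "<h1>Home</h1>"}
        self.hits = 0  # number of requests that actually reached the server

    def handle(self, method: str, uri: str, body: str = "") -> tuple[int, str]:
        self.hits += 1
        if method == "GET":
            if uri in self.documents:
                return 200, self.documents[uri]
            return 404, ""
        if method == "PUT":
            self.documents[uri] = body
            return 200, ""
        if method == "DELETE":
            self.documents.pop(uri, None)
            return 200, ""
        return 405, ""  # command not supported


class Cache:
    """Intermediary that stores copies of documents returned by GET."""

    def __init__(self, upstream: Handler) -> None:
        self.upstream = upstream
        self.store: dict[str, str] = {}

    def request(self, method: str, uri: str, body: str = "") -> tuple[int, str]:
        if method == "GET" and uri in self.store:
            return 200, self.store[uri]  # served without contacting the server
        status, payload = self.upstream(method, uri, body)
        if method == "GET" and status == 200:
            self.store[uri] = payload
        elif method != "GET":
            self.store.pop(uri, None)  # a write invalidates the stored copy
        return status, payload


origin = OriginServer()
cache = Cache(origin.handle)
first = cache.request("GET", "/index.html")
second = cache.request("GET", "/index.html")  # answered by the cache
hits_after_reads = origin.hits
cache.request("PUT", "/index.html", "<h1>New</h1>")
third = cache.request("GET", "/index.html")
```

The cache needs to know nothing about the server's internals. Because the meaning of each command is fixed, it can tell from the command alone which responses are safe to store and which requests must be passed through.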

However, some architectures use complex messages or need to restrict the visibility of message contents, and they use an approach that’s more involved than basic HTTP. Other Internet protocols can be used, and additional messaging layers can be carried over HTTP as well (such as SOAP, whose name no longer stands for anything). There is some controversy over what methods should be used for the Web—as distinct from the Internet, which includes much more than the World Wide Web—and whether the Semantic Web architecture should restrict itself to the simpler architecture of the current Web.

1.3.4 Scalability and large networks

The Web has to operate over a very large network with an enormous number of web sites and to continue to work as the network’s size increases. It accomplishes this thanks to two main design features. First, the Web is decentralized. If you have a computer on the network, you can put a web server on it; and if you have a server, you can add resources to it without registering them anywhere else.

Second, each transaction on the Web (that is, a request and the subsequent response) contains all the information needed to handle the request. No data needs to be stored by the server from one request to another. However, many practical uses of the Web do require that some data be saved for a period of time. If you reserve a ticket and then order it on another web page, the system must store your ticket reservation and be able to connect it to your request to purchase. Since any web transaction is separate from all others, it’s harder to arrange to maintain data across a connected series of transactions. In return, independent interactions make possible a large, decentralized system where responses can be cached to allow faster responses and reduce network traffic.

Data that maintains some history of transactions is sometimes called state, as in “the state of the system.” Web transactions are stateless.7

If there is a business need to store information across several interactions, the server must provide special arrangements to make it happen.
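The ticket example can be sketched as follows. This is one possible arrangement, not a description of how any particular site works: the `reserve` and `purchase` functions, the dictionary-shaped requests, and the token scheme are all hypothetical. Each request carries everything needed to handle it, and the only link between the two requests is a token that the server hands out and the client sends back.

```python
import uuid

# Server-side store: the "special arrangement" that keeps state between requests.
RESERVATIONS: dict[str, str] = {}


def reserve(request: dict) -> dict:
    """Handle a reservation request; return a token the client must send back."""
    token = uuid.uuid4().hex
    RESERVATIONS[token] = request["seat"]
    return {"status": 200, "token": token}


def purchase(request: dict) -> dict:
    """Handle a purchase request using only what the request itself carries."""
    seat = RESERVATIONS.pop(request.get("token", ""), None)
    if seat is None:
        return {"status": 404, "message": "no such reservation"}
    return {"status": 200, "message": f"purchased seat {seat}"}


reply = reserve({"seat": "12A"})
receipt = purchase({"token": reply["token"]})  # token connects the two requests
orphan = purchase({})  # no token, so the server cannot connect it to anything
```

Without the token, the second request looks like any other request from any other client, which is exactly what statelessness means.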

6 There is some controversy over whether the web model supports security provisions better than other network architectures, such as Remote Procedure Call (RPC) systems.

7 When a cookie is stored on your computer, the cookie stores some state information. Unfortunately, this state doesn’t fit the web model well, so it can sometimes cause confusion between browser, server, and user.

Created: March 27, 2003
Revised: October 4, 2004

URL: http://webreference.com/internet/semantic/1