Internet Outlook with Richard Wiggins | 41


Volume 1, Number 18 March 4, 1998

A Virtual Library with Empty Shelves


Building a Real Virtual Library

If politicians, the press, and the public are satisfied with a virtual library that offers less real content than three or four books out of a bricks-and-mortar library, they may be willing to invest huge sums of money in projects that take us only a very short way towards the goal of having every book in the Library of Congress online.

Only if all of us understand what we're really getting out of each new virtual library can we make informed decisions as to which projects merit further funding, and which ones are less effective uses of money.

I am not privy to the budget of firstladies.org, and it is not my place to say whether they are getting the appropriate bang for the buck on their sponsors' investments. My point is more general: anyone who is sponsoring, supporting, or using a given virtual library should judge it by realistic standards.

To be fair, firstladies.org probably isn't responsible for the confusion in that AP article. Their press materials make clear that the central feature of their site is a bibliography. I asked their spokesperson, Kim Turpin Davis, for a clarification of the goals of the library. The founding chair and president of the National First Ladies' Library, Mary Regula, described their efforts to date as "the first step in creating the 'Library of the Future,' a resource for a wealth of information on America's first ladies which is accessible both by visiting the Library in person or virtually through the Internet." She continued, "Future plans include making the full texts of these works available on line through its Web site."

With all due respect to director Regula, to the numerous sponsors of firstladies.org, and to Hillary Clinton, they may be underestimating the magnitude of the task before them. There are significant hurdles to bringing print materials online:

  • Copyright clearance: For materials published within the last 50 years or so, one cannot simply scan in the pages and put the material online; it's essential to obtain copyright clearance. Let's assume half of the documents listed in the Anthony bibliography have lapsed into the public domain, eliminating the need to obtain copyright permission. Obtaining clearance for the remaining 20,000 items produced by hundreds or thousands of separate publishers will be daunting indeed.
  • Scanning materials into digital form: Anyone who has undertaken a serious digitization effort knows how much work this entails. Just a few of the challenges include: obtaining all materials on-site for scanning; getting optical character recognition to work well; making separate pages physically accessible to a scanner (especially difficult with one-of-a-kind historical documents); and setting up an efficient workflow to manage the whole process.
  • Markup for online presentation: You can't simply scan a pile of paper and make the text available in raw form. For it to be useful, a process of markup, analogous to HTML markup, is essential. This is time-consuming, labor-intensive work.
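
To give a rough sense of what even the simplest markup step involves, here is a minimal sketch in Python. It is not the method used by any organization mentioned in this column; the function name `mark_up` and the tag names `<text>`, `<head>`, and `<p>` are assumptions chosen purely for illustration. It takes raw text of the kind an OCR program might produce, rejoins lines that were broken by the page layout, and wraps each paragraph in HTML-like tags.

```python
from html import escape


def mark_up(raw_text: str, title: str) -> str:
    """Wrap plain OCR-style text in minimal, HTML-like structural tags.

    The tag names here are illustrative assumptions, not a real standard.
    """
    # Treat blank lines as paragraph breaks, and rejoin lines that were
    # split only because of the printed page's layout.
    paragraphs = [
        " ".join(block.split())
        for block in raw_text.split("\n\n")
        if block.strip()
    ]
    # Escape &, <, and > so stray characters in the scan are not read as tags.
    lines = ["<text>", f"  <head>{escape(title, quote=False)}</head>"]
    lines += [f"  <p>{escape(p, quote=False)}</p>" for p in paragraphs]
    lines.append("</text>")
    return "\n".join(lines)


raw = "First line of a scanned\nparagraph.\n\nSecond paragraph, with an & sign."
print(mark_up(raw, "Sample page"))
```

Real projects go much further than this, tagging titles, dates, names, and page breaks, and correcting recognition errors by hand, which is where the labor comes in.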

So with all these hurdles, is the idea of transforming firstladies.org into a true content-rich virtual library impractical? Not at all. I have toured two of the leading organizations engaged in this sort of work: UMI (originally University Microfilms) and the Electronic Text Center at the University of Virginia. UMI licenses full-text databases of leading business journals to thousands of institutional customers worldwide. The e-text center at Virginia is a leading converter of historical texts into digital form. Both organizations must digitize huge quantities of print materials, perform optical character recognition on each word of each page of each document, and go through a labor-intensive process of cleanup before materials are put online. Both UMI and the Virginia e-text center have implemented policies, procedures, and workflows to overcome all the inherent barriers involved.

Thus, it's not impossible to build a real virtual library with substantial full text content -- but it does require a serious commitment to perform the required heavy lifting. Here are some suggestions as to concrete first steps in building a virtual library with real primary content:

  • Partner with the Virginia e-text center, or another member of the Text Encoding Initiative, so that you can use existing, proven methodology in digitizing materials and delivering them over the Net.
  • Start small. Pick a collection of items out of the bibliography that are housed in one location. Work with that institution to scan and mark up the material.
  • Try to establish partnerships with existing major newspapers. Papers such as the New York Times and the Chicago Tribune have a wealth of material available in electronic form. News databases such as Lexis-Nexis are another source of possible partners. If a major newspaper or newspaper database would make all material they have relating to first ladies accessible online, you'd have a treasure trove to offer your virtual patrons.
  • For multimedia content, partner with major radio and television networks. Audio recordings of public figures date back to the turn of the century, and archives of old radio broadcasts date back almost as far. Television networks have archives of news and entertainment programming dating to the beginning of that medium. Since Microsoft is one of your funding partners, MSNBC might help provide video clips of 20th-century first ladies – perhaps in digital form, perhaps even mounting the digital video on streaming Web servers for your immediate use.

Regardless of what does or does not become of firstladies.org, it is time for all of us to raise the bar, redefining what we expect of prominent new Web sites and virtual libraries:

  • Reporters should read press releases carefully and ask the publisher hard questions: "What sort of actual content are you mounting? How does this help the proverbial schoolgirl in rural Tennessee?" Reportage that parrots (or misquotes) press releases will no longer suffice; reporters need to do some study, and understand the actual content a new Web site does or does not deliver.
  • Politicians should ask the same questions before they lend their prestige to new virtual libraries. Before you preside over a virtual ribbon cutting, lavishing fulsome praise in a nationally-covered ceremony, find out the real significance of the virtual road project in question.
  • Corporate and government funders of digital libraries should demand substantial, serious full-text content in the virtual libraries they finance. Specific goals as to the nature and quantity of content should be included in the project proposal, and success should be measured based on completion of those goals.
  • The general public should read news articles announcing the next important virtual library with a skeptical eye. Until the news media learns to distinguish a simple Web site from a substantial new virtual library, don't believe every story they trumpet. We're a long, long way off from having every book in the Library of Congress, or every book in your local public library – or 40,000 books and articles about first ladies – available online.



Produced by Rich Wiggins and
Created: March 4, 1998
Revised: March 4, 1998

URL: http://webreference.com/outlook/column18/index.html