Building a Real Virtual Library
If politicians, the press, and the public are satisfied with a virtual
library that offers less real content than three or four books out of
a bricks-and-mortar library, they may be willing to invest huge sums of money in
projects that take us only a very short way towards the goal of having
every book in the Library of Congress online.
Only if all of us understand what we're really getting out of each new
virtual library can we make informed decisions as to which projects
merit further funding, and which ones are less effective uses of money.
I am not privy to the budget of firstladies.org, and it is not my place
to say whether they are getting the appropriate bang for the buck on
their sponsors' investments. My point is more general: anyone who
is sponsoring, supporting, or using a given virtual library should
judge it by realistic standards.
To be fair, firstladies.org probably isn't responsible for the confusion in that
AP article. Their press materials make clear that the central feature of
their site is a bibliography. I asked their spokesperson, Kim Turpin
Davis, for a clarification of the goals of the library.
The founding chair
and president of the National First Ladies' Library, Mary Regula,
described their efforts to date as "the first step in creating
the 'Library of the Future,' a resource for a wealth of information on
America's first ladies which is accessible both by visiting the Library
in person or virtually through the Internet."
She continued, "Future plans include making the full texts of these works
available on line through its Web site."
With all due respect to director Regula, to
the numerous sponsors of firstladies.org, and to Hillary Clinton,
they may be underestimating the enormity of the task before them.
There are significant hurdles to bringing print materials online:
- Copyright clearance: For materials published within the last 50 years or so,
one cannot simply scan in the pages and put the material online; it's essential to
obtain copyright clearance. Let's assume half of the documents listed in the
Anthony bibliography have lapsed into the public domain, eliminating the need
to obtain copyright permission. Obtaining clearance for the remaining 20,000
items produced by hundreds or thousands of separate publishers will be a daunting task.
- Scanning materials into digital form: Anyone who has undertaken a serious digitization effort knows how much work this entails. Just a few of the challenges include: obtaining all materials on-site for scanning; getting optical character recognition to work well; making separate pages physically accessible to a scanner (especially difficult with one-of-a-kind historical documents); and setting up an efficient workflow to manage the whole process.
- Markup for online presentation: You can't simply scan a pile of paper and make the text available in raw form. For it to be useful, a process of markup, analogous to HTML markup, is essential. This is time-consuming, labor-intensive work.
So with all these hurdles, is the idea of transforming firstladies.org into a
true content-rich virtual library impractical? Not at all. I have toured two of the
leading organizations engaged in this sort of work:
UMI (originally University Microfilms)
and the Electronic Text Center
at the University of Virginia.
UMI licenses full-text databases of leading business journals to thousands
of institutional customers worldwide. The E-text Center at Virginia is a leading
converter of historical texts into digital form. Both organizations must
digitize huge quantities of print materials, perform optical character recognition
on each word of each page of each document, and go through a labor-intensive
process of cleanup before materials are put online. Both UMI and the Virginia
E-text center have implemented policies, procedures, and workflows to overcome
all the inherent barriers involved.
Thus, it's not impossible to build a real virtual library with substantial
full-text content -- but it does require a serious commitment to
perform the required heavy lifting. Here are some suggestions as to
concrete first steps in building a virtual library with real content:
- Partner with the Virginia e-text center, or another member of the Text Encoding Initiative, so that you can use existing, proven methodology in digitizing materials and delivering them over the Net.
- Start small. Pick a collection of items out of the bibliography that are housed in one location. Work with that institution to scan and mark up the material.
- Try to establish partnerships with major newspapers. Papers such as the New York Times and the Chicago Tribune have a wealth of material available in electronic form. News databases such as Lexis-Nexis are another source of possible partners. If a major newspaper or newspaper database made all of its material relating to first ladies accessible online, you'd have a treasure trove to offer your virtual patrons.
- For multimedia content, partner with major radio and television networks. Audio recordings of public figures date back to the turn of the century, and archives of old radio broadcasts date back almost as far. Television networks have archives of news and entertainment programming dating to the beginning of that medium. Since Microsoft is one of your funding partners, MSNBC might help provide video clips of 20th-century first ladies, perhaps in digital form, perhaps even mounted on streaming Web servers so you could put them to use immediately.
Regardless of what does or does not become of firstladies.org, it is time for all of us to raise the bar, redefining what we expect of prominent new Web sites and virtual libraries:
- Reporters should read press releases carefully and ask the publisher hard
questions: "What sort of actual content are you mounting? How does this help
the proverbial schoolgirl in rural Tennessee?" Reportage that parrots
(or misquotes) press releases will no longer suffice; reporters need to do some
study, and understand the actual content a new Web site does or does not deliver.
- Politicians should ask the same questions before they lend their prestige
to new virtual libraries. Before you preside over a virtual ribbon cutting,
lavishing fulsome praise in a nationally covered ceremony, find out
the real significance of the virtual road project in question.
- Corporate and government funders of digital libraries should demand
substantial, serious full-text content in the virtual libraries they finance.
Specific goals as to the nature and quantity of content should be included in
the project proposal, and success should be measured based on completion of those goals.
- The general public should read news articles announcing the next
important virtual library with a skeptical eye. Until the news media learns to
distinguish a simple Web site from a substantial new virtual library, don't believe
every story they trumpet. We're a long, long way off from having every book in
the Library of Congress, or every book in your local public library (or even the 40,000
books and articles about first ladies) available online.