PALO ALTO, Calif., May 10 — Putting the world's
most advanced scholarly and scientific knowledge on the Internet has
been a long-held ambition for Michael Keller, head librarian at
Stanford University. But achieving this goal means digitizing the
texts of millions of books, journals and magazines — a slow process
that involves turning each page, flattening it and scanning the
words into a computer database.
Mr. Keller, however, has recently added a tool to his crusade. On
a recent afternoon, he unlocked an unmarked door in the basement of
the Stanford library to demonstrate the newest agent in the march
toward digitization. Inside the room a Swiss-designed robot about
the size of a sport utility vehicle was rapidly turning the pages of
an old book and scanning the text. The machine can turn the pages of
both small and large books as well as bound newspaper volumes and
scan at speeds of more than 1,000 pages an hour.
Occasionally the robot will stumble, turning more than a single
page. When that happens, the machine will pause briefly and send out
a puff of compressed air to separate the sticking pages.
For Mr. Keller, the robot, made by 4DigitalBooks, one of two
companies now introducing the first automated digitization systems,
is a boon.
"Think about the power of bringing our library to little schools
in the middle of Africa," Mr. Keller said. "Would it make a
difference for those who now have their minds closed to the idea of
democracy?"
The first book-scanning robots were introduced this spring by
4DigitalBooks of St. Aubin, Switzerland, and Kirtas Technologies of
Victor, N.Y. The machines have already begun to generate interest
from libraries and private and nonprofit groups now working to
digitize books.
Until now, the job has been done mostly by students or armies of
low-cost workers in countries like India and the Philippines. But
manual digitization presents significant logistical problems. Book
collections may have to be moved long distances to digitization
centers.
And in some cases the process of scanning has damaged old books
and journals, making it necessary to rebind them afterward.
The digitizing machines, by contrast, can be located close to
book collections and offer speed and quality control unattainable by
manual systems.
Even so, manual processing is still less expensive in many cases
than acquiring a robot. The 4DigitalBooks robot, whose price neither
the company nor Stanford officials would disclose, becomes cost
effective on projects larger than 5.5 million pages, said Ivo
Iossiger, the company's chief technology officer and a co-founder.
It seems likely that the vast majority of digitization over the next
several years will be done by hand.
Mr. Keller admits that his dream of having the entire Stanford
library in a digital database is unlikely to be realized in the
foreseeable future because such an undertaking, involving eight
million volumes, could cost upward of $250 million.
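To give a rough sense of the scale behind these figures, the short
sketch below works through the arithmetic using only numbers quoted
in this article. The function and variable names are illustrative,
and treating the robot's speed as exactly 1,000 pages an hour is an
assumption (the article says "more than"), so the hour count is an
upper bound on machine time, not a vendor figure.

```python
# Back-of-the-envelope arithmetic from figures quoted in the article.
# Names are illustrative; the scan rate is assumed to be exactly
# 1,000 pages/hour, which the article gives only as a lower bound.

PAGES_PER_HOUR = 1_000           # "more than 1,000 pages an hour"
BREAK_EVEN_PAGES = 5_500_000     # project size where the robot pays off
LIBRARY_VOLUMES = 8_000_000      # Stanford library holdings
LIBRARY_COST_USD = 250_000_000   # "upward of $250 million"


def robot_hours(pages, pages_per_hour=PAGES_PER_HOUR):
    """Machine hours needed to scan `pages` at a constant rate."""
    return pages / pages_per_hour


def cost_per_volume(total_cost_usd, volumes):
    """Average cost per volume implied by a total budget."""
    return total_cost_usd / volumes


break_even_hours = robot_hours(BREAK_EVEN_PAGES)
per_volume_usd = cost_per_volume(LIBRARY_COST_USD, LIBRARY_VOLUMES)

print(f"Hours to scan the break-even project: {break_even_hours:,.0f}")
print(f"Implied cost per volume: ${per_volume_usd:.2f}")
```

On these assumptions a break-even project represents at most about
5,500 hours of scanning, and the estimate for the whole library
works out to a little over $31 a volume.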
In the meantime, the Stanford librarians have begun digitizing
books and documents that pose no thorny copyright barriers and that
have important historical and political significance.
The newly installed robot is currently finishing two pilot
projects, scanning books published by Stanford's Center for the
Study of Language and Information and works for the Medieval and
Modern Thought Text Digitization Project. It will soon begin work on
the 2,500 titles published by the Stanford University Press.
Not long ago Stanford helped finance the manual digitization of
the presidential papers of Eduardo Frei, the former president of
Chile, who was concerned that records of his administration could be
lost in a coup.
And beginning in 1999, the Stanford library system sent a team of
specialists and students to Europe, where the university is engaged
in a multiyear project to digitize selected documents produced by
the General Agreement on Tariffs and Trade and its successor
organization, the World Trade Organization in Geneva. The project,
which will take five years, will ultimately scan about 2.2 million
pages of information.
Other ambitious undertakings like Carnegie Mellon University's
Million Book Project will also continue to rely on manual
digitization for several more years. Another project, led by the
Internet Archive in San Francisco, recently shipped 80 tons of old
books acquired from the Kansas City Library to Hyderabad, India,
where they will be scanned, according to Michael Lesk, a former
National Science Foundation official and digital library expert who
works with the archive.
Mr. Lesk said that currently in India or the Philippines it is
possible to scan and digitize a book for $1 to $4. But he
acknowledged that there were significant costs in quality control.
For Mr. Keller the most vexing challenges are neither labor costs
nor technology. Librarians, he said, must find a way to address the
copyright restrictions that appear to be tightening as a result of
new federal laws like the Digital Millennium Copyright Act of
1998.
Stanford is struggling to comply with copyright restrictions
while making works that have recently lost their copyright
protection available digitally. Mr. Keller said the library
increased the circulation of its collection by 50 percent when it
computerized its card catalog. Digitizing out-of-print books could
likewise make them available to a much wider audience, he said. The
payoff for building such a digital collection, he added, is vastly
improved availability of a huge store of knowledge and information
for teaching, learning and research.