

Dickens' tale circa 1135 and other massive errors

Major errors prompt questions over Google Book Search's scholarly value

10 September 2009

Google Book Search's mistakes provoke questions about its scholarly value. Matthew Reisz reports

It should be the world's greatest scholarly resource, but some claim that Google Book Search's many huge - and often hilarious - errors raise major questions about its value to serious researchers.

Why does a link to a book on cosmology by a Napoleonic mathematician lead to a novel by Barbara Taylor Bradford? Could Sigmund Freud really be one of the authors of The Mosaic Navigator: The essential guide to the Internet Interface? And how did Barack Obama publish 29 books before he was born?

The journal Speculum is about the Middle Ages rather than gynaecological instruments, so why is it listed under "Health & Fitness"? And why on earth is a French translation of Hamlet classified under "Antiques & Collectibles"?

Even stranger, there seems to be something special about the year 1899, with Google claiming that a novel by Stephen King, a biography of Bob Dylan, a Portuguese version of the Beatles' film Yellow Submarine - and dozens of almost equally implausible titles - were all published then.

Such grotesque mistakes were pointed out by the linguist Geoffrey Nunberg, adjunct full professor at the University of California at Berkeley's School of Information, at the school's recent conference, "The Google Book Settlement and the Future of Information Access".

Mark Liberman, trustee professor of phonetics at the University of Pennsylvania, made a similar case. A self-proclaimed "enthusiast" for Google Books, he knew it would revolutionise his own discipline - the history of the English language - by hugely increasing the amount of textual material easily available for analysis, "with a potential effect comparable to the invention of the telescope or the microscope".

It remained crucial for scholars, however, that "basic bibliographic information - who wrote what, when - is almost always correct", he said. He added that he was sceptical about how soon the errors would be sorted out. Since such information "may not matter much to ordinary search customers, there is little incentive for Google to fix it", he said.

Professor Nunberg was even more outspoken in a blog post published on 29 August. With Google likely to become "the universal library for a long time to come", scholars need good metadata. Unfortunately, Google's information is "a train wreck: a mish-mash wrapped in a muddle wrapped in a mess".

The posting led to a long reply by Jon Orwant, who has the unenviable task of "managing the Google Books metadata team".

He cheerfully admits to some additional errors, such as an edition of Charles Dickens' A Christmas Carol dated to 1135 - three centuries before Johannes Gutenberg introduced the printing press to Europe.

He is also frank about the scale of the glitches still to be ironed out: "Geoff refers to us having hundreds of thousands of errors. I wish it were so. We have millions ... When you're dealing with a trillion metadata fields, one-in-a-million errors happen a million times over."

The glut of books "published" in 1899 is explained by a Brazilian metadata provider, which, strangely, uses that year as a default setting when it doesn't know the true date.

Nonetheless, Google is working hard to put things right. "Geoff's efforts will have singlehandedly improved nearly one million metadata records in our repository," Dr Orwant says.

Researchers will be keeping a close eye on whether they manage to solve some pretty monumental teething problems.

matthew.reisz@tsleducation.com

Readers' comments

  • logic says... 10 September, 2009

    So why not Web 2.0 it and allow people to "correct" it, Wikipedia-style? Provided all changes are recorded and can be rolled back, they can be contested and (mostly) agreed. But then, what's the incentive? For Google, it would be that they would want you to have a Google login first :-)

  • David 14 September, 2009

    It's great that news like this hits the mainstream. In publishing and in libraries, we've known about the weaknesses of Google and other search engines for many years. The hype around the revolution that Google and others have contributed to does, however, hide the backroom skill set required to create reliable, searchable information in the real world for professional users. It is depressing to see some major academic institutions in London shedding professionally trained library and curatorial staff in the wake of this kind of hype. Ignorance of the strengths and weaknesses of these new digital tools and services should not be an acceptable excuse for cutting jobs. A digital future needs sound curatorial work to make it fit for purpose.

  • Keeping my head down 21 September, 2009

    Alas, it is not only in London that "some major academic institutions" are shedding professionally trained library staff. Some very strange things are happening in the library of at least one Russell Group university outside the capital, where staff have been made to play musical chairs for their jobs.


