The Laboratorium: GBS: Fantastic Comment by Google's Jon Orwant on Geoff Nunberg's Metadata Criticism

Read the whole thing (along with Geoff’s original), but the short version is that Google is making a lot of pre-existing mistakes in the metadata more visible.

September 1, 2009 at 6:59 PM

Gillian Spraggs

I draw the following inference from Orwant’s bluster: they are not using the best sources of metadata (probably to save money). Other online catalogues have their errors, but they are not nearly such a mess as Google Books.

A more important point is this:

If Google’s database contains such huge errors in metadata (and it does), then how can it remotely be trusted to contain accurate information on the question of whether a given book is ‘commercially available’ in the US - a question that the Settlement Agreement proposes to treat as a crucial determinant of the default uses to be made of the books that Google has scanned?

The answer to that, of course, is that it can’t: as witness the statements filed with the court yesterday by publishers from South Africa, Sweden and Germany.

September 1, 2009 at 8:22 PM

Steven

Are people who are opposed to the Google Books Settlement on metadata-type grounds simply opposed to the idea of any universal library? Because frankly I don’t know how you would set one up without lots and lots of errors.

September 2, 2009 at 2:08 AM

Mark Robbins

Is Mr. Orwant’s diatribe the world’s longest shaggy dog story? Couldn’t he have just said, “the dog ate my homework”?

September 3, 2009 at 6:38 PM

Eric Hellman

Who’s blustering? Google does in fact use what is probably the best available source of book metadata: see OCLC’s press release.

Having worked on large metadata projects myself, I find that Jon Orwant’s explanations ring true.

September 4, 2009 at 11:34 PM

Gillian Spraggs

Orwant doesn’t mention OCLC; he cites various metadata sources, but not that.

Over many years of literary and historical research, I have used a very large number of online catalogues and electronic library catalogues. Many, probably all of them, throw up occasional errors, but not on anything remotely like the scale of Google Books.