GBS: Fantastic Comment by Google’s Jon Orwant on Geoff Nunberg’s Metadata Criticism


Read the whole thing (along with Geoff’s original), but the short version is that Google is making a lot of pre-existing mistakes in the metadata more visible.


I draw the following inference from Orwant’s bluster: they are not using the best sources of metadata (probably to save money). Other online catalogues have their errors, but none of them is nearly such a mess as Google Books.

A more important point is this:

If Google’s database contains such huge errors in metadata (and it does), then how can it remotely be trusted to contain accurate information on the question of whether a given book is ‘commercially available’ in the US - a question that the Settlement Agreement proposes to treat as a crucial determinant of the default uses to be made of the books that Google has scanned?

The answer to that, of course, is that it can’t: as witness the statements filed with the court yesterday by publishers from South Africa, Sweden and Germany.


Are people who are opposed to the Google Books Settlement on metadata-type grounds simply opposed to the idea of any universal library? Because frankly I don’t know how you would set one up without lots and lots of errors.


Is Mr. Orwant’s diatribe the world’s longest shaggy dog story? Couldn’t he have just said, “the dog ate my homework”?


Who’s blustering? Google does in fact use what is probably the best available source of book metadata: see OCLC’s press release.

Having worked on large metadata projects myself, I find that Jon Orwant’s explanations ring true.


Orwant doesn’t mention OCLC; he cites various metadata sources, but not that.

Over many years of literary and historical research, I have used a very large number of online catalogues and electronic library catalogues. Many, probably all of them, throw up occasional errors, but not on anything remotely like the scale of Google Books.