GBS: Some Highlights of Dan Clancy’s Declaration

From the declaration of Daniel Clancy in support of the motion for approval:

Paragraph 5:

To date, Google has Digitized over twelve million books, and intends to continue Digitizing books in the future.

Paragraph 6:

To date, Library-Registry Agreements have been signed by the University of Wisconsin, Stanford University, and the University of Virginia.

Paragraph 8:

It would be technologically burdensome to implement the exclusion of Inserts on a piecemeal basis, rather than from all Display Uses, because it would require the maintenance and tracking of numerous versions of a given Book, one for each Display Use, each containing only those Inserts which may be used in that Display Use. Such piecemeal exclusion would also be frustrating to users.

Paragraphs 9–11:

Google has received metadata from 48 libraries.

Google pays approximately $2.5 million per year to license metadata from 21 commercial databases of information about books.

Google has gathered 3.27 billion records about Books, and analyzed them to identify more than 174 million unique works.

Paragraph 22:

Because of the unstructured nature of most data available on the web, it would have been infeasible to attempt to use the Google search engine to generate a list of class members to whom notice was to be sent, and such an attempt would be error-prone. Similarly, because of “optical character recognition” errors and the unstructured nature of the data, it would have been infeasible and error-prone to attempt to derive class member contact information from Google’s scans of individual books.

I wonder why he only mentioned the original representative plaintiffs. Didn’t anyone tell him some more were added in the third complaint?

I’m delighted to know that Google “have developed algorithms to compare these numerous sources of metadata and identify the most accurate data about each book” (paragraph 12). I’ll be even happier when they’ve made them work properly, so there are no more duplicate entries, books without ISBNs, books incorrectly classified as Not Commercially Available, and various editions of the same book left unlinked.


Correcting the database would take human intervention, and Google is doing its best to do everything automatically, as fast as possible. I don’t really expect any improvement.
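To see why purely automatic matching goes wrong, here is a minimal sketch of the kind of record linkage involved. This is a hypothetical illustration, not Google’s actual algorithm: the records, normalization rules, and field names are all invented. Even modest variation between sources, such as a library catalog’s inverted author form, is enough to split one book into two “unique works.”

```python
# Hypothetical sketch of metadata record matching (not Google's algorithm).
# Records from different sources are grouped by a normalized
# (title, author) key; anything the normalization misses becomes
# a spurious duplicate "work".
import re

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def group_records(records):
    """Cluster records that share a normalized (title, author) key."""
    works = {}
    for rec in records:
        key = (normalize(rec["title"]), normalize(rec["author"]))
        works.setdefault(key, []).append(rec)
    return works

# Two records describing the same (invented) book, from different sources.
records = [
    {"title": "Example Novel", "author": "Jane Q. Author"},
    {"title": "Example novel.", "author": "Author, Jane Q."},  # library form
]

works = group_records(records)
# The inverted catalog author form defeats naive normalization,
# so one book is counted as two distinct works.
print(len(works))  # 2
```

Fixing that mismatch means either hand-correcting records or encoding ever more special cases (inverted names, subtitles, edition statements), which is exactly the human intervention described above.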

Oh yes, the metadata. From what I’ve seen I guess it consists largely of:

Cataloging-in-publication data, which the publisher either has done for free by the Library of Congress or pays a cataloger to produce

Data from used booksellers—what I’ve seen for my books is almost all drawn from my own press releases and back cover copy, often with cuts but seldom with rewriting

Possibly Books in Print and/or book wholesaler data, also supplied by the publisher

I’d really question whether much of this metadata was created or paid for by the companies selling it.

There’s also some from who-knows-where. For example, I dedicated my most recently published book to my parents, and mentioned them in the acknowledgements. Somehow, their names were entered as the authors in some library metadata I’ve seen on WorldCat. I do not know who created the ur-mistake and have no means of correcting it.