The Laboratorium has been brought to you since 2000 by James Grimmelmann. Here's some information about the site and here's my disclosure statement.

Brian Lavoie and Lorcan Dempsey, Beyond 1923: Characteristics of Potentially In-copyright Print Books in Library Collections, D-Lib Magazine, November/December 2009, tries to give some tentative answers about the shape of the elephant. From the introduction:

The analysis that follows examines the characteristics of US-published print books, with an emphasis on books that are likely in copyright according to US copyright law. As with our earlier article, the analysis is based on data from the WorldCat database, which represents the aggregated collections of more than 70,000 libraries worldwide. The analysis focuses on three areas: the WorldCat aggregate collection of US-published print books; the subset of this collection published during or after 1923 - i.e., those potentially associated with copyright and/or orphan works issues; and the combined print book collection of three academic research library participants in Google Books - again, with an emphasis on materials that are potentially in copyright.

Lots of detailed tables on dates, authors, genres, and audience level follow. From the conclusion:

This article characterizes the aggregate collection of US-published print books in WorldCat, with a special emphasis on materials published during or after 1923, and therefore either potentially or definitely in copyright. Findings from the analysis indicate that the collection of US-published print books in WorldCat is quite large, encompassing about 15.5 million print books. Nearly two-thirds of these - those published after 1963 - have a high likelihood of being in copyright; less than 15 percent - those published prior to 1923 - are almost certainly in the public domain, with the rest - those published between 1923 and 1963 - potentially in copyright if copyright was renewed. The post-1923 materials collectively account for more than 80 percent, or about 12.6 million, of the US-published print books in WorldCat. It is difficult to predict how many of these print books might be orphan works, but even a small fraction would, in terms of absolute numbers, be considerable, and require a substantial effort to investigate and clear copyright. One study, based on an examination of a random sample of books, estimates a cost of approximately $200 for each title for which digitization and access permissions were obtained.

(Via ResourceShelf.)
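To make the date cut-offs and the scale of the clearance problem concrete, here is a rough sketch of the arithmetic behind the quoted conclusion. The bucket boundaries and the 15.5 million, 12.6 million, and $200 figures come from the passage above; the function names and the 5 percent orphan share are assumptions made purely for illustration, as is applying the $200 estimate to every orphan title.

```python
# Figures quoted from the Lavoie/Dempsey conclusion.
TOTAL_BOOKS = 15_500_000      # US-published print books in WorldCat
POST_1923_BOOKS = 12_600_000  # published during or after 1923
COST_PER_TITLE = 200          # dollars, from the study the article cites


def copyright_bucket(year):
    """Rough status bucket by publication year, following the quoted cut-offs."""
    if year < 1923:
        return "almost certainly public domain"
    if year <= 1963:
        return "in copyright only if renewed"
    return "likely in copyright"


def clearance_cost(orphan_percent):
    """Dollar cost if orphan_percent of post-1923 books each cost $200 to clear.

    orphan_percent is a hypothetical input; the article gives no estimate.
    """
    return POST_1923_BOOKS * orphan_percent // 100 * COST_PER_TITLE


if __name__ == "__main__":
    print(copyright_bucket(1851))
    print(f"post-1923 share: {POST_1923_BOOKS / TOTAL_BOOKS:.1%}")
    print(f"cost at an assumed 5% orphan rate: ${clearance_cost(5):,}")
```

Even at that assumed 5 percent, the total comes to $126 million, which is the sense in which "a small fraction" is still considerable.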

The definition of a book in GBS 1.0 & 2.0 “a written or printed work … published or distributed … or made available for public access as a set of written or printed sheets of paper” goes far wider than conventionally published books. It includes dissertations, reports, monographs and - apart from the items specifically excluded - just about anything on paper that has been deposited in a library.

Also, while WorldCat may be able to identify the country of publication, the Settlement “find & claim” algorithm cannot. There are 11 “books” under my name in the GBS database (a thesis, a monograph, a pamphlet, a festival program & 7 books published by New Zealand publishers). 4 of the books are commercially available, 3 of the 4 have been “digitized without authorization”. None of the books appear in the search results when I enter my name as author, & type “New Zealand”, “NZ” or “N.Z.” into the imprint box of the search engine. All I get back is the pamphlet I wrote for the Heart Foundation of New Zealand.

Hint: if you haven’t already got an account on the GBS claim site, get one. You can use it to search the database not only for your own books but for all the “books” listed under the names of any authors and publishers you wish to enter. Most of the “books” “digitized without authorization” and recognised by GBS as published in NZ are monographs authored and published by organisations (New Zealand Romney Marsh Sheep Breeders Association, New Zealand Oceanographic Institute etc). I suspect the authors of these works have no idea that the GBS concerns them.

Lynley Hood said: “Hint: if you haven’t already got an account on the GBS claim site, get one.” Sorry Lynley, but I refuse to go to a Google site. To me it seeks to legitimize what I feel is Google & Company’s illegal activity regarding the digitization of my book. I do not want to send the message that I condoned their digitization of my work. I have even deleted Google from my web browser’s tool bar because I saw no reason why I should have to stare at their brand the whole time I’m on the net.

Douglas Fevens, Halifax, Nova Scotia

The University of Wisconsin, Google, & Me

It’s important to clarify what the numbers in the Dempsey/Lavoie article represent. Each “book” that is counted represents a published product at about the same level of granularity that today would be given an ISBN. Therefore if a publisher re-issues a book in their backlist after the previous print run has been exhausted (say, a decade later) and with a new introduction, it is considered a different book. The publication date that is fed into the study is the date of the new issuing of the book. Also, as publishers re-package and re-print public domain books, these also are considered separate products with new ISBNs and new dates. Thus, if you look up a commonly re-published book like “Moby Dick, Or The Whale” in the Library of Congress catalog, you retrieve 40 items (and more if you use the short form of the name, simply “Moby Dick”), of which only one is pre-1923 — that one was published in 1851. Of the other thirty-nine instances of the publication of the work, which range from 1925 to 2006, some contain what GBS called “inserts” - that is, separately copyrightable intellectual property in the form of introductions, etc., but others may be a straight republication of the text.

What Google lacks the ability to do (yet?) is to make the proper connection between the original text that is in the public domain and the many “manifestations” (as they are called in library-speak) that were published later — and are also in the public domain, at least as far as the primary text is concerned. This is a non-trivial exercise when one is working only with the metadata that describes the work, but may become more feasible with the ability to do a full text analysis of the contents of the various packages in which publishers have placed the original work of Melville. I assume that Google is working on this, although I cannot predict how it will affect their assessment of the PD/(c) split.
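As a rough illustration of the metadata-only approach described above, the sketch below groups catalog records into works by a normalized title and author, then reports the earliest publication year for each group. The normalization rule, the function names, and the sample title variants are all assumptions for illustration; only the years 1851, 1925, and 2006 come from the Moby Dick example. Nothing here reflects how Google or any library system actually does it.

```python
import re
from collections import defaultdict


def work_key(title, author):
    """Crude work identifier: drop any subtitle, lowercase, strip punctuation."""
    short = re.split(r"[,:;]", title, maxsplit=1)[0]
    norm = re.sub(r"[^a-z0-9 ]", "", short.lower().replace("-", " "))
    return (" ".join(norm.split()), author.lower().strip())


def earliest_publication(records):
    """Map each work key to the earliest year among its manifestations."""
    groups = defaultdict(list)
    for title, author, year in records:
        groups[work_key(title, author)].append(year)
    return {key: min(years) for key, years in groups.items()}


if __name__ == "__main__":
    # Hypothetical records; the title variants are invented for the example.
    records = [
        ("Moby Dick, Or The Whale", "Melville, Herman", 1851),
        ("Moby-Dick", "Melville, Herman", 1925),
        ("Moby Dick; or, The Whale", "Melville, Herman", 2006),
    ]
    print(earliest_publication(records))
```

The earliest year speaks only to the primary text. A later manifestation may still carry separately copyrightable “inserts”, and a key this crude will both merge distinct works with similar titles and miss variant spellings, which is part of why the exercise is non-trivial.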
