GBS: Google Giving Grants to Study Digitized Book Corpus

Marc Parry, Google Starts Grant Program for Studies of Its Digitized Books, Chronicle of Higher Education (Mar. 31, 2010):

The company is creating a “collaborative research program to explore the digital humanities using the Google Books corpus,” according to a call for proposals obtained by The Chronicle. Some of Google’s academic partners say the grant program marks the company’s first formal foray into supporting humanities text-mining research.

The call went out to a select group of scholars, offering up to $50,000 for one year. Google says it may choose to renew the grants for a second year. It is not clear whether anybody can apply for the money, or just the group that got the solicitation. …

Literature is one of eight “disciplines of interest” that Google has identified for its program. The others are linguistics, history, classics, philosophy, sociology, archaeology, and anthropology.

The effort seems largely focused on building tools to comb and improve Google’s digital library, whose book-search metadata—dates and other search-assisting information—one academic researcher calls a “train wreck.” These are some of the sample projects that Google lists in its call for proposals:

  • Building software for tracking changes in language over time.
  • Creating utilities to discover books and passages of interest to a particular discipline.
  • Developing systems for crowd-sourced corrections to book data and metadata.
  • The testing of a literary or historical hypothesis through innovative analysis of a book.

I’m reminded of two potential sources of research funding from my misspent youth in chemistry:

  • The American Tobacco Institute
  • The American Petroleum Institute

And given Google’s behavior in initiating the GLP, I’m not so sure the comparison is completely inapt…

The Chronicle of Higher Education has added another story you might be interested in: Study Finds Copyright Concerns Affect Communications Research

Douglas Fevens, Halifax, Nova Scotia— The University of Wisconsin, Google, & Me

The Google Books database is a market asset for Google equal to an oil company owning Saudi Arabia. Controlling the database and its future exploitation is a potential bonanza for Google, and, as Jaron Lanier implies, having a hand in the structure of its public access, by controlling exclusivity to research projects like these on a proprietary basis, strengthens Google’s iron fist over the scanned books and will lock in Google’s particular access and search technologies for decades to come. Not only does this raise antitrust questions, but it also raises issues of academic freedom for the publicly owned scanned university libraries (i.e., Michigan, Iowa, California, Wisconsin, etc., not Duke or Stanford), whose collections should be unfettered for public use. Academics seeking these Google grants, and their universities, should carefully screen the grant terms to ensure that Google does not require exclusivity or a power to quash the products of these funded “independent” research projects. Otherwise, this is just more fuel for the fires of Google’s competitors to bring before Judge Chin in the pending case.

My guess is that this project will, in fact, include giving researchers access to many copyrighted books. As a researcher in history, I find that copyrighted books are often secondary sources, whereas public domain books are often primary sources. Researchers use both heavily, but in different ways. Google will probably want to track those ways.

Is anyone investigating, or planning to investigate, whether this project will include access to copyrighted books without the permission of the copyright holders?