The Google Books team and a group of researchers from Harvard published a paper in Science, Quantitative Analysis of Culture Using Millions of Digitized Books. The abstract:
We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of “culturomics”, focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. “Culturomics” extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
Here is an over-the-top article about it from the Guardian; here is a more restrained article from PC Magazine. And here is something rather more interesting: a tool to see trends in word use over time in the Google Books corpus. Just put in a term or terms, and you can see how their frequency of use has changed over the years.
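The quantity the tool plots is simple: for each year, the number of times a term appears in the scanned books divided by the total number of words published that year. Here is a minimal Python sketch of that computation against the downloadable Google Books count files; the filenames, the term, and the simplified two- and three-column tab-separated layouts are my own illustrative assumptions, so adapt the parsing to the actual files you download.

```python
import csv
from collections import defaultdict

def relative_frequencies(term, ngram_path, totals_path):
    """Return {year: matches of term that year / total words that year}.

    Assumes two hypothetical TSV files (simplified layouts, not the
    exact format of the real downloadable Google Books files):
      ngram_path:  ngram <TAB> year <TAB> match_count
      totals_path: year <TAB> total_match_count
    """
    # Total words published per year, used as the denominator.
    totals = {}
    with open(totals_path, newline="") as f:
        for year, total in csv.reader(f, delimiter="\t"):
            totals[int(year)] = int(total)

    # Occurrences of the term per year, used as the numerator.
    matches = defaultdict(int)
    with open(ngram_path, newline="") as f:
        for ngram, year, count in csv.reader(f, delimiter="\t"):
            if ngram == term:
                matches[int(year)] += int(count)

    return {y: matches[y] / totals[y] for y in sorted(matches) if y in totals}

if __name__ == "__main__":
    # Illustrative filenames; substitute whatever counts files you have locally.
    freqs = relative_frequencies("culturomics",
                                 "eng-1grams.tsv",
                                 "total_counts.tsv")
    for year, freq in sorted(freqs.items()):
        print(f"{year}\t{freq:.3e}")
```

The web tool adds smoothing (averaging each year with its neighbors) before plotting, but the underlying ratio is the same.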
UPDATE: The New York Times has a well-written story on the paper and tool.
UPDATE: Geoffrey Nunberg has a typically long and thoughtful essay on the paper and tool in the Chronicle of Higher Education.