GBS: Peter Jacso on Google Scholar


Peter Jacso, writing in Library Journal, does for the metadata in Google Scholar what Geoffrey Nunberg did for Google Books:

False names are created from options on the search menu, such as P Options (for Payment Options); from parts of the author affiliation (CA San Diego, C Ltd, M View for Mountain View); from Table of Contents pages on publishers’ web sites; and from section headings of articles (B Methods, D Definitions, G Assessment, H Variables, I Evaluation). (The initial varies depending on the section identifying letter or Roman numeral.)

The article is scathing on the quality of Google’s parsers, and argues that Google should be more reliant on the high-quality metadata now being made available by journal publishers.

The press and the public were so enamored of anything with the word Google in it that GS developers apparently believed they could create a parser to identify the metadata better than the human indexers at the publishers, repositories, and indexing/abstracting services who assigned metadata by listing author, title, journal name, publication year, and other metadata elements.

But note that this is the opposite of the problem with Google Books, where Jon Orwant’s response to Nunberg put much of the blame on bad metadata supplied to Google by outside cataloguers. My instant reaction is that the situation can’t be as black-and-white as Jacso claims; I’d like to know more about the data sources made available, the terms, and the history.


Geoffrey Nunberg blogged on this yesterday:

What makes this a serious problem is that many people regard the Google Scholar metadata as a reliable index of scholarly influence and reputation, particularly now that there are tools like the Google Scholar Citation Count gadget by Jan Feyereisl and the Publish or Perish software produced by Tarma Software, both of which take Google Scholar’s metadata at face value. True, the data provided by traditional abstracting and indexing services are far from perfect, but their errors are dwarfed by those of Google Scholar, Jacso says.