HathiTrust Summary Judgment Motions: Fair Use

The Public Index is mostly better now, thank you for asking. I hope to have the filings from the summary judgment motions posted relatively soon, but they’re not up yet.

The fair use issue is fairly joined. The Authors Guild and HathiTrust have presented sharply divergent accounts of the HathiTrust corpus. (The briefs are collected here.) In this post, I’ll focus on their briefs, which are the heart of the case. In a follow-up next week, I’ll deal with the National Federation of the Blind’s arguments for fair use, along with the amicus briefs filed yesterday by library associations and by a group of “digital humanities and law scholars.” Here are the highlights, factor by factor:

Factor Zero: Framing

The Authors Guild focuses on the “systematic” copying needed to make the HathiTrust corpus: the initial scans by Google and the replication of those scans at HathiTrust facilities. It cites cases like American Geophysical, MP3.com, and Encyclopedia Britannica v. Crooks in which wholesale copying by an institution was held not to be fair use, even if the individuals supplied by that institution might have had stronger fair use cases for non-systematic copying.

In contrast, HathiTrust focuses on the end uses to which the corpus is put: full-text search, preservation, and print-disabled access. These end uses themselves are unproblematically legal (according to HathiTrust), and so the copying to make the HathiTrust corpus enables only legal end uses. It cites cases like Sega v. Accolade, Sony v. Connectix, and A.V. v. iParadigms in which intermediate copying was held legal when undertaken solely to enable non-infringing end uses. The Authors Guild focuses on the corpus itself; HathiTrust focuses on its uses.

Factor One: Purpose of the Use

HathiTrust describes the corpus in terms of classically favored purposes: teaching, scholarship, and research. But the Authors Guild responds that while library patrons may engage in those purposes, the libraries themselves don’t. This response has to be right at some level: campus bookstores can’t just start photocopying textbooks on the grounds that students will make educational uses of them. And this is a significant part of the American Geophysical analogy, where a corporate library’s copying of articles for research scientists’ convenience was held to be unfair. But the point shouldn’t be pushed too far, as it depends on one of those crossovers between the first and fourth factors: the bookstore’s copying is unfair, in large part, because it substitutes for purchases of textbooks; the photocopy supersedes the purpose of the original textbooks. That’s much less clear here, where authors don’t make a digital corpus available and the HathiTrust corpus isn’t used to eliminate book purchases for patron access. But more on that further down.

Also part of the first-factor calculus is whether the use is commercial or noncommercial. The libraries are all non-profit entities and the HathiTrust corpus will be used for nonprofit educational uses. The Authors Guild tries to tar the project with the Google brush, emphasizing both Google’s commercial purposes and the value of the copies HathiTrust received in return for letting Google engage in the scanning. The former strikes me as irrelevant here (of course it is highly relevant in the Authors Guild’s lawsuit against Google). The latter is one of the persistent trouble spots in copyright caselaw: whether merely avoiding the expense of paying for something (leave aside for now whether it’s something available for purchase in the first place) makes infringement “commercial.” Suffice it to say that there are cases going in both directions, driven more, I think, by other contextual factors than by the pure economics of the transaction.

Finally, there is the crucial first-factor question: is the use transformative? Under a traditional conception (as of, say, two decades ago) it isn’t. A HathiTrust copy is an exact reproduction of a book; its purpose is complete fidelity. The Authors Guild calls it “mechanical.” But HathiTrust draws on a more recent line of cases that have found transformativeness in a new place. Instead of transforming the work itself by imbuing it with new creativity, this new species of transformativeness changes the way the work is used, putting it to a use that recontextualizes it. Internet search engines have won two key cases holding that displaying thumbnails for purposes of image search is transformative. Those cases didn’t deal with the copying to make the index itself, but that’s where the intermediate-copying cases HathiTrust cites would come in. I expect that the follow-up rounds of briefing will deal with the question of how close the analogy to Internet image search is.

HathiTrust also points to print-disabled access and preservation as “transformative”: more on that next week.

Factor Two: Nature of the Work

The vast majority of the books in the HathiTrust corpus are published. If they were unpublished, that would tend to cut against fair use. But since they’re published, this part of the second factor doesn’t have much to say. HathiTrust emphasizes that most of the books in the corpus are out-of-print, and uses this point to explain to the court some of the difficulties and uncertainty that affect the copyright status of any significant corpus of old books. This is not a traditional second-factor argument; it will be interesting to see how the Authors Guild responds.

The sharper disagreement here concerns factual versus more creative works. The scope of fair use is broader for the first. There is no serious question that huge swaths of the books in the university collections that were scanned to make the HathiTrust corpus are factual monographs. There is also no serious question that mixed in with these are some more creative books—fiction and poetry from every era. According to HathiTrust, the ratio is about ten to one. The Authors Guild responds that given the indiscriminate shelf-clearing nature of the scanning, HathiTrust shouldn’t be able to claim the benefit of copying more informational works. That strikes me as an easily solvable problem if Judge Baer thinks the fair use case would turn on the second factor: he could rule that copying nonfiction is fair but copying fiction and poetry is unfair, and sort out the consequences at the remedial stage.

Factor Three: Amount Copied

In the most literal sense, HathiTrust has copied the whole of every book it has scanned. Repeatedly. But this factor can be squirrelly. Where the use is transformative under the first factor, the copying under the third factor is judged not only absolutely, but also in relation to how much the defendant needed to copy for its transformative use. To make an index, you need to copy complete books en masse. So if you buy HathiTrust’s story on the first factor, you’ll buy its story on the third factor, too. If not, not.

Factor Four: Effect on the Market

The Authors Guild starts with its weakest argument: “Each digital copy … represents a lost sale to the book’s rightsholders.” This is true only under a strained definition of “sale,” because many of the books in the corpus are out of print and some are unavailable at any price. Indeed, few of the books are for “sale” for the full range of uses made by HathiTrust. A few sentences later, the Authors Guild acknowledges this, writing

To the extent a particular book was no longer in print or was unavailable for sale in digital format when Defendants sought to create a digital copy (though of course Defendants admit they never checked), they could have negotiated a license to do so, either separately with each author/publisher (as Google has done in its Google Books Partner Program) or collectively through a collective rights society).

Notice the shift from lost sales to lost licensing revenue, which is a better argument for the Authors Guild. In the American Geophysical case, the court concluded that there wasn’t a market for the research library to buy individual articles for researchers—no lost “sales”—but that there was a market for the library to buy the right to make photocopies—so there were lost “licensing revenues.” The tricky part, though, is similar: explaining what market the HathiTrust libraries should have gone to to purchase the necessary licenses. Interestingly, the Authors Guild doesn’t attempt to argue that HathiTrust should have purchased the licenses in one-to-one transactions with every copyright holder (a prospect that HathiTrust expert witness Joel Waldfogel opines would cost $569 million just to locate all the necessary rightsholders, not including the licensing fees themselves).

The first “ready market or means” the Authors Guild cites is collective management systems, in which a library would pay a collecting society for the right to digitize books, and the society would divide the money up among copyright holders. There’s just one eensy little problem with collective management, though: there is no collecting society for books in the United States. The University of Michigan can’t get a license to copy all the books in its collection, because there’s no one with authority to offer one. The usual test is, in American Geophysical’s words, whether such a market is “traditional, reasonable, or likely to be developed.” With no traditional or reasonable collective licensing market for books in existence, the Authors Guild thus must argue that one is likely to develop.

The only such prospect the brief offers—one rich with irony—is the rejected Google Books settlement. In the Authors Guild’s optimistic phrasing, the settlement “shows how a collective management system might work to permit certain of the activities of Defendants in this case while providing compensation to copyright owners.” I agree that yes, the settlement showed how such a system might work. But the settlement is no longer even a possibility, not after Judge Chin’s holding that it “exceeds what the Court may permit under Rule 23.” Indeed, in a forthcoming article, I will be explaining why the settlement would have been unconstitutional, to boot. It is hard to see how a defunct and impermissible settlement provides a basis for anything, let alone a functioning licensing market.

Permit me a brief digression on Daniel Gervais’s expert report on collective copyright management systems. He offers a clear and cogent history of collective management in the United States and in Europe. But he also makes a claim that strikes me as utterly unsupportable:

I believe that if the Defendants’ uses are not determined to be fair uses, the market will provide a collective licensing system for the types of uses that the Defendants have been making so that they would not have to negotiate a transactional license for each book or other work they wish to use.

This is plausible only if one ignores the immense transaction costs associated with locating all of the copyright owners for the books in the HathiTrust corpus. The costs are simply being pushed from the libraries to the copyright collective; they don’t go away. It might be economically feasible for a collecting society to come into being for a subset of popular books, or for a large slice of recent books, but it is not plausible that one will come into existence for all books. The only way that could happen—given current United States law on ownership, transfer, and licensing of copyright—is if Congress passes legislation establishing one. If it does, that wouldn’t be “the market” providing collective licensing—and in any event, it would not be appropriate for Judge Baer to consider what Congress might or might not hypothetically do in determining whether there is a genuine licensing market for books that the HathiTrust libraries should have gone to.

Perhaps more plausibly, the Authors Guild argues that HathiTrust’s uses undermine existing licensing markets, even where the licenses offered in those markets aren’t quite the same as those HathiTrust sought. This is at least a cognizable theory of harm. The two licensing markets the Authors Guild cites are full-text search (e.g. Amazon’s Search Inside) and non-consumptive research. The evidence of harm here is, however, underdeveloped. The Authors Guild hasn’t shown that full-text search is a paying license, only that the licenses are granted to drive book sales, so it needs some further evidence linking HathiTrust full-text search to lost book sales (e.g. via diversion of searchers). And as for non-consumptive research, all the Authors Guild can offer are a few statements from individual plaintiff T.J. Stiles that, e.g.:

From what I’ve learned about it, non-consumptive research represents a potentially exciting field for academics and therefore an emerging licensing opportunity for authors at a time when revenues are decreasing. Indeed, it is my understanding that the Amended Settlement Agreement entered into by The Authors Guild and Google would have permitted the defendant libraries to engage in non-consumptive research activities using works such as Jesse James—but pursuant to a license that included a mechanism to compensate authors.

Again, there is no real evidence of harm here: just because Stiles would like to license non-consumptive research does not mean there is anyone who will pay him for that license. Indeed, one of the amicus briefs specifically argues that non-consumptive research is also non-infringing, which would mean there is no licensing market to be had.

The final theory of market harm proffered by the Authors Guild is security risks from the creation and maintenance of the HathiTrust corpus in a 466-terabyte database connected to campus networks and to Web servers. (The brief is cleverly drafted to leave the impression that the contents of the database are available on those networks and on the Web, but avoids actually saying so.) This, the Authors Guild argues, creates the potential for security breaches and resulting piracy. (Unfortunately, many of the factual details on which specific allegations of poor security depend have been redacted.)

But all of this is only potential harm. Even Benjamin Edelman’s expert witness declaration doesn’t identify a single actual leak of a book from the Google Books project resulting in the kind of widespread Internet piracy he warns of. Like Gervais’s, Edelman’s report deals largely in hypotheticals. The most ungrounded of them is his speculation (echoed in the Authors Guild’s brief) that

Seventh, when books are scanned by a smaller and less sophisticated provider, there is a particularly acute risk of book contents being accessed and redistributed. For one, less sophisticated organizations have a reduced capability to design, install, and maintain suitable web site, database, and related security systems as well as anti-reconstruction systems to secure books. Furthermore, less sophisticated organizations have a lesser ability to screen key staff to prevent data loss through rogue employees, and a lesser ability to configure security systems to exclude hackers. Thus, if Defendants’ conduct is found to be legal, and if other companies and organizations follow Defendants’ lead in scanning books, the risk that book contents will be accessed and redistributed becomes even greater.

The behavior of differently situated entities is not germane to the factual and legal questions posed by HathiTrust’s behavior. If others might take bad precautions, that doesn’t tell us anything about how good or bad HathiTrust’s are. If Judge Baer is inclined to find that HathiTrust is making a fair use because it has sufficient security but is concerned that other book-scanners will cut corners, he can say so. His opinion could state, explicitly, that HathiTrust’s security passes muster because <insert details here>, thus making the security requirements part of his holding.

On HathiTrust’s side, it’s easiest just to quote from the brief:

Plaintiffs admit they are unable to identify “any specific, quantifiable past harm, or any documents relating to any such past harm” resulting from the Libraries’ uses of their works. …

Plaintiffs have not produced in discovery:

  1. any business plans for licensing the digitization of books; or
  2. any plans for the use of books for preservation and research purposes; or
  3. any analysis of their costs for licensing such markets or anticipated revenues; or
  4. any communications with entities that might collectively license these rights; or
  5. any analysis of the substantial limitations that such an entity would face in terms of the number of works it could foreseeably license.


On my read of the briefs:

  • The first factor depends on which characterization of the facts Judge Baer finds more convincing. I personally think HathiTrust’s holistic perspective is more persuasive, especially given the image-search cases.
  • The second factor can be (ahem) factored out by splitting the corpus into informational and expressive subcorpora.
  • The third factor follows the first.
  • The fourth factor depends on evidence of harm to licensing revenue, and I just don’t see that evidence in the filings.

The Authors Guild has done well in the previous skirmishes of this litigation, from the huge unforced error of the Orphan Works Project to Ed Rosenthal’s masterful performance at oral argument in May. But the motions for summary judgment are where the fair use battle will really be fought, and here, I think HathiTrust has made a significantly stronger case in this first round. It will be very interesting to see the oppositions next week.

Next time: the blind and the amici.

As regards your conclusion on the Second Factor:

From the NobelPrize.org website:

The Nobel Prize in Literature 1953 was awarded to Winston Churchill “for his mastery of historical and biographical description as well as for brilliant oratory in defending exalted human values”.

Do you really think that works of fiction and poetry are ‘more creative’ while a non-fiction book is merely a conglomeration of facts?

I’d like to see you tell it to him.

Hey, I don’t make the rules; I just work here. However problematic the distinction between “informational” nonfiction and “creative” fiction is, it—like the terminology—is well-established in copyright law. E.g., from Sony v. Universal:

Thus, for example, informational works, such as news reports, that readily lend themselves to productive use by others, are less protected than creative works of entertainment.

It’s an imperfect approximation to divide the corpus into nonfiction and fiction—but less of one than refusing to make that distinction at all.

As for Churchill, I’m not sure whether to break the bad news to his ghost or to his ghostwriters.

News reports?

He could rule that copying nonfiction is fair but copying fiction and poetry is unfair, and sort out the consequences at the remedial stage.

Carl Sandburg was awarded two Pulitzer Prizes: one for his collected poetry and one for his biography of Lincoln, so one would get the Hathi treatment and the other wouldn’t.

Well anyway at least I now know you are reading my stuff.

Next Monday, 16 July, the WIPO SCCR 24 session begins in Geneva. On the agenda is the IFLA-sponsored 23/5 PROPOSAL ON LIMITATIONS AND EXCEPTIONS FOR LIBRARIES AND ARCHIVES, which includes the following:

Reproduction and Distribution of Copies by Libraries and Archives

  1. It shall be permitted for a library or archive to reproduce and to distribute a copy of a copyright work, or of material protected by related rights, to a library user, or to another library or archive, for purposes of:

a. education;

b. requests by users for research or private study;

c. interlibrary document supply;

provided that such reproduction and distribution is in accordance with existing international obligations, among them the Berne Convention.

The US WIPO Delegation’s response to the above in the SCCR document 23/8 Draft compilation at 34:

Obviously when a copy of an entire work is being made, there is the question of substantially adverse market effects to the publishers and authors. It is also important that this type of activity not be done in a systematic way, but that it would be a single occasions at the requests of libraries. …

There is a danger that one library could end up making copies for all libraries, essentially taking away an author’s market to the entire country once one copy is sold to one library.

The ALA and ARL who have filed amicus briefs in the current AG v. Hathi proceedings are both Members of the IFLA. The current ARL President is the former Chair of the IFLA Copyright / Legal Matters committee which authored the IFLA Treaty proposal now tabled as SCCR 23/5 above.


If circulation and use of the digital version is truly limited to the users of the library, it is hard to see how it would have much impact outside the library. On the other hand, if library use means open access to the whole web, it seems obvious it would have impacts outside the library.