In addition to the Authors Guild and HathiTrust, three other groups filed briefs in relation to the summary judgment motions: a group of intervenors headlined by the National Federation of the Blind, a partnership of library associations and the Electronic Frontier Foundation, and a group of humanities and legal scholars. This post will review their briefs and show how their arguments fit in with the ones I’ve previously discussed.
National Federation of the Blind
This group of intervenors consists of the NFB and three individuals with print disabilities. Their claim, in a nutshell, is that the HathiTrust corpus is legal because it enables the university libraries to make their collections accessible to the print-disabled. The actual argument to get there comes in a few stages.
Stage one of the claim is the argument that the Americans with Disabilities Act creates an obligation for the libraries to offer accessible formats to the blind. Under Title III, for example, a place of public accommodation (such as a private university library) must:
take such steps as may be necessary to ensure that no individual with a disability is excluded, denied services, segregated or otherwise treated differently than other individuals because of the absence of auxiliary aids and services, unless the entity can demonstrate that taking such steps would fundamentally alter the nature of the good, service, facility, privilege, advantage, or accommodation being offered or would result in an undue burden.
Currently, university libraries have books converted into Braille or digital formats, or have humans read them aloud, as requested by print-disabled patrons. This conversion inherently slows down research, and it is limited by obvious resource constraints. Prior to Google, I think it would have been reasonably clear that preparing digital editions of every book in a library’s collection would have constituted an “undue burden.” But with Google offering to shoulder the costs to create OCR’ed digital editions, there is now a plausible argument that these universities have not just the ability but the obligation to supply their print-disabled patrons with accessible versions.
Of course, this argument depends in part on the legality of scanning the libraries’ collections, so there is an unavoidable circularity. (The same circularity afflicts the opposite argument: that the ADA imposes no such requirement because the scanning is illegal.) Hence the brief makes what I read as a slightly weaker argument: that the ADA creates a “national policy” and a “collective commitment” to equal access. This is the kind of public policy claim that feeds directly into a fair use analysis, suggesting that this particular use is specially favored.
The second stage of the argument transposes this policy into copyright law. Section 121 of the Copyright Act (as amended), known as the Chafee Amendment, sets forth an exemption from infringement for “an authorized entity” to make reproductions in “specialized formats exclusively for use by blind or other persons with disabilities.” I do not take it that the HathiTrust corpus itself is in the necessary “specialized format”: instead, it presumably enables the creation of downstream copies in those formats that will actually be made available to the print-disabled. Thus, as applied to the creation and maintenance of the corpus itself, this is really a disguised fair use claim: the noninfringing downstream use renders the necessary upstream use fair.
In a series of comments to my previous post, john e miller raises serious questions about another piece of the Chafee Amendment argument: whether the HathiTrust libraries have “a primary mission to provide specialized services relating to training, education, or adaptive reading or information access needs of blind or other persons with disabilities” as they must if they are to qualify as “authorized entities.” The NFB brief offers a valiant effort to argue that they do, based on the policy of the ADA and the libraries’ goals.
I don’t think that there ought to be an authorized entity restriction in Section 121. What matters is the activity: creating accessible editions whose format keeps them from being usable by the sighted. Limiting the set of actors who can create accessible editions is pointless: if a general for-profit publisher can make them more effectively, it should be able to. But given the existence of the restriction, the NFB intervenors have a hard argument to make. The only textual hook really open to them is the one they seize on: that the Chafee Amendment refers to “a primary mission” using the indefinite article.
The NFB brief also makes a detailed fair use argument: access by the print-disabled to the HathiTrust corpus confers vast public benefit without any cost to copyright owners. It starts with a very effective framing:
In this regard, it bears remembering that for the blind public (i) access is denied without accessibility (i.e., converting printed works into a format from which they can be read by the blind); (ii) there is no accessibility to the vast contents of university libraries without comprehensive digitization; and (iii) there is no digitization of university libraries without the HathiTrust.
On the first factor, the brief argues that digitizing books for use by the blind is transformative because they are “copied for a different purpose.” To me, this stretches transformativeness beyond its reasonable limit. The brief’s point is that the blind are not currently served by most of the books in the corpus and that publishers are not selling these books to the blind in formats they can use. This is a fourth-factor point, combined with an argument that access by the blind for educational purposes is a generally favored use. It has nothing to do with transformativeness, which has become an all-purpose argument deployed whether it fits or not, much like claiming that a restriction on speech is “content-based” in First Amendment law.
The brief makes a stronger argument that the creation of the corpus is a protected form of intermediate copying. This argument—which reappears in the amicus briefs—fits with the defendants’ desired framing of the copying. It’s not about the database itself; it’s about the uses made of that database. And since providing accessible versions to the blind is such an obvious fair use, the preliminary copying needed to make those accessible versions possible is also fair.
The second and third factors add little new to the discussion in HathiTrust’s own brief, but the fourth factor is more interesting. The NFB argues that “there has never been, nor is there ever likely to be, a market for a digital database of library collections accessible to blind students and scholars.” A huge fraction of the books in those collections have never been made available by publishers in accessible editions at all. This is a great fourth-factor argument: showing that the market is simply not one that copyright owners have shown any substantial interest in.
Library Associations and the Electronic Frontier Foundation
This brief is submitted on behalf of three library associations—the American Library Association, the Association of Research Libraries, and the Association of College and Research Libraries—and the Electronic Frontier Foundation. It is more a collection of points than a single sustained argument. It seems likely that a similar brief will be placed before Judge Chin when the main Google Books case heads into summary judgment motions later in the summer.
The first point is that the Google Books corpus already offers huge public benefits. It cites statements from various librarians and researchers detailing how much Google Books helped their research, in many cases pointing them to books that they then purchased. This part of the brief is a bit odd, because the examples are all drawn from Google Books. Some involve information gleaned from snippet view; a few appear to be drawing on the multi-page previews explicitly authorized by copyright owners. Neither of these uses is available with HathiTrust’s search features—which helps the fair use case but also limits the public-benefit argument. The brief does about as good a job finessing this point as I think it is possible to do:
Moreover, if, as amici expect, Google’s practice of providing snippet views is ruled a permissible fair use, HDL could offer the same service, thereby dramatically increasing its public benefits. … HDL is not Google Books: it does not provide snippets as part of its search results. However, it could easily begin to do so and offer the same benefits. Moreover, some of the benefits described above can be achieved through HDL’s existing search functionality, without snippet display. HDL directs the user to the book and page number where the information sought may appear. The user still has to consult the physical book to obtain the information, but HDL greatly assists the user by pointing her to the appropriate books and pages.
The second point is to reinforce some of HathiTrust’s arguments about fair use on the first and fourth factors. On the first factor, it reviews some of the search-engine cases, emphasizing the public benefits of search. On the fourth factor, it gives this nicely succinct summary of market substitution:
HDL is hardly a market substitute for its current uses. Libraries do not pay for the “right” to preserve the works in their collections. Moreover, libraries do not pay for the “right” to provide copies to the print disabled. With respect to its search function, while HDL allows the public to search all of the books in its digital library, it provides only page numbers of in-copyright works.
And with respect to licensing revenue, the brief gives the slightly odd argument that a licensing regime is infeasible because library budgets are falling while libraries’ other costs are rising. This may be true as a factual matter, but it’s hard to see a court accepting the defendants’ poverty as a justification for what it would otherwise consider an infringing use. That’s just not something a court could admit; the holding will always be justified some other way.
As evidence of libraries’ practices, the brief quotes extensively from the Code of Best Practices in Fair Use for Academic and Research Libraries, which endorses many of HathiTrust’s practices, such as providing access to the print-disabled and digitizing to produce search engines. Putting these arguments squarely before a court offers a test of Peter Jaszi and Patricia Aufderheide’s argument in Reclaiming Fair Use that codes of best practices can help expand the contours of what courts will recognize as fair uses.
Next, the brief offers a short rejoinder to the Authors Guild’s argument that the court should find no fair use because Congress could create a collective licensing regime. The brief spells out some of the differences between the orphan works legislation considered by Congress in 2006 and 2008 and the HathiTrust/Google Books project, and offers a bit of gloom about the prospects for Congressional action. And then, having stated that a statutory license is completely speculative, the brief speculates that if Congress did create one, the libraries couldn’t afford to pay for it. Why Congress would create an unaffordable license beats me, but it would hardly be the most pointless thing Congress has done.
Finally, the brief offers a laches argument, although, curiously, it never uses the term. It cites the seven-year interval between the commencement of the Google Books project and the filing of the suit against the HathiTrust libraries to emphasize that “the current situation - where the Court must consider the legality of an existing digital library of over 10 million books - is in large measure the result of litigation choices made by the Plaintiffs.” Since the plaintiffs took no steps to block the scanning for years as the libraries’ investment in the corpus mounted, the brief suggests that equity favors a finding of fair use. It also suggests that HathiTrust’s structure mirrors the arrangement in the now-rejected Google Books settlement; I’m not sure how far this argument goes, because the plaintiffs can always cast that as a concession they were willing to make in settlement in exchange for other terms, not something they are okay with standing alone.
Digital Humanities and Law Scholars
The most interesting brief is the one organized by copyright scholars Jason Schultz and Matthew Sag on behalf of a list of scholars in the “digital humanities” who use data mining, statistical analysis, and other large-scale computations on corpora of literary texts to analyze linguistic and thematic patterns. (For more on the field, see this paper from Science and Matthew Jockers’ presentation from the spring Berkeley Orphan Works conference.) They’re joined by legal scholars who think these kinds of research are socially valuable, and so the mass digitization on which they depend ought to be treated as fair use. The central idea, developed primarily by Sag in a series of thoughtful articles, is that this kind of research makes only “nonexpressive uses” of the books.
I must confess that I am highly skeptical of digital humanities as most of its proponents practice it. But the underlying legal argument strikes me as compelling: these computational analyses provide benefits to users without offsetting costs to copyright owners. The brief pulls together a series of doctrinal arguments that the nonexpressive aspects of books are simply not protected by copyright. The books themselves are copyrighted, but searches, counts, clustering algorithms, and the whole array of digital humanities techniques do not make use of the books in ways that implicate a copyright owner’s interests.
Text mining extracts ideas from books, abstracting away from any individual author’s particular expression. Moby Dick is a book about whales, not a book about dinosaurs. The most difficult precedents for this argument are the Harry Potter Lexicon, the Seinfeld Aptitude Test, and similar cases, where courts have protected creative works against reference works and quiz books, which unsuccessfully claimed to be taking only the “facts” from the underlying creative works. As the brief explains:
The supposed “facts” conveyed in the “Seinfeld” quiz book were not truly facts about the television program; they were “in reality fictitious expression created by Seinfeld’s authors.”
By contrast, the many forms of metadata produced by the library digitization at the heart of this litigation do not merely recast copyrightable expression from underlying works; rather, the metadata encompasses numerous uncopyrightable facts about the works, such as author, title, frequency of particular words or phrases, and the like.
The brief then makes a variant on the intermediate-copying argument. This time, it isn’t rooted in technological intermediate copies (e.g., a database of student term papers used to check for plagiarism), but rather in creative intermediate copies (e.g., a film studio’s preliminary scripts of an allegedly infringing film). The point here is to emphasize that HathiTrust doesn’t show patrons any excerpts from any of the books; it is not taking advantage of their expressive features.
There follows a fair use argument. Much, by now, is standard. Digital humanities research easily qualifies as a transformative use for a favored purpose under the first factor; the brief does a nice job bringing in some news reporting cases. Under the second factor, it pulls a quotation from a technological intermediate copying case that humans “cannot gain access to the unprotected ideas and functional concepts contained in [the copyrighted work] without … making copies.” It further reprises the technological intermediate copying argument under the third factor. And under the fourth, it emphasizes that transformative uses are categorically outside the markets reserved to the copyright owner.
Although I am skeptical of the straight Section 121 argument offered by the NFB intervenors, the three additional briefs offer several strong fair use arguments. Together with the HathiTrust brief, they provide a series of overlapping fair use justifications that I find quite convincing. I look forward to the next round of briefing, to see how the plaintiffs respond.