The Laboratorium
July 2012

In re Books


I am pleased to announce In re Books, a conference on law and the future of books to be held at New York Law School on October 26 and 27. It will feature a wide-ranging conversation among authors, publishers, librarians, readers, and scholars about how law inflects all aspects of the creation, distribution, and consumption of books—and how these laws should change as the digital transition upends publishing. There will be panels devoted to electronic rights, the publishing industry, the future of libraries, readers’ rights, orphan books and mass digitization, and the long view on the history and future of books. There are more hot topics than I can list, from U.S. v. Apple to the first sale cases at the Supreme Court, from fair use to collective licensing, from reader privacy to the perennially fraught author-publisher relationship. Like our previous conference, D is for Digitize, In re Books will be an opportunity for wide-ranging, thoughtful, and good-faith conversation among all of us who share an interest in the written word.

The conference website at nyls.edu/inrebooks is live, and we’ll be adding information about the program and registration in the weeks to come. I look forward to seeing you all there!

A Patent Tangent on Google Books


Courtesy of Eric Goldman, here’s a cute little patent case with a Google Books angle, Celorio v. Google. Celorio holds a U.S. patent on an “Electronic bookstore vending machine.” In his view, his patent describes a print-on-demand Espresso Book Machine to a T. So he sued Espresso’s maker, On Demand Books, and also Google. As the court explains, Celorio alleges that Google

  • Directly infringes “out of his premises in California” by installing one or more Espresso Book Machines in its offices, where a user can request a book be downloaded and printed on demand.
  • Commits contributory infringement because it offers to sell and sells electronic books, which are component of a patented device and apparatus.
  • Commits contributory infringement by transferring its files to Espresso Book Machines, knowing the files will be transformed into a physical book using the ‘890 Patent without permission.
  • Induces infringement “by promoting and encouraging the purchase of electronic books to [be] printed on demand” by Espresso Book Machines.
  • Induces infringement by virtue of Google Books’ ability to provide searches of indexed book content.

The case doesn’t get at all into the facts of what Google does or whether Celorio’s patent really does cover any of these activities. Instead, it’s an early-stage dispute over how specifically Celorio needs to explain his theories of infringement, and the court’s conclusion is that he’s explained them well enough for the case to proceed. The takeaway is that not even the public domain books in Google Books are completely clear of legal challenges.

Casebook 2.0


I’m very happy to announce that the second edition of my casebook, Internet Law: Cases and Problems, is now available from Semaphore Press. Like the first edition, it’s a DRM-free pay-what-you-want PDF download with a suggested price of $30. But your dollar goes even further now: the second edition has almost a hundred pages of new material. Here is the complete table of contents and here are some of the highlights of the new version:

  • Twenty new cases, including recent landmarks like United States v. Doe on Fifth Amendment protection for passwords and Viacom v. YouTube on eligibility for the Section 512 safe harbor.
  • Six new problems, with challenges such as designing a lolcat generator’s DMCA policy and finessing the extradition of a copyright infringement defendant.
  • A revamped Speech chapter, with sections on what qualifies as “speech” online (covering Facebook likes, computer source code, and bloggers as journalists) and on harmful speech (covering linking liability, harassment, and cyberbullying).
  • A new section on virtual property that considers domain names, theft of information, and items in online games.
  • A new section on “New Frontiers in Copyright Enforcement” that discusses YouTube’s Content ID system as a private alternative to the DMCA, the Copyright Alert graduated response system, and domain-name seizures.
  • Expanded use of legal materials beyond just appellate case reports: more statutory text, pleadings, contracts, and FAQs.

As an experiment, we’re also offering a bound version through Lulu.com. That said, I and my publisher encourage you to save trees and buy the electronic version instead. As always, I welcome your comments and suggestions on the book.

All in the Timing


On Wednesday, Judge Nathan issued a decision in ABC v. Aereo, a.k.a. the “tiny antennas” case. Aereo rebroadcasts over-the-air TV signals on the Internet. Other services, from iCraveTV to ivi, have tried this same business model and uniformly gone down to defeat in the courts. But Aereo broke the streak; Judge Nathan denied the TV networks suing it a preliminary injunction, holding that they hadn’t shown a likelihood of success in proving that Aereo was infringing on their copyrights.

(A bell rings softly.)

The only significant difference between Aereo and its ill-fated predecessors is that Aereo doesn’t use a single antenna to capture the TV signals. Instead it uses an array of antennas—eighty on each circuit board—to assign each subscriber her own antenna for as long as she wants to watch TV live or to record TV shows for later. This is a ridiculous distinction, and I have previously written that, “In any sane world, Aereo would not exist.” But under the Second Circuit precedent of Cartoon Network v. CSC Holdings (the “Cablevision” case), this result was close to compelled.

Cablevision was a case about a “remote DVR,” essentially a TiVo in the cloud at the cable company’s facilities rather than in the customer’s home. Cablevision took incoming TV signals and passed them through a buffer holding 1.2 seconds of signal at a time. If a customer wanted to record a program, the video signal was split and a copy saved on a portion of a hard drive reserved for the customer. The customer could then, at any later time, hit “play” and the video would be streamed from her unique copy on the remote hard drive to her home TV set. In a careful opinion that requires close and repeated reading, the Second Circuit held that nothing Cablevision was doing constituted copyright infringement. (The case did not confront the question of whether its customers were infringers, or whether Cablevision might be liable if they were.)

The portion of the opinion most obviously relevant to Aereo dealt with the playback streaming from the remotely stored copy to the subscriber’s TV. This was not a “public” performance, held the court, because the “universe of people capable of receiving” the transmission consisted of exactly one: the subscriber. One person does not “the public” make. A series of prior cases had held that repeated individual performances for different people could constitute “public” performances. (I previously summarized these cases when I correctly predicted that Zediva’s DVD-based Internet streaming business model would fail in the courts.) In order to distinguish them, Cablevision held that it was the use of a single unique copy for each subscriber that made the difference.

Aereo picked up this idea and ran with it: hence the individual tiny antennas. When a user wants to record a program for later viewing, she is assigned an antenna and a unique copy of the program is stored on Aereo’s hard drives for her. Later, when she watches the recorded program, it’s streamed from this unique copy. Indeed, even Aereo’s “watch live” feature is bounced through a hard drive copy before it starts streaming to the user. The only difference is that the “live” hard drive copy is deleted as soon as the viewer stops watching. This is an obvious Cablevision play, and Judge Nathan held that the cases were effectively indistinguishable.

(Bell.)

The networks’ most-sustained attempt to distinguish Cablevision focused on the timing of playback. They argued that because the Cablevision copies were time-shifted—i.e., viewed at a later time—this “br[oke] the chain of transmission” so that the transmissions from the cable channels to Cablevision and from Cablevision DVRs to subscribers should be counted separately for purposes of assessing whether the allegedly infringing transmission is “to the public.” In contrast, where the retransmission happens contemporaneously or nearly so—as with the Aereo “watch live” feature—the chain is unbroken, and the retransmission should be counted together with the original transmission. This has the transitive effect of counting all the retransmissions together with each other, and hence making them all one big transmission “to the public,” hence infringing.

Judge Nathan rejected this argument. Nothing in Cablevision’s public-performance holding explicitly referred to time-shifting. Moreover, the networks were unable to provide her with a convincing line as to when a transmission has been time-shifted by enough that it breaks the chain. They proposed a test of “complete” time-shifting: i.e., watching a program after its original broadcast has ended. But Judge Nathan made medium shrift of this proposed test:

For example, as Plaintiffs would have it, an Aereo user who begins watching a recording of the Academy Awards, initially broadcast at 6:00 pm, one minute before the program ends at 11:00 pm has not allowed the chain of transmission to be broken, despite the nearly five hours of time-shifting that has occurred. In contrast, a user who begins watching a standard half-hour sitcom just a minute after its initial broadcast ends would “break the chain of transmission” for that program after just 31 minutes of time shifting. These examples suggest the extent to which Plaintiffs’ position regarding “complete” time-shifting is unmoored from its foundation in “breaking the chain of transmission.”

(Bell.)

I would like to suggest that if one is inclined to look at time-shifting as the chain-breaker—and there needs to be some chain-breaker, or the very concept of a “public” performance is meaningless as applied to Internet streaming—then there is an elegant test hiding in plain sight in another part of the Cablevision opinion. It is not a test that would have helped the networks in Aereo, which is likely one reason they didn’t raise it. But it does direct our attention to the crucial issues in Internet streaming cases.

Recall that Cablevision’s remote DVR passed all of the incoming signals through a 1.2-second revolving buffer. In another part of the Cablevision opinion, the Second Circuit held that this buffer copy was not an infringement of the reproduction right, because it was not stored “for a period of more than transitory duration.” Focusing on how long a copy endures enabled the court to distinguish other cases that had held that in-memory copies of computer programs could infringe, even though the copies vanish when the computer is turned off. (Not actually true, but who’s keeping track of little factual quibbles like that?) Those copies, the Cablevision court explained, endured for minutes or more, unlike the “copies” in Cablevision’s buffer, which vanished in less than two seconds.

Thus, if one is looking for a temporal test, Cablevision already supplies one. The buffer-copy portion of the opinion explains that a copy is permanent enough to count as a potential infringement of the reproduction right if it endures “for a period of more than transitory duration” (somewhere between a few seconds and a few minutes, based on the cases we have). Why not use this same period to measure whether a tape-delay is long enough to count in distinguishing two transmissions for purposes of the public performance right?

There is an obvious conceptual economy in reusing the same test in two places. But the elegance of this solution, I think, goes deeper. It keeps buffer-based business models from slipping through the cracks between the different exclusive rights. A rebroadcaster will always be engaged in either a reproduction or a public performance. To see why, consider the two cases. If the rebroadcaster uses transient buffers that are overwritten rapidly, then the chain of transmission is unbroken, which means each of its streams is considered part of the original broadcast and the public performance right is implicated. If the rebroadcaster uses more permanent buffers, then the performances may not be public, but the buffers themselves count as reproductions. There is no way to design the system that does not implicate one right or the other.
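
For the programmers in the audience, here is a minimal sketch of that two-case argument in Python. Everything in it is an illustrative assumption rather than anything drawn from the opinions: the function name, the labels, and especially the 60-second cutoff, which is an arbitrary stand-in for a line the cases place only somewhere between a few seconds and a few minutes.

```python
# Hypothetical cutoff for "transitory duration". The cases only bracket it
# between a few seconds and a few minutes; 60 seconds is an arbitrary pick.
TRANSITORY_LIMIT_SECONDS = 60.0


def right_implicated(buffer_seconds: float,
                     limit: float = TRANSITORY_LIMIT_SECONDS) -> str:
    """Classify a rebroadcaster's buffer under the proposed temporal test."""
    if buffer_seconds < 0:
        raise ValueError("buffer duration cannot be negative")
    if buffer_seconds <= limit:
        # Transient buffer: the chain of transmission is unbroken, so each
        # stream counts as part of the original broadcast.
        return "public performance"
    # More permanent buffer: the performance may not be public, but the
    # buffer itself endures for more than a transitory duration.
    return "reproduction"


# Cablevision's 1.2-second buffer versus a recording kept for an hour.
for seconds in (1.2, 3600.0):
    print(f"{seconds:>7.1f} s -> {right_implicated(seconds)}")
```

Whatever duration goes in, one of the two labels comes out. The sketch says only which right is implicated, not whether the use infringes; that further question is where fair use would come in.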

(Bell.)

That is as it should be. The real legal issues at stake in Aereo, like those in Zediva, ReDigi (see also), the DISH Hopper, and Cablevision itself are the same ones that have been at stake since Sony and before: Which personal uses of media are fair uses, and when should companies face liability for helping individuals make those uses? If Aereo wins, it ought to be because consumers have a fair use right to make and view personal copies of TV programs to which they already have access, and that Aereo is entitled to help them make and view those copies—not because its system is engineered never to implicate the Copyright Act. If DISH loses, it ought to be because consumers do not have a fair use right to make a wholesale copy of an entire week’s worth of primetime programming and DISH is not entitled to help them make that copy—not because DISH rather than its users “makes” those copies.

Copyright owners and technology companies have spent decades quietly avoiding these personal-use fair use questions. This has not been healthy for the development of copyright law or for media technology. Fair use focuses on market realities and user experiences; it is holistic and sensitive to context. By avoiding these questions, copyright law has instead focused on articulating, in ever more excruciating detail, the boundaries of the exclusive rights. These doctrines are reductionist: they focus on the hidden technical details of a system.

Cablevision is itself an example of the phenomenon in action. Cablevision agreed not to raise a fair use defense; the copyright owner plaintiffs agreed not to raise any secondary liability arguments. The case therefore dealt only with Cablevision’s own direct liability for operating the remote DVRs. Its holdings—particularly the slippery, much-maligned, and much-misunderstood “volitional conduct” holding—have to be understood in this context. They reflect a court trying to reach a sensible overall result without employing the doctrine—fair use—best suited to its intuitions about the case. And they reflect a court making narrow and technically precise rulings deep in the weeds of copyright doctrine, rulings that have nonetheless turned out to have implications far beyond the facts of the case.

(Bell.)

Although I have misgivings about Cablevision’s turn and although I see Aereo in terms of the road not taken, I still find myself admiring Judge Nathan’s opinion. It’s a good illustration of the difference between a trial court and its appellate hierarchy. The Second Circuit or the Supreme Court might rethink the shape of public performance doctrine, distinguishing Cablevision or putting it on a different footing. But a District Court applying Cablevision is bound to apply it faithfully, as Judge Nathan does. Her application of it to Aereo is true to its letter and spirit. The unique-copy test is the law as handed down by the Second Circuit, and under that test, Aereo wins.

At least for now, that is. Left open in the opinion is the question of whether the entire tiny antenna gimmick is an elaborate sham. The networks provided an expert to opine that each circuit board with eighty alleged tiny antennas attached to it in fact constitutes a single big antenna. Aereo put forward its own experts to opine au contraire, and Judge Nathan found their evidence more convincing. But much of this was due to tactical issues: the networks didn’t have their expert testify, and so they lost a significant credibility finding. It’s best to regard this piece of their argument as “not proved” rather than “definitively rejected.” You can bet it’ll be back as the case proceeds.

(Bell.)

HathiTrust Summary Judgment Motions: NFB and Amici


In addition to the Authors Guild and HathiTrust, three other groups filed briefs in relation to the summary judgment motions: a group of intervenors headlined by the National Federation of the Blind, a partnership of library associations and the Electronic Frontier Foundation, and a group of humanities and legal scholars. This post will review their briefs and show how their arguments fit in with the ones I’ve previously discussed.

National Federation of the Blind

This group of intervenors consists of the NFB and three individuals with print disabilities. Their claim, in a nutshell, is that the HathiTrust corpus is legal because it enables the university libraries to make their collections accessible to the print-disabled. The actual argument to get there comes in a few stages.

Stage one of the claim is the argument that the Americans with Disabilities Act creates an obligation for the libraries to offer accessible formats to the blind. Under Title III, for example, a place of public accommodation (such as a private university library) must:

take such steps as may be necessary to ensure that no individual with a disability is excluded, denied services, segregated or otherwise treated differently than other individuals because of the absence of auxiliary aids and services, unless the entity can demonstrate that taking such steps would fundamentally alter the nature of the good, service, facility, privilege, advantage, or accommodation being offered or would result in an undue burden.

Currently, university libraries have books converted into Braille or digital formats, or have humans read them aloud, as requested by print-disabled patrons. This conversion inherently slows down research, and it is limited by obvious resource constraints. Prior to Google, I think it would have been reasonably clear that preparing digital editions of every book in a library’s collection would have constituted an “undue burden.” But with Google offering to shoulder the costs to create OCR’ed digital editions, there is now a plausible argument that these universities have not just the ability but the obligation to supply their print-disabled patrons with accessible versions.

Of course, this argument depends in part on the legality of scanning the libraries’ collections, so there is an unavoidable circularity. (The same circularity afflicts the opposite argument: that the ADA imposes no such requirement because the scanning is illegal.) Hence the brief makes what I read as a slightly weaker argument: that the ADA creates a “national policy” and a “collective commitment” to equal access. This is the kind of public policy claim that feeds directly into a fair use analysis, suggesting that this particular use is specially favored.

The second stage of the argument transposes this policy into copyright law. Section 121 of the Copyright Act (as amended), known as the Chafee Amendment, sets forth an exemption from infringement for “an authorized entity” to make reproductions in “specialized formats exclusively for use by blind or other persons with disabilities.” I do not take it that the HathiTrust corpus itself is in the necessary “specialized format”: instead, it presumably enables the creation of downstream copies in those formats that will actually be made available to the print-disabled. Thus, as applied to the creation and maintenance of the corpus itself, this is really a disguised fair use claim: the noninfringing downstream use renders the necessary upstream use fair.

In a series of comments to my previous post, john e miller raises serious questions about another piece of the Chafee Amendment argument: whether the HathiTrust libraries have “a primary mission to provide specialized services relating to training, education, or adaptive reading or information access needs of blind or other persons with disabilities” as they must if they are to qualify as “authorized entities.” The NFB brief offers a valiant effort to argue that they are, based on the policy of the ADA and the libraries’ goals.

I don’t think that there ought to be an authorized entity restriction in Section 121. What matters is the activity: creating accessible editions whose format keeps them from being usable by the sighted. Limiting the set of actors who can create accessible editions is pointless: if a general for-profit publisher can make them more effectively, it should be able to. But given the existence of the restriction, the NFB intervenors have a hard argument to make. The only textual hook really open to them is the one they seize on: that the Chafee Amendment refers to “a primary mission” using the indefinite article.

The NFB brief also makes a detailed fair use argument: access by the print-disabled to the HathiTrust corpus confers vast public benefit without any cost to copyright owners. It starts with a very effective framing:

In this regard, it bears remembering that for the blind public (i) access is denied without accessibility (i.e., converting printed works into a format from which they can be read by the blind); (ii) there is no accessibility to the vast contents of university libraries without comprehensive digitization; and (iii) there is no digitization of university libraries without the HathiTrust.

On the first factor, the brief argues that digitizing books for use by the blind is transformative because they are “copied for a different purpose.” To me, this stretches transformativeness beyond its reasonable limit. The brief’s point is that the blind are not currently served by most of the books in the corpus and that publishers are not selling these books to the blind in formats they can use. This is a fourth-factor point, combined with an argument that access by the blind for educational purposes is a generally favored use. It has nothing to do with transformativeness, which has become an all-purpose argument deployed whether it fits or not, much like claiming that a restriction on speech is “content-based” in First Amendment law.

The brief makes a stronger argument that the creation of the corpus is a protected form of intermediate copying. This argument—which reappears in the amicus briefs—fits with the defendants’ desired framing of the copying. It’s not about the database itself; it’s about the uses made of that database. And since providing accessible versions to the blind is such an obvious fair use, the preliminary copying needed to make those accessible versions possible is also fair.

The second and third factors add little new to the discussion in HathiTrust’s own brief, but the fourth factor is more interesting. The NFB argues that “there has never been, nor is there ever likely to be, a market for a digital database of library collections accessible to blind students and scholars.” A huge fraction of the books in those collections have never been made available by publishers in accessible editions at all. This is a great fourth-factor argument: showing that the market is simply not one that copyright owners have shown any substantial interest in.

Library Associations and the Electronic Frontier Foundation

This brief is submitted on behalf of three library associations—the American Library Association, the Association of Research Libraries, and the Association of College and Research Libraries—and the Electronic Frontier Foundation. It is more a collection of points than a single sustained argument. It seems likely that a similar brief will be placed before Judge Chin when the main Google Books case heads into summary judgment motions later in the summer.

The first point is that the Google Books corpus already offers huge public benefits. It cites statements from various librarians and researchers detailing how much Google Books helped their research, in many cases pointing them to books that they then purchased. This part of the brief is a bit odd, because the examples are all drawn from Google Books. Some involve information gleaned from snippet view; a few appear to be drawing on the multi-page previews explicitly authorized by copyright owners. Neither of these uses is available with HathiTrust’s search features—which helps the fair use case but also limits the public-benefit argument. The brief does about as good a job finessing this point as I think it is possible to do:

Moreover, if, as amici expect, Google’s practice of providing snippet views is ruled a permissible fair use, HDL could offer the same service, thereby dramatically increasing its public benefits. … HDL is not Google Books: it does not provide snippets as part of its search results. However, it could easily begin to do so and offer the same benefits. Moreover, some of the benefits described above can be achieved through HDL’s existing search functionality, without snippet display. HDL directs the user to the book and page number where the information sought may appear. The user still has to consult the physical book to obtain the information, but HDL greatly assists the user by pointing her to the appropriate books and pages.

The second point is to reinforce some of HathiTrust’s arguments about fair use on the first and fourth factors. On the first factor, it reviews some of the search-engine cases, emphasizing the public benefits of search. On the fourth factor, it gives this nicely succinct summary of market substitution:

HDL is hardly a market substitute for its current uses. Libraries do not pay for the “right” to preserve the works in their collections. Moreover, libraries do not pay for the “right” to provide copies to the print disabled. With respect to its search function, while HDL allows the public to search all of the books in its digital library, it provides only page numbers of in-copyright works.

And with respect to licensing revenue, the brief gives the slightly odd argument that a licensing regime is infeasible because library budgets are falling while libraries’ other costs are rising. This may be true as a factual matter, but it’s hard to see a court accepting the defendants’ poverty as a justification for what it would otherwise consider an infringing use. That’s just not something a court could admit; the holding will always be justified some other way.

As evidence of libraries’ practices, the brief quotes extensively from the Code of Best Practices in Fair Use for Academic and Research Libraries, which endorses many of HathiTrust’s practices, such as providing access to the print-disabled and digitizing to produce search engines. Putting these arguments squarely before a court offers a test of Peter Jaszi and Patricia Aufderheide’s argument in Reclaiming Fair Use that codes of best practices can help expand the contours of what courts will recognize as fair uses.

Next, the brief offers a short rejoinder to the Authors Guild’s argument that the court should find no fair use because Congress could create a collective licensing regime. The brief spells out some of the differences between the orphan works legislation considered by Congress in 2006 and 2008 and the HathiTrust/Google Books project, and offers a bit of gloom about the prospects for Congressional action. And then having stated that a statutory license is completely speculative, the brief speculates that if Congress did create one, the libraries couldn’t afford to pay for it. Why Congress would create an unaffordable license beats me, but it would hardly be the most pointless thing Congress has done.

Finally, the brief offers a laches argument, although, curiously, it never uses the term. It cites the seven-year interval between the commencement of the Google Books project and the filing of the suit against the HathiTrust libraries to emphasize that “the current situation - where the Court must consider the legality of an existing digital library of over 10 million books - is in large measure the result of litigation choices made by the Plaintiffs.” Since the plaintiffs took no steps to block the scanning for years as the libraries’ investment in the corpus mounted, the brief suggests that equity favors a finding of fair use. It also suggests that HathiTrust’s structure mirrors the arrangement in the now-rejected Google Books settlement; I’m not sure how far this argument goes, because the plaintiffs can always cast that as a concession they were willing to make in settlement in exchange for other terms, not something they are okay with standing alone.

Digital Humanities and Law Scholars

The most interesting brief is the one organized by copyright scholars Jason Schultz and Matthew Sag on behalf of a list of scholars in the “digital humanities” who use data mining, statistical analysis, and other large-scale computations on large corpuses of literary texts to analyze linguistic and thematic patterns. (For more on the field, see this paper from Science and Matthew Jockers’ presentation from the spring Berkeley Orphan Works conference.) They’re joined by legal scholars who think these kinds of research are socially valuable, and so the mass digitization on which they depend ought to be treated as fair use. The central idea, developed primarily by Sag in a series of thoughtful articles, is that this kind of research makes only “nonexpressive uses” of the books.

I must confess that I am highly skeptical of digital humanities as most of its proponents practice it. But the underlying legal argument strikes me as compelling: these computational analyses provide benefits to users without offsetting costs to copyright owners. The brief pulls together a series of doctrinal arguments that the nonexpressive aspects of books are simply not protected by copyright. The books themselves are copyrighted, but searches, counts, clustering algorithms, and the whole array of digital humanities techniques do not make use of the books in ways that implicate a copyright owner’s interests.

Text mining extracts ideas from books, abstracting away from any individual author’s particular expression. Moby Dick is a book about whales, not a book about dinosaurs. The most difficult precedents for this argument are the Harry Potter Lexicon, the Seinfeld Aptitude Test, and similar cases, where courts have protected creative works against references and quiz books, which unsuccessfully claimed to be taking only the “facts” from the underlying creative works. As the brief explains:

The supposed “facts” conveyed in the “Seinfeld” quiz book were not truly facts about the television program; they were “in reality fictitious expression created by Seinfeld’s authors.”

By contrast, the many forms of metadata produced by the library digitization at the heart of this litigation do not merely recast copyrightable expression from underlying works; rather, the metadata encompasses numerous uncopyrightable facts about the works, such as author, title, frequency of particular words or phrases, and the like.

The brief then makes a variant on the intermediate-copying argument. This time, it isn’t rooted in technological intermediate copies (e.g., a database of student term papers used to check for plagiarism), but rather in creative intermediate copies (e.g., a film studio’s preliminary scripts of an allegedly infringing film). The point here is to emphasize that HathiTrust doesn’t show patrons any excerpts from any of the books; it is not taking advantage of their expressive features.

There follows a fair use argument. Much, by now, is standard. Digital humanities research easily qualifies as a transformative use for a favored purpose under the first factor; the brief does a nice job bringing in some news reporting cases. Under the second factor, it pulls a quotation from a technological intermediate copying case that humans “cannot gain access to the unprotected ideas and functional concepts contained in [the copyrighted work] without … making copies.” It further reprises the technological intermediate copying argument under the third factor. And under the fourth, it emphasizes that transformative uses are categorically outside the markets reserved to the copyright owner.

Conclusion

Although I am skeptical of the straight Section 121 argument offered by the NFB intervenors, the three additional briefs offer a series of strong fair use arguments. Together with the HathiTrust brief, they provide a series of overlapping fair use justifications that I find quite convincing. I look forward to the next round of briefing, to see how the plaintiffs respond.

Point, Counterpoint, Countercounterpoint


In a headline and a paragraph, the Onion demonstrates:

  1. As a rule, rape jokes aren’t funny.
  2. There are exceptions to that rule.
  3. Daniel Tosh was not working within one of those exceptions.

See also.

HathiTrust Summary Judgment Motions: Fair Use


The Public Index is mostly better now, thank you for asking. I hope to have the filings from the summary judgment motions posted relatively soon, but they’re not up yet.

The fair use issue is fairly joined. The Authors Guild and HathiTrust have presented sharply divergent accounts of the HathiTrust corpus. (The briefs are collected here.) In this post, I’ll focus on their briefs, which are the heart of the case. In a follow-up next week, I’ll deal with the National Federation of the Blind’s arguments for fair use, along with the amicus briefs filed yesterday by library associations and by a group of “digital humanities and law scholars.” Here are the highlights, factor by factor:

Factor Zero: Framing

The Authors Guild focuses on the “systematic” copying needed to make the HathiTrust corpus: the initial scans by Google and the replication of those scans at HathiTrust facilities. It cites cases like American Geophysical, MP3.com, and Encyclopedia Britannica v. Crooks in which wholesale copying by an institution was held not to be fair use, even if the individuals supplied by that institution might have had stronger fair use cases for non-systemic copying.

In contrast, HathiTrust focuses on the end uses to which the corpus is put: full-text search, preservation, and print-disabled access. These end uses themselves are unproblematically legal (according to HathiTrust), and so the copying to make the HathiTrust corpus enables only legal end uses. It cites cases like Sega v. Accolade, Sony v. Connectix, and A.V. v. iParadigms in which intermediate copying was held legal when undertaken solely to enable non-infringing end uses. The Authors Guild focuses on the corpus itself; HathiTrust focuses on its uses.

Factor One: Purpose of the Use

HathiTrust describes the corpus in terms of classically favored purposes: teaching, scholarship, and research. But the Authors Guild responds that while library patrons may engage in those purposes, the libraries themselves don’t. This response has to be right at some level: campus bookstores can’t just start photocopying textbooks on the grounds that students will make educational uses of them. And this is a significant part of the American Geophysical analogy, where a corporate library’s copying of articles for research scientists’ convenience was held to be unfair. But the point shouldn’t be pushed too far, as it depends on one of those crossovers between the first and fourth factors: the bookstore’s copying is unfair, in large part, because it substitutes for purchases of textbooks: the photocopy supersedes the purpose of the original textbooks. That’s much less clear here, where authors don’t make a digital corpus available and the HathiTrust corpus isn’t used to eliminate book purchases for patron access. But more on that further down.

Also part of the first-factor calculus is whether the use is commercial or noncommercial. The libraries are all non-profit entities and the HathiTrust corpus will be used for nonprofit educational uses. The Authors Guild tries to tar the project with the Google brush, emphasizing both Google’s commercial purposes and the value of the copies HathiTrust received in return for letting Google engage in the scanning. The former strikes me as irrelevant here (of course it is highly relevant in the Authors Guild’s lawsuit against Google). The latter is one of the persistent trouble spots in copyright caselaw: whether merely avoiding the expense of paying for something (leave aside for now whether it’s something available for purchase in the first place) makes infringement “commercial.” Suffice it to say that there are cases going in both directions—driven more, I think, by other contextual factors than by the pure economics of the transaction.

Finally, there is the crucial first-factor question: is the use transformative? Under a traditional conception—as of, say, two decades ago—it isn’t. A HathiTrust copy is an exact reproduction of a book; its purpose is complete fidelity. The Authors Guild calls it “mechanical.” But HathiTrust draws on a more recent line of cases that have found transformativeness in a new place. Instead of transforming the work itself by imbuing it with new creativity, this new species of transformativeness changes the way the work is used, putting it to a use that recontextualizes it. Internet search engines have won two key cases holding that displaying thumbnails for purposes of image search is transformative. Those cases didn’t deal with the copying to make the index itself, but that’s where the intermediate-copying cases HathiTrust cites would come in. I expect that the follow-up rounds of briefing will deal with the question of how close the analogy to Internet image search is.

HathiTrust also points to print-disabled access and preservation as “transformative”: more on that next week.

Factor Two: Nature of the Work

The vast majority of the books in the HathiTrust corpus are published. If they were unpublished, that would tend to cut against fair use. But since they’re published, this part of the second factor doesn’t have much to say. HathiTrust emphasizes that most of the books in the corpus are out-of-print, and uses this point to explain to the court some of the difficulties and uncertainty that affect the copyright status of any significant corpus of old books. This is not a traditional second-factor argument; it will be interesting to see how the Authors Guild responds.

The sharper disagreement here concerns factual versus more creative works. The scope of fair use is broader for the first. There is no serious question that huge swaths of the books in the university collections that were scanned to make the HathiTrust corpus are factual monographs. There is also no serious question that mixed in with these are some more creative books—fiction and poetry from every era. According to HathiTrust, the ratio is about ten to one. The Authors Guild responds that given the indiscriminate shelf-clearing nature of the scanning, HathiTrust shouldn’t be able to claim the benefit of copying more informational works. That strikes me as an easily solvable problem if Judge Baer thinks the fair use case would turn on the second factor: he could rule that copying nonfiction is fair but copying fiction and poetry is unfair, and sort out the consequences at the remedial stage.

Factor Three: Amount Copied

In the most literal sense, HathiTrust has copied the whole of every book it has scanned. Repeatedly. But this factor can be squirrely. Where the use is transformative under the first factor, the copying under the third factor is judged not only absolutely, but also in relation to how much the defendant needed to copy for its transformative use. To make an index, you need to copy complete books en masse. So if you buy HathiTrust’s story on the first factor, you’ll buy its story on the third factor, too. If not, not.

Factor Four: Effect on the Market

The Authors Guild starts with its weakest argument: “Each digital copy … represents a lost sale to the book’s rightsholders.” This is true only under a strained definition of “sale,” because many of the books in the corpus are out of print and some are unavailable at any price. Indeed, few of the books are for “sale” for the full range of uses made by HathiTrust. A few sentences later, the Authors Guild acknowledges this, writing

To the extent a particular book was no longer in print or was unavailable for sale in digital format when Defendants sought to create a digital copy (though of course Defendants admit they never checked), they could have negotiated a license to do so, either separately with each author/publisher (as Google has done in its Google Books Partner Program) or collectively through a collective rights society).

Notice the shift from lost sales to lost licensing revenue, which is a better argument for the Authors Guild. In the American Geophysical case, the court concluded that there wasn’t a market for the research library to buy individual articles for researchers—no lost “sales”—but that there was a market for the library to buy the right to make photocopies—so there were lost “licensing revenues.” The tricky part, though, is similar: explaining what market the HathiTrust libraries should have gone to to purchase the necessary licenses. Interestingly, the Authors Guild doesn’t attempt to argue that HathiTrust should have purchased the licenses in one-to-one transactions with every copyright holder (a prospect that HathiTrust expert witness Joel Waldfogel opines would cost $569 million just to locate all the necessary rightsholders, not including the licensing fees themselves).

The first “ready market or means” the Authors Guild cites is collective management systems, in which a library would pay a collecting society for the right to digitize books, and the society would divide the money up among copyright holders. There’s just one eensy little problem with collective management, though: there is no collecting society for books in the United States. The University of Michigan can’t get a license to copy all the books in its collection, because there’s no one with authority to offer one. The usual test is, in American Geophysical’s words, whether such a market is “traditional, reasonable, or likely to be developed.” With no traditional or reasonable collective licensing market for books in existence, the Authors Guild thus must argue that one is likely to develop.

The only such prospect the brief offers—one rich with irony—is the rejected Google Books settlement. In the Authors Guild’s optimistic phrasing, the settlement “shows how a collective management system might work to permit certain of the activities of Defendants in this case while providing compensation to copyright owners.” I agree that yes, the settlement showed how such a system might work. But the settlement is no longer even a possibility, not after Judge Chin’s holding that it “exceeds what the Court may permit under Rule 23.” Indeed, in a forthcoming article, I will be explaining why the settlement would have been unconstitutional, to boot. It is hard to see how a defunct and impermissible settlement provides a basis for anything, let alone a functioning licensing market.

Permit me a brief digression on Daniel Gervais’s expert report on collective copyright management systems. He offers a clear and cogent history of collective management in the United States and in Europe. But he also makes a claim that strikes me as utterly unsupportable:

I believe that if the Defendants’ uses are not determined to be fair uses, the market will provide a collective licensing system for the types of uses that the Defendants have been making so that they would not have to negotiate a transactional license for each book or other work they wish to use.

This is plausible only if one ignores the immense transaction costs associated with locating all of the copyright owners for the books in the HathiTrust corpus. The costs are simply being pushed from the libraries to the copyright collective; they don’t go away. It might be economically feasible for a collecting society to come into being for a subset of popular books, or for a large slice of recent books, but it is not plausible that one will come into existence for all books. The only way that could happen—given current United States law on ownership, transfer, and licensing of copyright—is if Congress passes legislation establishing one. If it does, that wouldn’t be “the market” providing collective licensing—and in any event, it would not be appropriate for Judge Baer to consider what Congress might or might not hypothetically do in determining whether there is a genuine licensing market for books that the HathiTrust libraries should have gone to.

Perhaps more plausibly, the Authors Guild argues that HathiTrust’s uses undermine existing licensing markets, even where the licenses offered in those markets aren’t quite the same as those HathiTrust sought. This is at least a cognizable theory of harm. The two licensing markets the Authors Guild cites are full-text search (e.g. Amazon’s Search Inside) and non-consumptive research. The evidence here of harm is, however, underdeveloped. The Authors Guild hasn’t shown that full-text search is a paying license, only that the licenses are granted to drive book sales—so it needs some further evidence linking HathiTrust full-text search to lost book sales (e.g. via diversion of searchers). And as for non-consumptive research, all the Authors Guild can offer are a few statements from individual plaintiff T.J. Stiles that, e.g.:

From what I’ve learned about it, non-consumptive research represents a potentially exciting field for academics and therefore an emerging licensing opportunity for authors at a time when revenues are decreasing. Indeed, it is my understanding that the Amended Settlement Agreement entered into by The Authors Guild and Google would have permitted the defendant libraries to engage in non-consumptive research activities using works such as Jesse James—but pursuant to a license that included a mechanism to compensate authors.

Again, there is no real evidence of harm here: just because Stiles would like to license non-consumptive research does not mean there is anyone who will pay him for that license. Indeed, one of the amicus briefs specifically argues that non-consumptive research is also non-infringing, which would mean there is no licensing market to be had.

The final theory of market harm proffered by the Authors Guild is security risks from the creation and maintenance of the HathiTrust corpus in a 466-terabyte database connected to campus networks and to Web servers. (The brief is cleverly drafted to leave the impression that the contents of the database are available on those networks and on the Web, but avoids actually saying so.) This, the Authors Guild argues, creates the potential for security breaches and resulting piracy. (Unfortunately, many of the factual details on which specific allegations of poor security depend have been redacted.)

But all of this is only potential harm. Even Benjamin Edelman’s expert witness declaration doesn’t identify a single actual leak of a book from the Google Books project resulting in the kind of widespread Internet piracy he warns of. Like Gervais’s, Edelman’s report deals largely in hypotheticals. The most ungrounded of them is his speculation (echoed in the Authors Guild’s brief) that

Seventh, when books are scanned by a smaller and less sophisticated provider, there is a particularly acute risk of book contents being accessed and redistributed. For one, less sophisticated organizations have a reduced capability to design, install, and maintain suitable web site, database, and related security systems as well as anti-reconstruction systems to secure books. Furthermore, less sophisticated organizations have a lesser ability to screen key staff to prevent data loss through rogue employees, and a lesser ability to configure security systems to exclude hackers. Thus, if Defendants’ conduct is found to be legal, and if other companies and organizations follow Defendants’ lead in scanning books, the risk that book contents will be accessed and redistributed becomes even greater.

The behavior of differently situated entities is not germane to the factual and legal questions posed by HathiTrust’s behavior. If others might take bad precautions, that doesn’t tell us anything about how good or bad HathiTrust’s are. If Judge Baer is inclined to find that HathiTrust is making a fair use because it has sufficient security but is concerned that other book-scanners will cut corners, he can say so. His opinion could state, explicitly, that HathiTrust’s security passes muster because <insert details here>, thus making the security requirements part of his holding.

On HathiTrust’s side, it’s easiest just to quote from the brief:

Plaintiffs admit they are unable to identify “any specific, quantifiable past harm, or any documents relating to any such past harm” resulting from the Libraries’ uses of their works. …

Plaintiffs have not produced in discovery:

  1. any business plans for licensing the digitization of books; or
  2. any plans for the use books for preservation and research purposes; or
  3. any analysis of their costs for licensing such markets or anticipated revenues; or
  4. any communications with entities that might collectively license these rights; or
  5. any analysis of the substantial limitations that such an entity would face in terms of the number of works it could foreseeably license.

Conclusion

On my read of the briefs:

  • The first factor depends on which characterization of the facts Judge Baer finds more convincing. I personally think HathiTrust’s holistic perspective is more persuasive, especially given the image-search cases.
  • The second factor can be—ahem—factored out by splitting the corpus into informational and expressive subcorpora.
  • The third factor follows the first.
  • The fourth factor depends on evidence of harm to licensing revenue, and I just don’t see that evidence in the filings.

The Authors Guild has done well in the previous skirmishes of this litigation, from the huge unforced error of the Orphan Works Project to Ed Rosenthal’s masterful performance at oral argument in May. But the motions for summary judgment are where the fair use battle will really be fought, and here, I think HathiTrust has made a significantly stronger case in this first round. It will be very interesting to see the oppositions next week.

Next time: the blind and the amici.

HathiTrust Summary Judgment Motions: Section 108


On Friday, the parties filed their motions for summary judgment in the HathiTrust case, along with thousands of pages of supporting documents. I’m still making my way through the filings. The heavy redactions make it easier: there’s one document consisting entirely of five pages of solid black, save only the cryptic document number UM004282. Even its title is redacted. But there are still piles of depositions and interrogatories to get through. It doesn’t help that the Public Index is being gradually nursed back to health from a bad malware infection, so I’ve been unable to post the documents there yet, either.

I thought, however, that I would summarize the arguments in the briefs themselves, to give readers a sense of how the case is developing. There are three briefs in: the Authors Guild (and other authors and groups) on the plaintiff side, and HathiTrust and the National Federation of the Blind (and individual blind students) on the defendant side. Interestingly, the three briefs take on slightly different issues. Today, I’ll discuss the prima facie case of infringement and Section 108; fair use will follow later in the week.

Infringement

The Authors Guild first presents the elements of copyright infringement: ownership of the copyright by the plaintiff and copying by the defendant. While the defendants have their own pending motions about whether the Authors Guild and other groups are entitled to sue on behalf of their members, they don’t otherwise contest the prima facie case of infringement. Nor would they. This case has never been about whether the copying happened; it’s always been about whether the copying that happened is legal.

What the defendants do contest, however, is the plaintiffs’ characterization of which conduct requires legal justification. The Authors Guild focuses on the digitization itself, on the distribution of digitized copies to multiple HathiTrust sites, and on the now-cancelled Orphan Works Project. The defendants admit that they engaged in all of these activities, of course, but they focus on the purposes to which the digitized copies are put: full-text search, preservation, and access for the print-disabled. It’s a little anomalous to have the defendants detailing more conduct than the plaintiffs, but it makes sense given the structure of the case. The plaintiffs are focusing on the mass copying; the defendants on the socially productive uses to which those copies can be put.

Section 108

The Authors Guild argues that HathiTrust does not qualify for Section 108’s copyright exemptions for libraries. This is a revamped version of its February motion for judgment on the pleadings, which argued that Section 108 couldn’t possibly apply to any of HathiTrust’s uses. The court hasn’t ruled on the first motion, leaving the Section 108 issue hanging over the case.

I’m trying to decide whether it’s strange that the defendants haven’t now argued that Section 108 does apply. At the judgment on the pleadings stage, the threshold is extremely high, because the facts of the case haven’t been developed yet. At the summary judgment stage, the threshold is lower, because there are more facts in evidence, and hence fewer uncertainties weighing on the question. Back in the spring, the defendants filed vigorous oppositions, arguing that Section 108 could apply to many particular uses. But most of the arguments they made were of the form, “Section 108 could apply to some uses of some books,” not of the form, “Section 108 does apply to these uses of these books.”

I was expecting to see this latter form of argument in the defendants’ summary judgment motions, but it isn’t there. There’s very little on Section 108 at all. I can think of three possible reasons why:

  • They’re waiting for their responses to the Authors Guild’s summary judgment motion. That doesn’t help them much, though, because Section 108 is an affirmative defense that the defendants ultimately need to put in play if they hope to prevail on it. They can’t play defense forever; they have to go on offense on this one.
  • They’re waiting to make these arguments at trial, rather than on summary judgment. This is more plausible, but I can’t figure out what they’d be waiting for. They have evidence about specific books; that’s the kind of information that could have been presented as part of the documentation of the summary judgment motion.
  • They’re giving up on the Section 108 argument because they think it’s unlikely to work, and so are saving their pages for the fair use arguments. But this raises the question of why they were previously litigating the legal aspects of Section 108 so vigorously.

In any event, here are the key questions in the parties’ Section 108 arguments:

Commercial Advantage

Section 108(a) has a threshold condition that “the reproduction or distribution is made without any purpose of direct or indirect commercial advantage.” The Authors Guild argued previously that Google’s participation renders the entire HathiTrust project inherently commercial, because it gives the libraries the free use of “digital book conversion services, valued in the hundreds of millions of dollars” (MJP 13) and “provided enormous commercial and competitive benefits to Google.” (MSJ 12). The libraries challenged the first of these back in the spring, citing legislative history for the claim that “commercial advantage” means making money, not just saving it. The Authors Guild is now arguing primarily the second claim, that the cooperation serves Google’s commercial interests; I expect to see this position contested in HathiTrust’s response.

Replacement Copies

Section 108(c) permits the creation of three copies “solely for the purpose of replacement of a copy or phonorecord that is damaged, deteriorating, lost, or stolen” on two conditions: (1) the library “has, after a reasonable effort, determined that an unused replacement cannot be obtained at a fair price” and (2) any digital copies aren’t “made available to the public in that format outside the premises of the library or archives.” Whether specific books are deteriorating is a question of fact; the libraries have yet to come forward and argue that any specific books were indeed in need of replacement. Until they do, this defense has a hypothetical quality; it’s not actually in contention and anything the court says about it would be dictum. (The same goes for the question of whether the libraries investigated the condition of the books: maybe they did for some, but until they say so with support in the record, the issue isn’t really in play.)

But there are also legal questions here with some significant implications. For one, the Authors Guild argued that the HathiTrust process makes at least ten copies of each book—images and OCR text held in each of five places: at Google, on two HathiTrust server farms, and on two HathiTrust tape backups. HathiTrust responded by claiming that not every “technical digital copy of a work” should count against the limit, since viewing a book on a computer creates a copy in memory, an argument that strikes me as beside the point, as the HathiTrust copies are hardly “technical” in the same sense as transitory copies in memory. They’re intended to be stable and enduring: that’s the point of preservation. HathiTrust may be right that the creation of more than three copies “was dictated by the medium and standards for preserving works in digital form,” but that’s a tough argument to make in the face of a statute that says “three.”

And for another, the Authors Guild challenged HathiTrust member libraries’ farming of digital copies out to Google and HathiTrust HQ, arguing that Section 108(c) requires that “digital copies will not be distributed, and will stay in the physical library.” But the statute doesn’t say “distributed”; it says “made available to the public.” So unless the Authors Guild is prepared to argue that HathiTrust is “the public” — which so far it hasn’t clearly done — the statutory text is on HathiTrust’s side here.

Articles

Section 108(d) lets libraries make copies of an “article or other contribution to a copyrighted collection or periodical issue” for patrons. The Authors Guild gave an argument that Section 108(d) didn’t apply because the digitization was en masse rather than in response to user requests. HathiTrust does not appear to be challenging this argument because it doesn’t appear to be relying on Section 108(d) at all. Section 108(d) is about copies of parts, not copies of wholes.

Private Study, Scholarship, or Research

Section 108(e) is worth quoting at length:

(e) The rights of reproduction and distribution under this section apply to the entire work, or to a substantial part of it, made from the collection of a library or archives where the user makes his or her request or from that of another library or archives, if the library or archives has first determined, on the basis of a reasonable investigation, that a copy or phonorecord of the copyrighted work cannot be obtained at a fair price, if-

(1) the copy or phonorecord becomes the property of the user, and the library or archives has had no notice that the copy or phonorecord would be used for any purpose other than private study, scholarship, or research; …

The Authors Guild challenges the applicability of this section on three grounds, which are really the same. First, the digitization is not at the “request” of a user because it involves complete collections; second, HathiTrust’s digital copies are not “the property” of library users; and third, the libraries haven’t investigated the availability of the books. These are all just objections to the bulk scanning.

HathiTrust’s response is very interesting. It has not attempted to claim that Section 108(e) eo ipso applies to the bulk scanning: instead, it argues that the Orphan Works Project would be protected under it. Thus, in effect, HathiTrust needs some other defense for the bulk scanning, but once it has the copies, claims it could use them to help satisfy patron requests. (Of course, the Authors Guild disagrees.) But arguments have a way of folding in on themselves: the claim that Section 108(e) might protect downstream uses of the digital corpus, in turn, becomes an argument that could help justify its upstream creation, say under fair use.

Systematic Reproduction

Section 108(g) sets another threshold condition: Section 108 applies only to “the isolated and unrelated reproduction or distribution of a single copy or phonorecord of the same material on separate occasions” but not “the related or concerted reproduction or distribution of multiple copies or phonorecords of the same material.” The Authors Guild argues that this is precisely what’s happening with HathiTrust’s wholesale scanning. (It also makes an argument that the scanning is a prohibited “systematic reproduction” under Section 108(g)(2), but that language is qualified to apply only to Section 108(d) reproductions, and as noted above, HathiTrust doesn’t appear to be relying on Section 108(d).)

HathiTrust has two counterarguments here. One is that “isolated and unrelated” applies only to multiple copies of “the same material”—i.e., repeated copying of a single book. The Authors Guild replies by emphasizing the words “single copy” in the first quoted phrase, but HathiTrust’s emphasis on the words “same material” in the second quoted phrase is textually cleaner. I’d have to say the statutory text here is genuinely ambiguous, which leads us to legislative history and HathiTrust’s second argument: that the meaning of “multiple” and “systematic” is tied to the risk that extensive copying will serve as a substitute for library purchases. But that itself is a disputed issue—more on that next time.

Twenty-Year Sunset

Finally, Section 108(h) permits libraries to make copies of commercially unavailable works and distribute them to patrons in the final twenty years of their copyright terms. The Authors Guild argues that the Orphan Works Project falls outside Section 108(h)’s quite narrow scope. But since HathiTrust hasn’t (yet) argued that the Orphan Works Project, or any other particular uses of any particular books, are protected by Section 108(h), it’s again something of an abstract question.

Summary

Some of the Section 108 issues are not really before the court, because the defendants have yet to argue that Section 108 actually does apply to any identified activities. But even leaving that aside, I have trouble seeing how HathiTrust can make a pure Section 108 argument in its defense. Some of its copies might ultimately end up being protected, but if it has a defense that will win this lawsuit across the board, that defense is fair use, not Section 108. The statutory library privileges matter because of how they frame and inflect the fair use issue.