The Laboratorium
August 2012

In re Books: Speakers Announced, Registration Open


We’ve just put up the initial list of speakers for In re Books:

  • Jonathan Band, Policy Bandwidth
  • Caleb Crain, Author
  • Niva Elkin-Koren, University of Haifa
  • James Gleick, Author
  • Daniel Goldstein, Brown Goldstein Levy
  • Eric Hellman, Unglue.it
  • Roy Kaufman, Copyright Clearance Center
  • Ariel Katz, University of Toronto
  • Jessica Litman, University of Michigan
  • Lateef Mtima, Howard University
  • Valerie Small Navarro, ACLU of California
  • Aaron Perzanowski, Wayne State University
  • Matthew Sag, Loyola University of Chicago
  • Christopher Sagers, Cleveland State University
  • Pamela Samuelson, University of California at Berkeley
  • Jule Sigall, Microsoft
  • John Thompson, University of Cambridge
  • Elizabeth Townsend-Gard, Tulane University
  • Doron Weber, Sloan Foundation
  • Jessamyn West, Librarian.net
  • Eric Zohn, William Morris Endeavor

We’ll be adding even more in the weeks to come. The program is also online so you can see what we’ll be discussing, and registration is now open for the low low price of $100 or less. Come join us October 26 and 27 in New York. It’s going to be great.

Why Johnny Can’t Stream


My latest essay for Ars Technica, Why Johnny Can’t Stream: How Online Copyright Went Insane, is now online. From my perspective, it’s an attempt to tie together my blogging on cases like Aereo, Zediva, and ReDigi and to illustrate what they have in common. From a legal perspective, it’s the story of how the public performance right has gradually made less and less sense over the last few years. And from a business perspective, it’s about why startups are buying thousands of tiny antennas, building the world’s longest video cable, and devising increasingly outlandish technologies. Some excerpts:

Suppose I could offer you a choice of two technologies for watching TV online. Behind Door Number One sits a free-to-watch service that uses off-the-shelf technology and that buffers just enough of each show to put the live stream on the Internet. Behind Door Number Two lies a subscription service that requires custom-designed hardware and makes dozens of copies of each show. Which sounds easier to build—and to use? More importantly, which is more likely to be legal?

If you went with Door Number One, then you are a sane person, untainted by the depravity of modern copyright law. But you are also wrong. The company behind Door Number One, iCraveTV, was enjoined out of existence a decade ago. The company behind Door Number Two, Aereo, just survived its first round in court and is still going strong. …

The post-Cablevision cases are almost comically formalistic about technical details. Instead of looking at the front-end user experience, they focus on the back-end hardware and software. Sooner or later, someone is going to argue to a court that it makes a difference for copyright purposes whether a video stream is decoded by the main CPU or by a dedicated graphics card—or some other distinction equally remote from anything the typical viewer thinks about when trying to catch up on last week’s episode of Breaking Bad.

This technological formalism has real costs and real benefits for all concerned. On the upside, Lawful Good technologists and investors need bright-line guidance. Imagine being a cloud computing vendor, watching the file locker litigation and worrying that one judge could scuttle your entire business model. Or worse, imagine being a cloud computing customer facing the risk that one judge could consign your files to Davy Jones’s locker.

Give it a read!

Google’s Algorithmic Ideology


This is a scholarly outtake: a passage that no longer fits with the article I wrote it for.

Google was founded by a pair of highly intelligent, Montessori-educated, profoundly optimistic graduate students in computer science who happened also to be extraordinarily lucky in having the right idea at the right time while surrounded by the right people. The company they built bears these personality traits deep in its distributed organizational brain; they are reflected in its hiring decisions, corporate slogans, management processes, technical designs, and even its aesthetics.

Algorithms are the essence of the Google weltanschauung. The company’s success is traceable directly to PageRank, one of the most elegant algorithmic discoveries of the last generation. Five algorithmic values pervade almost everything the company does: automation, scalability, data, information maximalism, and arrogance. They constitute Google’s ideology: the system of ideas that animate its decisions, large and small.

Automation: An algorithm is perfectly consistent, free of human biases and mistakes. Google has long disdained manual curation and relied instead on algorithms to identify interesting, important, or dangerous items. The company offers support forums for users to discuss issues with each other, but a regular theme in websites’ Google horror stories is the difficulty of bringing an issue to Google’s attention and getting a meaningful response. Even its hiring decisions have depended on “a series of formulas created by Google’s mathematicians that calculate a score — from zero to 100 — meant to predict how well a person will fit into its chaotic and competitive culture.” Driverless cars are an extreme example of the trend.

Scalability: Unsurprisingly for a company whose core product requires making a complete copy of the entire web on a frequent basis, Google is obsessed with scalability. The company does nothing unless it works not just for a few hundred data points but for millions, billions, trillions, or more. It has pioneered new techniques for building massive high-reliability data centers that use hundreds of thousands of computers. It disdains incremental fixes, preferring to solve the general case whenever possible. This preference for scalability is another reason it is so notoriously difficult to reach actual Google employees for customer service issues: humans don’t scale.

Data: Google is profoundly data-driven. Its access to mountains of data about webpages and searches has given it a strategic edge in understanding the problems of search. It has repeatedly demonstrated the “unreasonable effectiveness of data”: the reason Google’s translation tools are so good is not because the company is dramatically better than generations of researchers at modeling natural language, but rather because it has access to the world’s largest corpus of actual language usage data. Proposed changes to search algorithms are heavily informed by empirical data. At times, this preference for the quantifiable tilts over into absurdity: it picked a toolbar color by testing 41 different shades of blue and seeing which one drew the most user clicks.

Information Maximalism: Google believes that the world is better off when people have access to more information. This is not a belief in overloading people with information; quite the opposite. Google Search and other products help users isolate the information they want and need. But this approach works best when they have as much information as possible to select from. Google frequently promotes the conversion of information to accessible digital forms (books, art, map data, legal documents, and so on) as a way of making search more useful. The obvious policy consequence of this vacuum-cleaner attitude towards information is that Google tends to fall on the “information wants to be free” side of debates over Internet filtering, copyright policy, and similar issues. It is also why the company has had such persistent privacy problems.

Arrogance: Larry and Sergey are very smart computer scientists who became ludicrously wealthy by hiring other very smart computer scientists. That initial experience was imprinted on the company they founded: it has the confidence to believe that every problem “from book digitization to freedom of expression” can be solved by “talented computer scientists and engineers” bearing “scientific, heavily quantitative methods.” It has rooms full of smart people who are used to thinking of themselves as the smartest people in rooms full of smart people. The company’s two strategic quagmires (Android and Google+) both reflect the belief that its raw brainpower and allegedly unique corporate culture amount to a Green Lantern Corps power ring, capable of entering any field and dominating any market. But while an algorithm can be proven correct, companies cannot.

Google Books: Who Is the Class?


My previous post about the Palmer Kane case has gotten me wondering how the Google Books class deals with copyright owners similarly situated to Palmer Kane. The now-rejected settlement had an extensive class definition with numerous exceptions and defined terms. But the class Judge Chin certified is much simpler:

All persons residing in the United States who hold a United States copyright interest in one or more Books reproduced by Google as part of its Library Project, who are either (a) natural persons who are authors of such Books or (b) natural persons, family trusts or sole proprietorships who are heirs, successors in interest or assigns of such authors. “Books” means each full-length book published in the United States in the English language and registered with the United States Copyright Office within three months after its first publication. Excluded from the Class are the directors, officers and employees of Google; personnel of the departments, agencies and instrumentalities of the United States Government; and Court personnel;

I’m curious about the language I’ve highlighted. I assume that the intention here is to restrict the class to people who are either listed as authors on the registration certificate or can show a chain of title from the author. Does this language exclude the authors of what would have been called Inserts under the settlement? What about the creators of visual material licensed for inclusion in a book, as in Palmer Kane? Among other things, I’m trying to make sure this class doesn’t automatically sweep in the visual artists who have their own pending lawsuit against Google. I don’t think it does, but I confess that I’m not entirely sure. I invite your thoughts about the class definition and what it covers.

Google Books: A Recent Case on Copyright Licensing and Class Certification


Eric Goldman pointed me to an interesting recent decision from the Southern District of New York, where the Google Books suits are being heard, about class certification and copyright licensing: Palmer Kane LLC v. Scholastic Corp. It may well have some bearing on Google’s pending appeal, although, for reasons I’ll explain, I think it’s not exactly on point.

In a nutshell, Scholastic publishes an extensive series of books, workbooks, videos, and software called READ 180. It’s designed to help students at all levels from elementary school through high school improve their reading skills, although I have to say that the Scholastic site, which features very few words and glossy pictures of graph-heavy “Dashboards” on iPad-like computers, doesn’t exactly inspire confidence. READ 180 started in 1999, was updated in 2005 with an “Enterprise” edition, and again in 2011 with a “Next Generation” edition. (I’m curious about the trademark-law backstory here.)

Since READ 180 is a reading-focused curriculum, it naturally follows that Scholastic licenses thousands of images for it. It works with at least eight photo houses, and with numerous individual rightsholders. The invoices for the images set out a variety of payment terms, permissible print runs, start and end dates, reuse fees, product line restrictions, and so on.

The plaintiff owns the copyrights for three photographs taken by Gabe Palmer, with the stock-ariffic titles of “Paramedics,” “Troubled Students,” and “Speeding Ambulance,” which it believes were used without sufficient permission in READ 180. It sought to represent a class of similarly situated image copyright owners, with respect to two allegedly defective licensing practices:

The “overrun” class would include rights holders whose photographic works were used by Defendant in any READ 180 publication in excess of the licensed print run. The proposed “unauthorized use” class would include rights holders whose works were published in a READ 180 component without Scholastic having obtained the requisite license before the printing date.

Held, on the record provided, Scholastic’s licenses were too diverse to permit class treatment. Scholastic argued, and the court agreed, that to determine whether any given image was infringed would require an individual inquiry into not just the language but the surrounding circumstances of the license. Scholastic had extensive negotiations with the eight licensing houses, which resulted in Preferred Vendor Agreements that modified the terms of the invoices. Meanwhile, the scope of the licenses those houses offered was itself shaped by the dealings and individual agreements between rightsholders and licensing houses. Taking all of this together, the court concluded that the case for infringement was not susceptible to the kind of “generalized proof” that a class action requires.

There is an obvious parallel to the Google Books case, where Google has been arguing that the diversity of book licensing practices renders class treatment inappropriate. If I were Google, I would be planning to cite Palmer Kane in my brief on appeal to the Second Circuit. But I would also not push the analogy too far. In Palmer Kane, the licenses were utterly central to the lawsuit, because they defined what was and was not infringing. In the Google Books case, however, the licenses are more peripheral; the core of the case involves fair use. The licenses affect the weighing of a few of the fair use factors, and they can affect any individual plaintiff’s membership in the class, but they don’t preclude the possibility of a ruling on the merits of infringement-by-scanning one way or the other. Palmer Kane is interesting and relevant but not determinative.

Now for the O. Henry-esque twist. Scholastic’s lead attorney in Palmer Kane was Edward Rosenthal from Frankfurt Kurnit. Yes, the same Edward Rosenthal who is the Authors Guild’s lead attorney in its lawsuit against HathiTrust. Thus, in one of its two suits over Google Books, the Authors Guild is represented by the attorney who is most responsible for creating a piece of law that could complicate its other suit over Google Books. Hawkward.

Google Books: Even Friends of the Court Have Enemies


Remember how the American Library Association (along with two other library groups, but call them collectively “ALA” for short) and Electronic Frontier Foundation filed an amicus brief in the HathiTrust case, and a group of “Digital Humanities and Law Scholars” filed one too? Well, they also asked to file amicus briefs (ALA et al., scholars) in the main case against Google. But this time, the Authors Guild and the other plaintiffs opposed the filings, arguing that the filers were “friends of Google” rather than “friends of the court.”

The move is perplexing, on a number of levels. For one thing, the Authors Guild allowed nearly identical amicus briefs to be filed in the HathiTrust case. I can understand that different lawyers might reach different conclusions in different cases, but I would have thought that the Authors Guild itself could at least make its two sets of lawyers talk to each other and reach a common decision. For another thing, the law here is quite clear. District judges have broad discretion either to accept amicus briefs or to reject them. Opposing the filings more or less requires Judge Chin to read the briefs in order to rule on the motion to strike them. At the end of the day, he’ll listen to the briefs if he thinks they’re persuasive, and ignore them if he doesn’t. Opposing the filing just comes across as petulance, if you ask me.

The one point I think the opposition makes effectively is that amicus briefs can be a subterfuge to put factual claims before a court without the opportunity for a well-developed adversarial presentation. (At the Supreme Court level, advocacy groups do this all the time to try to supplement the official record.) But on the whole, I find the ALA and EFF’s reply to be persuasive and to the point. The opposition, overall, is a litigation tactic for the sake of tactics; I don’t see how it helps the plaintiffs either substantively or strategically.

UPDATE: The Digital Humanities and Law Scholars have filed their own brief reply.

UPDATE: Judge Chin has allowed the filings and given the plaintiffs the opportunity to respond.

Google Books: The Appeal Is On


In a brief order filed today, the Second Circuit agreed to hear Google’s appeal of class certification immediately. In an ironic twist, Judge Chin was randomly assigned to the three-judge panel; unsurprisingly, he recused himself. The order means that Google’s appeal of class certification will proceed in parallel with Judge Chin’s consideration of fair use. The decision strikes me as unsurprising, given the case’s high profile.

CPNI Fail


AT&T just sent me an email that violates federal law.

I’m an AT&T wireless customer. This means that AT&T has access to lots of information about how much I use my phone, what numbers I call, and how much they charge me for it. This information is called “customer proprietary network information” (or “CPNI”) in telecommunications law, and the FCC has issued rules to protect users’ privacy in it.

AT&T would like to charge me even more by selling me more services. It thinks that it can make better-targeted offers by looking through my CPNI. The FCC rules allow AT&T to do this. They don’t even require AT&T to get my affirmative consent first. All they require is that AT&T give me notice and the ability to opt out. So AT&T sent me an email yesterday with the required notice. It explains what CPNI is in stilted and legalistic prose, gives instructions for opting out, and says that I have 33 days to do so.

There’s just one problem: the opt-out doesn’t work. Here’s what the email says:

If at any time you would prefer that AT&T not use your CPNI to offer you additional products and services, you may:

Click here to submit your request electronically
Call 1.800.315.8303 24 hours a day, 7 days a week and follow the prompts
Call 1.800.288.2020 and speak to a service representative

“Here” is a hyperlink. Clicking on it loads the webpage

http://clicks.att.com/OCT/eTrac?EMAIL_ID=1547226065&finalURL=http://www.att.com/ecpnioptout

This is a simple click-tracker. The AT&T website is designed to record the fact that I clicked on a link in this email, then redirect me to the finalURL at

http://www.att.com/ecpnioptout

And indeed that webpage is a form to opt out of having my CPNI used for marketing. When I click on the link in the email, though, I don’t end up at the form. I end up at the main AT&T webpage, i.e.:

https://www.att.com/

In other words, AT&T’s click-tracker is broken. It doesn’t properly redirect users to the intended webpage. But this means that the online opt-out is broken: clicking “here” does nothing. I’m sure that some AT&T subscribers clicked that link to opt out and assumed they were done. This is a violation of the FCC CPNI rules. They require that the notice “must advise the customer of the precise steps the customer must take in order to grant or deny access to CPNI.” Telling a customer to click on a non-working link is a clear failure to advise the customer of the precise steps required.
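
The expected behavior of the click-tracker described above can be sketched in a few lines of Python. This is only an illustration, not AT&T's actual implementation: the function names (`intended_destination`, `same_page`, `redirect_works`) are hypothetical, and the comparison rule (ignore the scheme and any trailing slash) is an assumption chosen for the example. The two URLs are the ones quoted above.

```python
from urllib.parse import urlsplit, parse_qs

# The link embedded in the email, as quoted in the post.
TRACKER_URL = (
    "http://clicks.att.com/OCT/eTrac"
    "?EMAIL_ID=1547226065&finalURL=http://www.att.com/ecpnioptout"
)
# Where the click actually ended up, as reported in the post.
OBSERVED_LANDING_URL = "https://www.att.com/"


def intended_destination(tracker_url, param="finalURL"):
    """Return the redirect target carried in the tracker's query string, or None."""
    query = urlsplit(tracker_url).query
    values = parse_qs(query).get(param)
    return values[0] if values else None


def same_page(url_a, url_b):
    """Compare host and path only (assumption: scheme and trailing slash don't matter)."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.netloc.lower(), a.path.rstrip("/")) == (
        b.netloc.lower(),
        b.path.rstrip("/"),
    )


def redirect_works(tracker_url, landing_url):
    """True if the page the user landed on is the one named in the tracker link."""
    target = intended_destination(tracker_url)
    return target is not None and same_page(target, landing_url)


if __name__ == "__main__":
    print("Intended:", intended_destination(TRACKER_URL))
    print("Redirect works:", redirect_works(TRACKER_URL, OBSERVED_LANDING_URL))
```

Under these assumptions, the check fails for the URLs in the post: the tracker link names the opt-out form, but the observed landing page is the AT&T home page. A check of this kind, run before sending, is one way such a mismatch could be caught.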

(Sidenote: The CPNI rules also say, “Carriers must allow customers to reply directly to e-mails containing CPNI notices in order to opt-out.” But the email from AT&T says, “PLEASE DO NOT REPLY TO THIS MESSAGE All replies are automatically deleted. For questions regarding this message, refer to the contact information listed above.” Make that two violations in one email.)

This would have been easy to get right. AT&T didn’t have to use a click-tracker; it could have embedded the final URL in the message rather than using a redirection. The reason to track clicks is that it allows AT&T to collect more information about its subscribers: which emails do they read and respond to? In other words, the corporate instinct to gather data, hoard it, and use it for marketing—the same instinct that led AT&T to want to use my CPNI in the first place—led directly to the violation. AT&T may say that “[t]he protection of our customers’ privacy is of utmost importance to the employees and management of the AT&T family of companies” but its actions show just what it thinks of privacy. It can’t stop grasping for information, even when trying to comply with a rule that limits the information it is allowed to use.