Stanford, Google, Privacy, Money, Ethics: A Correction


I am quoted in this ProPublica article about “Stanford’s promise not to use Google money for privacy research.”

“It’s such an etiquette breach, it tells you something is really sensitive here,” said James Grimmelmann, a University of Maryland law professor who specializes in Internet law and online privacy. “It’s fairly unusual and kind of glaring to have that kind of a condition.” …

For instance, some of the non-privacy research at Stanford’s Center for Internet and Society could be more related to privacy than they appear, Grimmelmann said.

Take copyright. A study on the increasing popularity of e-books could lead to the topic of e-book piracy, which could lead to the idea of publishers requiring readers to log in, a practice that could make users’ reading habits much easier to track – a clear-cut privacy issue.

“Some of the best copyright scholarship of recent decades … couldn’t have been carried out at the Stanford CIS under the terms of the grant you described to me,” Grimmelmann said. “So a commandment that ‘Thou Shalt Not Study X’ also interferes with the study of the rest of the alphabet.”

I have now had a chance to read the legal filing in which, according to the article, “Stanford University recently declared that it will not use money from Google to fund privacy research at its Center for Internet and Society.” I stand by my statements that such a pledge would be both unusual and problematic for academic integrity. But reading the underlying “promise” in context, I now do not believe Stanford made such a pledge.

Specifically, the filing is the Stanford CIS’s application for distribution of “cy pres” funds from a class-action settlement in a privacy case against Google. It is common in such cases for a court to give settlement funds to charitable or public-interest organizations when it would be difficult, impossible, or wasteful to give money directly to class members. The CIS applied for the funding to support privacy research: work on mobile privacy and privacy-enhancing technologies, analysis of state privacy laws, and educational speakers on privacy.

The crucial passage occurs in the application’s discussion of a potential conflict of interest in using Google settlement funds when the CIS already receives funding from Google.

Per Stanford University policy, all donors to the Center agree to give their funds as unrestricted gifts, for which there is no contractual agreement and no promised products, results, or deliverables. Stanford has strict guidelines for maintaining its academic autonomy and research integrity. CIS complies with all these guidelines, including the Conflicts of Commitment and Interest section of the Stanford Research Policy Handbook <http://doresearch.stanford.edu/policies/research-policy-handbook/conflicts-commitment-and-interest>. Stanford policies provide explicit protection against sponsors who might seek to direct research outcomes or limit the publication of research.

Since 2013, Google funding is specifically designated not be used for CIS’s privacy work. CIS’s academic independence is illustrated by the following work by Privacy Director Aleecia M. McDonald and CIS Junior Affiliate Scholar Jonathan Mayer, which may not accord with Google’s corporate interests: [list of projects]

The phrase “specifically designated not be used” is ambiguous and unfortunate. But in a blog post, CIS Civil Liberties Director Jennifer Granick states, “[T]he designation to which we were referring is an internal SLS/CIS budgeting matter, not a policy change, and we very well may decide to ask the company for a gift for privacy research in the future. But in 2013, we had other funding sources for our consumer privacy work, and so we asked for, got, and designated Google money to be used for different projects.” This sounds like a standard academic grant: a request to support specific work, which takes the form of an unrestricted gift, and which is not accompanied by a promise that the work will not touch on a particular subject.

It would have been better to use different language in the filing. It would have been better still not to have applied for the cy pres funds. But I am not convinced that the CIS made a “promise” of the particularly problematic sort I understand was at stake: a pledge to prevent funds from being used in connection with research on a specific subject. I am sorry that I commented based on a reporter’s description of the filing rather than asking to see it myself.

Facebook and OkCupid’s Experiments Were Illegal


You may remember Facebook’s experiment with emotionally manipulating its users by manipulating their News Feeds. And OkCupid’s experiment with lying to users about their compatibility with each other. And the withering criticism directed at both companies. (I maintain an archive of news and commentary related to the studies.)

At the time, my colleague Leslie Meltzer Henry and I wrote letters to the Proceedings of the National Academy of Sciences, to the federal Office for Human Research Protections, and to the Federal Trade Commission. Our letters detailed the serious ethical problems with the Facebook study: Facebook obtained neither the informed consent of participants nor approval from an IRB.

Today, we’re back, because what Facebook and OkCupid did was illegal as well as unethical. Maryland’s research ethics law makes informed consent and IRB review mandatory for all research on people, even when carried out by private companies. As we explain in a letter to Maryland Attorney General Doug Gansler, Facebook and OkCupid broke Maryland law by conducting experiments on users without informed consent or IRB review. We ask Attorney General Gansler to use his powers to compel the companies to stop experimenting on users until they come into compliance with the Common Rule.

Another provision of the Maryland law requires all IRBs to make their minutes available for public inspection. In July, Leslie and I wrote to Facebook and to OkCupid requesting to see their IRBs’ minutes, as is our right under the law. Facebook responded by conceding that it conducts “research” on its users, but refused to accept that it had any obligations under the law. OkCupid never responded at all. Since complying with a request for IRB minutes would be straightforward at any institution with an IRB, the most natural interpretation is that neither of them has an IRB—an open-and-shut violation of the law.

I have written an essay on Medium—“Illegal, Immoral, and Mood-Altering”—discussing in more detail the Maryland law and Facebook and OkCupid’s badly deficient responses to it. I hope you will read it, along with our letter to Attorney General Gansler, and join us in calling on him to hold these companies to account for their unethical and illegal treatment of users.

U2 4 U


(An essay in tweets)

Say what you will about U2 4 U, it was the most Steve Jobs-ian stunt Apple has pulled in years: heartfelt, egomaniacal, and grandiose.

Personally, I wouldn’t have minded having the U2 album show up in my iTunes if it had actually, y’know, shown up.

After various glitches, I ended up with multiple copies of some songs and none of others. So not quite the best advertisement for Apple Pay.

The album itself is no worse than U2’s other recent work, but no better either.

All that said, there is something to the point that Apple crossed some kind of line by putting the album on people’s devices without consent.

U2 4 U gave users an uncanny glimpse of the power that lies behind cloud technology. It was a Goffmanian gaffe.

In this respect, it’s much like Amazon’s deleting 1984 from Kindles, as @ScottMadin observes: a reminder that someone has such power.

People’s trust in the cloud — in technology — is based on a trust that it will work predictably and at their direction.

So when Apple drops U2 on your iPhone, it shatters the illusion that your iPhone just works on its own, which is deeply unsettling.

Apple of all companies — having invested so much in convincing users that their devices Just Work — should have been alert to the dangers.

It was basically harmless — but in the same way that skidding and then regaining control of your car is “harmless.” You’re still rattled.

The general principle here isn’t quite consent, because to talk about when your consent is needed, we need to know what counts as “yours.”

To take a first cut at it: devices are yours, so are things you paid for access to, and things you make, and collections you curate.

Apple’s flub was the same as Amazon’s with 1984, and Twitter’s with tweet injection. This is what’s wrong with malware, and Yahoo shutdowns.

The Face of Fear


It is easy to see why the citizens of Ferguson are so scared of the police. What is stranger is that the police are so scared of the citizens.

The military gear and the military tactics on daily display in Ferguson are the outward signs of inward terror. Police who feel secure about their safety don’t strap on body armor. You only stare down the barrel of a rifle at someone you think might kill you without warning. The police see the protests as an existential threat: something that will destroy their world unless it is overwhelmed by any means necessary.

The obvious absurdity of the fear doesn’t make it any less real. The reality of the fear doesn’t make it any less absurd.

This is what cowardice looks like, physical and moral.

Internet Law: Cases and Problems Version 4.0


Version 4.0 of Internet Law: Cases and Problems is now available. This is the 2014 update of my casebook, and it has been a busy year. I produced a special supplemental chapter on the NSA and the Fourth Amendment in December, and it was out of date within a week. The new edition has over twenty new cases and other principal materials and dozens of new questions and problems. Here is a partial list of what’s new:

  • A technical primer on cryptography
  • Coverage of venue in criminal cases, featuring U.S. v. Auernheimer
  • An excerpt from danah boyd’s It’s Complicated discussing the four affordances of speech in social media
  • United States v. Petrovic on revenge porn
  • Jones v. Dirty World on the (non)liability of websites for user-posted content
  • Heavily revamped Fourth Amendment coverage, now introduced by the Supreme Court’s decision in Riley v. California (cell phone searches) and with a note on U.S. v. Jones (the mosaic theory and GPS tracking)
  • Ehling v. Monmouth-Ocean Hospital on applying the Stored Communications Act to Facebook posts
  • Coverage of the pen register statute
  • 29 pages of NSA coverage, featuring discussion of the NSA’s mission, the law and policy of national security wiretapping, the Section 215 telephone metadata program, and Fourth Amendment challenges to national security metadata collection
  • In re Snapchat, a cutting-edge FTC privacy enforcement action (with pictures!)
  • The CJEU Google Spain decision on the so-called “right to be forgotten”
  • A concise set of materials on Bitcoin, with a technical primer and interpretive guidance documents from FinCEN and the IRS
  • A short excerpt from ABC v. Aereo on the public performance right in copyright
  • An all-new chapter on software patents, headlined by the Supreme Court’s decision in Alice Corp. v. CLS Bank, with cases raising issues of obviousness, claim construction, patent assertion entities, standard-essential patents, and injunctions
  • Reworked materials on network neutrality, with added excerpts from Chairman Powell’s “four freedoms” speech, the Madison River consent order, Comcast v. FCC, and Verizon v. FCC, along with a note on interconnection issues such as the Netflix-Comcast dispute

I have also gone over every question in the book, tightening up wording, removing redundancies, and focusing the inquiries on what really matters. As before, the book is available through Semaphore Press as a pay-what-you-want DRM-free PDF download at a suggested price of $30. The price has stayed the same, but compared with the first edition you now get 55% more casebook for your dollar. The book is still targeted at law students but written, I hope, to be broadly interesting.

Download it while it’s hot!

Three Letters About the Facebook Study


My colleague Leslie Meltzer Henry and I have sent letters asking three institutions—the Proceedings of the National Academy of Sciences, the federal Office for Human Research Protections, and the Federal Trade Commission—to investigate the Facebook emotional manipulation study. We wrote three letters, rather than one, because responsibility for the study was diffused across PNAS, Cornell, and Facebook, and it is important that each of them be held accountable for its role in the research. The letters overlap, but each has a different focus.

  • Our letter to PNAS deals with the journal’s commitment to publish articles on human subjects research only when participants gave informed consent and an IRB reviewed the substance of the research. We explain why the emotional manipulation study met neither of those conditions, and why the only appropriate response by PNAS is to retract the article.
  • Our letter to OHRP deals with the Cornell IRB’s flawed reasoning in treating the emotional manipulation study as research conducted independently by Facebook. We unpack the conflicting statements given to justify the study, and show that none of them stands up to close scrutiny.
  • Our letter to the FTC deals with the heightened concerns that arise when consumers are subject to active manipulation and not just passive surveillance. We explain why conducting psychological experiments on consumers without informed consent or oversight can be a deceptive and unfair trade practice.

Our letters deal with cleaning up the mistakes of the past. But they also look to the future. The Facebook emotional manipulation study offers an opportunity to put corporate human subjects research on a firmer ethical footing, one in which individuals give meaningful informed consent and in which there is meaningful oversight. We invite PNAS, OHRP, and the FTC to take leading roles in establishing appropriate ethical rules for research in an age of big data and constant experiments.

UPDATE, July 17, 2014, 1:30 PM: I am reliably informed that Cornell has “unchecked the box”; its most recent Federalwide Assurance now commits to apply the Common Rule only to federally funded research, not to all research undertaken at Cornell. (I made the mistake of relying on the version of its FWA that the Cornell IRB posted on its own website; I regret the error.) This affects the issue of the OHRP’s jurisdiction, but not the soundness of the Cornell IRB’s reasoning, which rested on the activities of Cornell affiliates rather than on the source of funding.

UPDATE, July 24, 2014, 2:00 PM: The letter to the FTC overstates the effects of the Bakshy et al. link-removal study when it describes the study as making some links “effectively unshareable.” Links were removed from News Feeds on a per-user basis, so removed links were still seen by other users.

Parsing The Facebook Study’s Authorship and Review


I have been thinking a lot about the mechanics of how the Facebook emotional manipulation study was conducted, reviewed, and accepted for publication. I have found it helpful to gather in one place all of the various claims about who did what and what forms of review it received. I have bolded the relevant language.

What did the authors do?

PNAS authorship policy:

Authorship must be limited to those who have contributed substantially to the work. …

All collaborators share some degree of responsibility for any paper they coauthor. Some coauthors have responsibility for the entire paper as an accurate, verifiable report of the research. These include coauthors who are accountable for the integrity of the data reported in the paper, carry out the analysis, write the manuscript, present major findings at conferences, or provide scientific leadership to junior colleagues. Coauthors who make specific, limited contributions to a paper are responsible for their contributions but may have only limited responsibility for other results. While not all coauthors may be familiar with all aspects of the research presented in their paper, all collaborators should have in place an appropriate process for reviewing the accuracy of the reported results. Authors must indicate their specific contributions to the published work. This information will be published as a footnote to the paper. Examples of designations include:

  • Designed research
  • Performed research
  • Contributed new reagents or analytic tools
  • Analyzed data
  • Wrote the paper

An author may list more than one contribution, and more than one author may have contributed to the same aspect of the work.

From the paper:

Author contributions: A.D.I.K., J.E.G., and J.T.H. designed research; A.D.I.K. performed research; A.D.I.K. analyzed data; and A.D.I.K., J.E.G., and J.T.H. wrote the paper.

Cornell press release:

… According to a new study by social scientists at Cornell, the University of California, San Francisco (UCSF), and Facebook, emotions can spread among users of online social networks.

The researchers reduced the amount of either positive or negative stories that appeared in the news feed of 689,003 randomly selected Facebook users, and found that the so-called “emotional contagion” effect worked both ways.

“People who had positive content experimentally reduced on their Facebook news feed, for one week, used more negative words in their status updates,” reports Jeff Hancock, professor of communication at Cornell’s College of Agriculture and Life Sciences and co-director of its Social Media Lab. …

Cornell statement

Cornell University Professor of Communication and Information Science Jeffrey Hancock and Jamie Guillory, a Cornell doctoral student at the time (now at University of California San Francisco) analyzed results from previously conducted research by Facebook into emotional contagion among its users. Professor Hancock and Dr. Guillory did not participate in data collection and did not have access to user data. Their work was limited to initial discussions, analyzing the research results and working with colleagues from Facebook to prepare the peer-reviewed paper “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks,” published online June 2 in Proceedings of the National Academy of Science-Social Science.

Because the research was conducted independently by Facebook and Professor Hancock had access only to results – and not to any data at any time – Cornell University’s Institutional Review Board concluded that he was not directly engaged in human research and that no review by the Cornell Human Research Protection Program was required.

Adam Kramer’s statement for Facebook:

OK so. A lot of people have asked me about my and Jamie and Jeff’s recent study published in PNAS, and I wanted to give a brief public explanation. …

Regarding methodology, our research sought to investigate the above claim by very minimally deprioritizing a small percentage of content in News Feed (based on whether there was an emotional word in the post) for a group of people (about 0.04% of users, or 1 in 2500) for a short period (one week, in early 2012). … And we found the exact opposite to what was then the conventional wisdom: Seeing a certain kind of emotion (positive) encourages it rather than suppresses it.

What did the IRB do?

PNAS IRB review policy:

Research involving Human and Animal Participants and Clinical Trials must have been approved by the author’s institutional review board. … Authors must include in the Methods section a brief statement identifying the institutional and/or licensing committee approving the experiments. For experiments involving human participants, authors must also include a statement confirming that informed consent was obtained from all participants. All experiments must have been conducted according to the principles expressed in the Declaration of Helsinki.

Susan Fiske’s email to Matt Pearce:

I was concerned about this ethical issue as well, but the authors indicated that their university IRB had approved the study, on the grounds that Facebook filters user news feeds all the time, per the user agreement. Thus, it fits everyday experiences for users, even if they do not often consider the nature of Facebook’s systematic interventions. The Cornell IRB considered it a pre-existing dataset because Facebook continually creates these interventions, as allowed by the user agreement.

Having chaired an IRB for a decade and having written on human subjects research ethics, I judged that PNAS should not second-guess the relevant IRB.

I regret not insisting that the authors insert their IRB approval in the body of the paper, but we did check that they had it.

Fiske’s email to Adrienne LaFrance:

Their revision letter said they had Cornell IRB approval as a “pre-existing dataset” presumably from FB, who seems to have reviewed it as well in some unspecified way. (I know University regulations for human subjects, but not FB’s.) So maybe both are true.

Cornell’s statement (again):

Because the research was conducted independently by Facebook and Professor Hancock had access only to results – and not to any data at any time – Cornell University’s Institutional Review Board concluded that he was not directly engaged in human research and that no review by the Cornell Human Research Protection Program was required.

Kramer’s statement (again):

While we’ve always considered what research we do carefully, we (not just me, several other researchers at Facebook) have been working on improving our internal review practices. The experiment in question was run in early 2012, and we have come a long way since then. Those review practices will also incorporate what we’ve learned from the reaction to this paper.

The Facebook Emotional Manipulation Study: Sources


This post rolls up all of the major primary sources for the Facebook emotional manipulation study, along with selected news and commentary.


Paper:

  • “Experimental evidence of massive-scale emotional contagion through social networks” as PDF and as HTML (received Oct. 23, 2013, approved March 25, 2014, publication date June 17, 2014)

Authors:

Cornell:

UCSF: (Guillory became affiliated with UCSF only after the study was conducted)

*Human subjects policy

Facebook:

PNAS:

Common Rule:

Previous Facebook studies:

Journalism:

Commentary:

Misc.:

OK Cupid experiments:

Interviews with Christian Rudder about OKCupid experiments:

Ashley Madison study:

Facebook’s Modified Research Policy:

As Flies to Wanton Boys


Most recent update: 9:05 PM, Monday June 30

If you were feeling glum in January 2012, it might not have been you. Facebook ran an experiment on 689,003 users to see if it could manipulate their emotions. One experimental group had stories with positive words like “love” and “nice” filtered out of their News Feeds; another experimental group had stories with negative words like “hurt” and “nasty” filtered out. And indeed, people who saw fewer positive posts created fewer of their own. Facebook made them sad for a psych experiment.
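
For readers who want a concrete sense of the experimental design, here is a minimal sketch of that kind of word-triggered filtering. It is illustrative only: the word lists, post format, and function names are mine, not Facebook’s, though the paper does say posts were classified by whether they contained a positive or negative word (drawn from the LIWC word lists).

```python
import random

# Toy word lists; the real study used the LIWC2007 dictionaries.
POSITIVE_WORDS = {"love", "nice", "sweet"}
NEGATIVE_WORDS = {"hurt", "nasty", "ugly"}

def contains_any(text, words):
    """True if the post text contains any word from the given set."""
    return bool(set(text.lower().split()) & words)

def filter_feed(posts, condition, omit_probability):
    """Withhold a random fraction of emotional posts from one user's feed.

    condition is "reduce_positive" or "reduce_negative"; omit_probability
    is the chance that a matching post is dropped for this feed load.
    """
    target = POSITIVE_WORDS if condition == "reduce_positive" else NEGATIVE_WORDS
    shown = []
    for post in posts:
        if contains_any(post, target) and random.random() < omit_probability:
            continue  # the post still exists; it just isn't shown here
        shown.append(post)
    return shown

# A user assigned to the "reduce positive content" condition:
feed = ["I love this sweet puppy", "Traffic was nasty today", "Lunch was fine"]
print(filter_feed(feed, "reduce_positive", omit_probability=0.5))
```

The point of the sketch is only the shape of the intervention: assignment to a condition, plus probabilistic removal of emotionally matching posts each time the feed is generated.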

I first saw the story on Facebook, where a friend picked it up from the A.V. Club, which got it from Animal, which got it from the New Scientist, which reported directly on the paper. It’s exploding across the Internet today (e.g. MetaFilter), and seems to be generating two kinds of reactions: outrage and shrugs. I tend more towards anger; let me explain why.

Facebook users didn’t give informed consent: The study says:

[The study] was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.

The standard of consent for terms of service is low. But that “consent” is a legal fiction, designed to facilitate online interactions. (See Nancy Kim and Margaret Jane Radin’s books for more.) It’s very different from informed consent, the ethical and legal standard for human subjects research (HSR). The Federal Policy for the Protection of Human Subjects, a/k/a the Common Rule, requires that informed consent include:

(1) A statement that the study involves research, an explanation of the purposes of the research and the expected duration of the subject’s participation, a description of the procedures to be followed, and identification of any procedures which are experimental;

(2) A description of any reasonably foreseeable risks or discomforts to the subject; …

(7) An explanation of whom to contact for answers to pertinent questions about the research and research subjects’ rights, and whom to contact in the event of a research-related injury to the subject;

(8) A statement that participation is voluntary, refusal to participate will involve no penalty or loss of benefits to which the subject is otherwise entitled, and the subject may discontinue participation at any time without penalty or loss of benefits to which the subject is otherwise entitled.

Facebook’s actual Data Use Policy contains none of these, only general statements that “we may use the information we receive about you … for internal operations, including troubleshooting, data analysis, testing, research and service improvement” and “We give your information to the people and companies that help us provide, understand and improve the services we offer. For example, we may use outside vendors to … conduct and publish research.” Neither of these comes close to a “description of the procedures to be followed” or a “description of any reasonably foreseeable risks or discomforts,” and the Data Use Policy doesn’t even attempt to offer a contact for questions or an opt-out.

Federal law requires informed consent: To be sure, the Common Rule generally only applies to federally funded research, and Facebook is a private company. But that’s not the end of the story. The paper has three co-authors: Facebook’s Adam Kramer, but also Jamie Guillory from UCSF and Jeffrey Hancock from Cornell. UCSF and Cornell are major research universities and receive large sums of federal funding. Both of them have institutional review boards (IRBs), as required by the Common Rule: an IRB examines proposed research protocols to make sure they protect participants, obtain informed consent, and otherwise comply with ethical and legal guidelines.

I don’t know whether the study authors presented it to an IRB (the paper doesn’t say), but it strikes me as the sort of research that requires IRB approval. It further strikes me that the protocol as described is problematic, for the reasons described above. I don’t know whether I’m more afraid that the authors never obtained IRB approval or that an IRB signed off on a project that was designed to (and did!) make unsuspecting victims sadder.

The study harmed participants: The paper also argues:

[The study software] was adapted to run on the Hadoop Map/Reduce system (11) and in the News Feed filtering system, such that no text was seen by the researchers.

This claim misses the point. For an observational study, automated data processing is a meaningful way of avoiding privacy harms to research subjects. (Can robot readers cause a privacy harm? Bruce Boyden would say no; Samir Chopra would say yes.) But that is because in an observational study, the principal risks to participants come from being observed by the wrong eyes.
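
To see why automation matters in the observational setting, here is a minimal sketch of aggregate-only processing, again with invented word lists and function names rather than anything from the paper’s actual pipeline: the output is a handful of counts per user, and no human ever reads the underlying posts.

```python
from collections import Counter

# Toy word lists, for illustration only.
POSITIVE_WORDS = {"love", "nice", "sweet"}
NEGATIVE_WORDS = {"hurt", "nasty", "ugly"}

def aggregate_emotion_counts(posts):
    """Collapse raw posts into aggregate counts, so that only totals,
    never the underlying text, reach a human analyst."""
    counts = Counter(total=0, positive=0, negative=0)
    for post in posts:
        tokens = set(post.lower().split())
        counts["total"] += 1
        counts["positive"] += bool(tokens & POSITIVE_WORDS)
        counts["negative"] += bool(tokens & NEGATIVE_WORDS)
    return dict(counts)

print(aggregate_emotion_counts(["I love this", "Traffic was nasty today", "Lunch was fine"]))
```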

This, however, was not an observational study. It was an experimental study—indeed, a randomized controlled trial—in which participants were treated differently. We wouldn’t tell patients in a drug trial that the study was harmless because only a computer would ever know whether they received the placebo. The unwitting participants in the Facebook study were told (seemingly by their friends) for a week either that the world was a dark and cheerless place or that it was a saccharine paradise. That’s psychological manipulation, even when it’s carried out automatically.

This is bad, even for Facebook: Of course, it’s well known that Facebook, like other services, extensively manipulates what it shows users. (For recent discussions, see Zeynep Tufekci, Jonathan Zittrain, and Christian Sandvig.) Advertisers and politicians have been in the emotional manipulation game for a long time. Why, then, should this study—carried out for nobler, scientific purposes—trigger a harsher response?

One reason is simply that some walks of life are regulated, and Facebook shouldn’t receive a free pass when it trespasses into them simply because it does the same things elsewhere. Facebook Beacon, which told your Facebook friends what you were doing on other sites, was bad everywhere but truly ugly when it collided with the Video Privacy Protection Act. So here. What you think of Facebook’s ordinary marketing-driven A/B testing is one thing; what you think of it when it hops the fence into Common Rule-regulated HSR is quite another. Facebook has chosen to go walking in a legal and ethical minefield; we should feel little sympathy when it occasionally blows up. (That said, insisting on this line would simply drive future research out of the academy and into industry, where our oversight over it will be even weaker. Thus …)

A stronger reason is that even when Facebook manipulates our News Feeds to sell us things, it is supposed—legally and ethically—to meet certain minimal standards. Anything on Facebook that is actually an ad is labelled as such (even if not always clearly). This study failed even that test, and for a particularly unappealing research goal: We wanted to see if we could make you feel bad without you noticing. We succeeded. The combination of personalization and non-rational manipulation may demand heightened legal responses. (See, e.g., Ryan Calo, or my thoughts on search engines as advisors.)

The real scandal, then, is what’s considered “ethical.” The argument that Facebook already advertises, personalizes, and manipulates is at heart a claim that our moral expectations for Facebook are already so debased that they can sink no lower. I beg to differ. This study is a scandal because it brought Facebook’s troubling practices into a realm—academia—where we still have standards of treating people with dignity and serving the common good. The sunlight of academic practices throws into sharper relief Facebook’s utter unconcern for its users and for society. The study itself is not the problem; the problem is our astonishingly low standards for Facebook and other digital manipulators.

This is a big deal: In 2006, AOL released a collection of twenty million search queries to researchers. Like the Facebook study authors, AOL thought it was protecting its users: it anonymized the users’ names. But that wasn’t sufficient: queries like “homes sold in shadow lake subdivision gwinnett county georgia” led a reporter straight to user No. 4417749. Like Facebook, AOL had simply not thought through the legal and ethical issues involved in putting its business data to research purposes.

The AOL search-query release became known as the “Data Valdez” because it was a vivid and instantly recognizable symbol of the dangers of poor data security. It shocked the public (and the industry) into attention, and put search privacy on the map. I predict, or at least I hope, that the Facebook emotional manipulation study will do the same for invisible personalization. It shows, in one easy-to-grasp lesson, both the power Facebook and its fellow filters hold to shape our online lives, and the casual disdain for us with which they go about it.

UPDATE: The study was presented to an IRB, which approved it “on the grounds that Facebook filters user news feeds all the time, per the agreement.” See @ZLeeily, with hat tips to Kashmir Hill and @jon_penney.

UPDATE: Another @jon_penney pickup: it appears that the study itself was federally funded. Cornell amended the press release to say that the claim of federal funding was in error.

UPDATE: Kashmir Hill reports:

Professor Susan Fiske, the editor at the Proceedings of the National Academy of Sciences for the study’s publication, says the data analysis was approved by a Cornell Institutional Review Board but not the data collection. “Their revision letter said they had Cornell IRB approval as a ‘pre-existing dataset’ presumably from Facebook, who seems to have reviewed it as well in some unspecified way,” writes Fiske by email.

UPDATE: For much more on the IRB legal issues, see this detailed post by Michelle Meyer. She observes that the Common Rule allows for the “waiver or alteration” of informed consent for research that poses “minimal risk” to participants. The crucial issue there is whether the study “could not practicably be carried out without the waiver or alteration.” Meyer also has an extended discussion of whether the Common Rule applies to this research—following the Cornell restatement, it is much less clear that it does.

UPDATE: I’ve created a page of primary sources related to the study and will update it as more information comes in.

Google Books Round 86: Libraries Win Yet Again


The Second Circuit’s decision in Authors Guild v. HathiTrust is out. This, as a reminder, is the offshoot of the Google Books litigation in which the Authors Guild inexplicably sued Google’s library partners. The trial judge, Harold Baer, held for the libraries in 2012 in a positively exuberant opinion:

I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants’ MDP [Mass Digitization Project] and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts that at the same time effectuates the ideals espoused by the ADA.

The Second Circuit’s opinion drops the grand rhetoric, but otherwise the bottom line is basically the same: mass digitization to make a search engine is fair use, and so is giving digital copies to the print-disabled. The opinion on appeal is sober, conservative, and to the point; it is the work of a court that does not think this is a hard case.

On full-text search:

  • Factor 1: “[T]he creation of a full‐text searchable database is a quintessentially transformative use” because it serves a “new and different function.” Authors write to be read, not to be searched.
  • Factor 2: The nature of the copyrighted work fades into irrelevance for transformative uses.
  • Factor 3: Since full-text search requires copying full books, the copying isn’t excessive in light of the use. True, HathiTrust makes four copies of each book, two live and two in tape backup, but those are appropriate precautions against Internet outages and natural disasters. (It’s nice to see a court recognize that strict copy-counting is a fool’s errand in light of modern IT; better to focus, as the court here does, on the uses those copies enable.)
  • Factor 4: “[T]he full‐text‐search use poses no harm to any existing or potential traditional market … .” Book reviews do not substitute for sales of a book, even when they convince readers not to buy the book; so here. There is no lost licensing market because full-text search is not a substitute for books in the first place. (No citation to American Geophysical!) And while the Authors Guild says there’s a risk of a security breach, saying so doesn’t make it so: the harm from a hypothetical breach is pure speculation.

On print-disabled access:

  • Factor 1: Providing access to the print-disabled is not transformative: “By making copyrighted works available in formats accessible to the disabled, the HDL [HathiTrust Digital Library] enables a larger audience to read those works, but the underlying purpose of the HDL’s use is the same as the author’s original purpose.” But providing such access is still a favored use: there is a national policy of promoting access, reflected in the Chafee Amendment and recognized by the Supreme Court.
  • Factor 2: Irrelevant again, even though the use isn’t transformative. (Factor 2 never matters for published expressive works.)
  • Factor 3: The scanned images—and not just the OCR’ed text—are useful to print-disabled readers. Some readers are print-disabled because they need greater magnification or stronger color contrast than paper provides, others because they can’t turn pages. Scanned images help them both. (It’s nice to see a court take the diversity of disabilities seriously; Dan Goldstein’s advocacy here clearly helped.)
  • Factor 4: There is no market for selling books to the print-disabled; only a small percentage of books are published in accessible formats, and even for those, authors typically forego their royalties. (The Authors Guild’s war against text to speech has come back to bite it.)

These holdings merely affirm the District Court’s conclusions, but they are still a big deal. The Second Circuit’s decisions are binding precedent in New York, the nation’s publishing capital, and are highly influential beyond. Five judges have now upheld the legality of scanning books to make a search engine; none has disagreed.

The other major points in the opinion all consist of declining to decide:

  • The Authors Guild lacks standing to sue on behalf of its members. The case continues, thanks to the international organizations and the individual plaintiffs, but ouch. By pressing the Google Books cases, the Authors Guild has undercut its ability to take legal action on behalf of “authors” in general. In a real sense, it is legally weaker than when the case started.
  • Preservation uses aren’t ripe for consideration because the court has already held that hanging on to four copies is fully justified by the operational demands of providing full-text search. That only leaves printing replacement copies for lost or damaged ones when they’re unavailable for purchase at a fair price, but since it’s not clear whether or when that would happen—let alone whether it would happen to one of the remaining plaintiffs’ books—the issue isn’t ripe to decide.
  • Since Michigan has suspended the orphan works project (showing orphaned works to non-disabled patrons) and has no plans to reinstate it in the same form, those issues aren’t ripe either. The libraries dodged a bullet here; if they want to try again, it will be on terms of their choosing.

The opinion is a green light for library search engine digitization. It is an even greener light for making books and other works accessible to the disabled. And there was great rejoicing at the DPLA and the Internet Archive. There is not very much new in the opinion, but its very lack of novelty sends a strong signal that these uses are now clearly established.

What next? The Authors Guild could ask for rehearing, or petition for certiorari. I personally don’t like those odds, but I have never really understood the Guild’s decision-making process around this case, so who knows? The opinion sends a strong signal that the case against Google, also on appeal to the Second Circuit, is also likely to go in favor of scanning. At the very least, if the two cases are to be distinguished, it will have to be on narrow grounds: that Google makes commercial uses or shows snippets. Even that would provide clear guidance for digitizers. The holding may also cast a shadow on other search, education, and access cases, for example the Georgia State e-reserves case.