The Laboratorium : GBS: CDT's Privacy Principles

More from the pipeline: last month, the Center for Democracy and Technology released a set of privacy recommendations. The EFF may have narrowed its focus to government data collection, but the CDT is taking a more comprehensive approach, drilling down on the kinds of privacy protections one would look for in any heavily used online service. As usual, let me call out salient passages:

The settlement deserves court approval because it will unquestionably provide a significant public benefit at a size and scale that is not otherwise likely to be replicated in the near term. (1)

CDT, like the ALA, believes in the principle of “approve, then improve.”

At a minimum, before the settlement is approved, Google should issue a set of privacy commitments that explains both its general approach to protecting reader privacy and its process for addressing privacy in greater detail as Google Book Search moves forward. Since further detail regarding privacy matters may need to be fleshed out over time as the services are built, the court should monitor implementation of these privacy commitments as part of its ongoing supervision of the settlement. Critically, this structure—a set of evolving privacy commitments with court supervision—does not require an alteration of the current settlement. (1)

This approach makes CDT’s recommendations easier to implement, but means that CDT is more dependent on carrots than on sticks.

Indeed, we believe that the adoption of a robust privacy framework here will set a high standard for other providers as the online market for electronic books expands. (4)

Witness the carrot. This one is specially designed to appeal to Google, which likes to say it can beat anyone in a fair fight, and which likes to present itself as a boy scout of a company.

Shortly before the publication of these recommendations, Google posted a “Privacy FAQ” list to the GBS blog. CDT is pleased that Google has made such a public statement—and indeed the statement speaks to several of our concerns—but we believe that this represents the beginning, not the end, of discussions of reader privacy, and that an explicit commitment subject to court oversight is nonetheless required. (3)

Still, CDT isn’t just going to leave things up to Google’s good graces.

Providing such breadth of electronic access to so many published books will give Google an unparalleled view of people’s reading and information-seeking habits. By hosting the scans and closely managing user access, Google will have the capability to collect data about individual users’ searches, preview pages visited, books purchased, and perhaps even time spent reading particular pages. Whereas in the offline world such data collection is either impossible or widely distributed among libraries and bookstores, Google will hold a massive centralized repository of books and of information about how people access and read books online. (5)

That’s a nice overview of how both the digital transition and the scope of this project have privacy implications.

While the settlement agreement does not fully describe the types of data Google will collect, it does offer some indication. Detailed user information will be collected and used to differentiate among the services offered, to calculate payments to rightsholders, and to prevent unauthorized access to the scanned books. (5)

Ironically, the settlement’s only explicit treatment of privacy comes in negatives. It’s true that Google plans to implement the services in ways that are more privacy-friendly than the settlement’s floor, but that fact also underscores just how low the floor is. We’re talking subterranean.

The agreement does state that Google cannot be forced to disclose “confidential or personally identifiable information except as compelled by law or valid legal process” in the case of a security breach, but it does not address voluntary disclosure by Google. (6)

This being EFF’s big concern.

We recognize that because the implementation of the GBS service is not yet fully conceptualized, it may not be possible for Google to commit to every privacy detail now. We therefore urge that Google set out, with as much specificity as possible, a baseline approach to safeguarding reader privacy that it can commit to now, as well as a process for articulating further detail once the settlement is approved and Google begins to design the implementation of GBS. That process, and the detailed privacy practices that emerge from it, should be subject to court oversight. (7)

This sounds right. To the extent that Google says it can’t make privacy promises yet because it hasn’t designed the system, I fear a whipsaw. Google controls the timing and contents of the settlement, and the timing and features of its services. It’s awfully convenient that the service “is not yet fully conceptualized” in these pre-approval days. But Google still has the chance to make things right on the privacy front; baselines, process, and court supervision should satisfy everyone’s concerns.

We believe that Google should clearly and prominently disclose the following… (8)

Notice, notice, notice; honestly, this matters more so that groups like CDT can be watchdogs than because users will read even the most prominent of privacy policies.

Thus, CDT believes that Google should commit to collecting only the data necessary to provide the services laid out in the settlement. For example, Google generally does not need to collect details about how readers are consuming its digital books—which pages they view, how often they view them, how long they view them, and so forth—and thus the default should be that it will not do so. (8)

Modulo concerns about security (e.g., watching for telltale signs of botnets, DDoS attacks, hacking attempts, and the like), this makes sense to me.

Google, however, should have no need to know the identity of any individual user of an Institutional Subscription, only that such a user is authorized under the subscription. Institutions should therefore be responsible for authenticating their own end users without sharing authentication credentials or other personal information with Google. (9)

This matches my understanding of what Google plans to do, and thus should be no problem to commit to.

Given the potential sensitivity of information surrounding reading habits, Google should refrain from using information collected through GBS for purposes other than to provide and secure the GBS service. (9)

When it comes to data that could reach third parties, it’s hard to emphasize this one enough. The terror of Scroogled turns on sensitive information being “repurposed.” The books you read are especially dangerous in this regard; just ask Winston Smith.

By default, information collected through GBS should not be used in connection with any other Google services or combined with data from other Google services, including advertising services provided outside the GBS site. (9)

Here, I’m less certain. I don’t so much care about combining this information with other Google services, provided the combination doesn’t open up leaks. I wouldn’t want information about your reading habits to be reconstructable from observing your customized searches, for example.

The Book Annotation feature may create especially sensitive user records because it involves users generating their own content and identifying other users with whom to share it. (9)

Right now, Book Annotations look “private” because they involve restricted, explicit sharing. But the parties have made noises about eventually opening up additional annotation features for broader distribution. Mark my words: as book annotations go social, there will be emergent privacy problems without clear answers.

Google has an obligation to state, in advance, what kinds of process it will comply with and what kinds it will resist. At a minimum, Google should state publicly that, except in cases of emergency or situations in which Google determines that it has little chance of prevailing, it will take reasonable steps in response to government requests to insist that the government obtain a court order or warrant issued upon probable cause to compel disclosure of information that could be used to identify a user or to associate a user with access to particular books. (10-11)

This one is easier for the time being because the settlement is U.S.-only. Some of the difficulty in articulating how strongly Google will resist stems from the enormous variety of legal systems around the world. In some, like China, a request can be unofficial but mandatory; in others, like Italy, official legal processes can be massively abused. For a U.S.-only deal, though, a unified policy is the kind of thing Google would need for its own internal operations and ought reasonably to disclose.

Because it is difficult to foresee exactly what types of requests will be made, and because experience with those requests might reveal the need for adjustments to these disclosure standards, Google should commit to making available to the public certain details about the compulsory disclosure of GBS information. Specifically, it should make public the number of requests by government and civil litigants for GBS usage or user-identifying data it has received, the types of information sought, the types of legal action underlying the requests, Google’s response to each request, and the types of information, if any, that were in fact disclosed. (11)

Again, this one is important for watchdogs like CDT.

Google should therefore commit to retain data in identifiable form or in association with a reader identifier only as long as is necessary for the purpose for which the data was collected, and in any event no longer than 90 days. (13)

Sounds like a good idea, and I wish CDT good luck in getting Google to agree.

The settlement’s Security Standard outlines a comprehensive set of security and compliance requirements that Google must implement to protect digitized files, but no equivalent set of requirements to protect data about readers and their use of GBS. To the extent applicable (some requirements might be irrelevant to securing reader information, such as the requirement to watermark digital images served to users), we believe that Google should apply the same security standard to the data that it collects in connection with GBS. (13)

This is a simple, elegant, and eminently fair principle.

I look forward to seeing Google’s response to these recommendations.