The Laboratorium

Is spam fair use?

A Brief History of Spam

To see how this four-word question arises, it may help to review a little of the history of spam filtering. Here’s a (slightly ahistorical) sketch: The earliest spams—basically email versions of the Green Card Lottery—consisted of the exact same message, sent to thousands or millions of recipients. Users annoyed by these spams, and ISPs annoyed at having to deliver them, took to comparing messages, senders, and message headers. If an incoming message was identical to a previously-seen message known to be spam, it would be dropped in the ætherial electronic wastebasket.
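The exact-match approach described above can be sketched in a few lines. This is only an illustration, not a reconstruction of what any real ISP ran: the names `fingerprint` and `ExactMatchFilter` are hypothetical, and hashing a lightly normalized message body is an assumption about how "identical" might be tested.

```python
import hashlib


def fingerprint(message: str) -> str:
    """Hash a message after light normalization (hypothetical helper)."""
    # Normalize line endings and trim surrounding whitespace so that
    # trivially different copies of the same message hash identically.
    normalized = message.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ExactMatchFilter:
    """Drops any message identical to one already reported as spam."""

    def __init__(self):
        self.known_spam = set()

    def report_spam(self, message: str) -> None:
        self.known_spam.add(fingerprint(message))

    def is_spam(self, message: str) -> bool:
        return fingerprint(message) in self.known_spam


if __name__ == "__main__":
    f = ExactMatchFilter()
    f.report_spam("Win a green card!\r\n")
    print(f.is_spam("Win a green card!"))   # same text, different line ending
    print(f.is_spam("Win a green card!!"))  # one extra character
```

The weakness is visible in the last call: change a single character and the fingerprint no longer matches, which is the opening the spammers went on to exploit.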

In response, the spammers realized they needed to start varying the content of their messages. The putative senders would be both forged and randomized; the subject line would be drawn from a large and combinatorial pool (e.g. “buy,” “purchase,” “obtain,” “acquire,” “get” and so on would be plugged into the verb slot, and a synonym for “cheap” into the adjective slot, and some drug name into the noun slot, leading to many many possible combinations); the message itself would be partly randomly generated. The filterers then dealt with this trick by using a group of techniques generally known as “Bayesian filtering,” some of which even are Bayesian filtering. The way these techniques work is that you scan a message assessing every word in it against a (specially-built) dictionary that tells you how much more likely seeing that word makes it that the underlying message is a spam. “Viagra” increases the spam score a bunch; “the” not so much. Note that this kind of filtering also helps with the original problem of deciding that the first message of a new type is actually spam (and +1 for you if you spotted that issue in the last paragraph).
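A toy version of that per-word scoring might look like the following. It is a sketch under stated assumptions rather than any particular filter's algorithm: the class name `WordScoreFilter` is hypothetical, the "dictionary" is built from whatever messages you train it on, each word's score is a smoothed log-likelihood ratio, and the prior probability of spam is ignored for simplicity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into simple word tokens (an assumed tokenizer).
    return re.findall(r"[a-z0-9@']+", text.lower())


class WordScoreFilter:
    """Scores a message by summing per-word log-likelihood ratios."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def train(self, text, is_spam):
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(tokenize(text))

    def word_score(self, word):
        # Add-one smoothing so unseen words do not produce log(0).
        vocab = len(set(self.spam_counts) | set(self.ham_counts)) + 1
        p_spam = (self.spam_counts[word] + 1) / (sum(self.spam_counts.values()) + vocab)
        p_ham = (self.ham_counts[word] + 1) / (sum(self.ham_counts.values()) + vocab)
        # Positive: the word makes spam more likely. Negative: less likely.
        return math.log(p_spam / p_ham)

    def score(self, text):
        return sum(self.word_score(w) for w in tokenize(text))


if __name__ == "__main__":
    f = WordScoreFilter()
    f.train("buy cheap viagra now", is_spam=True)
    f.train("get the cheap viagra", is_spam=True)
    f.train("the meeting is at noon", is_spam=False)
    f.train("see the notes from the meeting", is_spam=False)
    print(f.word_score("viagra"), f.word_score("the"))
    print(f.score("buy viagra"), f.score("the meeting notes"))
```

A message would be flagged when its total score crosses some threshold; where to set that threshold is a tuning decision this sketch leaves open.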

The spam innovations that responded to statistical filtering were ugly, both figuratively and literally. Spammers started interpolating random text into their messages. At first, it was just a line of random gibberish pasted on to the end; then it was a line of random words here and there. This randomization both disrupted attempts to compare spam messages with each other for strict identity and also tried to trip up word-counting filters. The filters, however, proved reasonably good at recognizing that randomly-plucked dictionary words such as “xenograft” were astronomically unlikely to appear in legitimate messages and could therefore be all but ignored. Moreover, by expanding the filtering window to multiple words, the anti-spam tests could recognize that certain juxtapositions (e.g. “theogony xenograft”) simply did not appear in normal text.
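The multi-word window can be illustrated with adjacent word pairs (bigrams). In this sketch, assuming a small reference set of legitimate text is available, a message is measured by the fraction of its bigrams that never occur in that reference. The name `BigramNoveltyCheck` is hypothetical, and real filters would use probabilities over much larger corpora rather than a simple seen/unseen test.

```python
import re


def tokenize(text):
    # Lowercase and split into simple word tokens (an assumed tokenizer).
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    # Adjacent word pairs: ["a", "b", "c"] -> [("a", "b"), ("b", "c")]
    return list(zip(tokens, tokens[1:]))


class BigramNoveltyCheck:
    """Measures how many word pairs in a message are unseen in normal text."""

    def __init__(self, reference_texts):
        self.seen = set()
        for text in reference_texts:
            self.seen.update(bigrams(tokenize(text)))

    def unseen_fraction(self, text):
        pairs = bigrams(tokenize(text))
        if not pairs:
            return 0.0  # too short to have any word pairs
        unseen = sum(1 for pair in pairs if pair not in self.seen)
        return unseen / len(pairs)


if __name__ == "__main__":
    check = BigramNoveltyCheck(["the meeting is at noon", "see you at the meeting"])
    print(check.unseen_fraction("theogony xenograft"))
    print(check.unseen_fraction("the meeting is at noon"))
```

This also shows why lifting whole lines from real books was the natural next move for spammers: text copied verbatim from genuine prose produces mostly familiar word pairs.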

Therefore, spammers took to plundering readily available texts not just for individual words but for entire lines and paragraphs of plausible verbiage. I first noted this phenomenon in 2003, when I got a spam whose text read:

that nothing can shake my confidence in you for a moment, Lordofthemuppets,
Our US Pharmacy is Open to You!
distrust of Owen; and to omit altogether a reference to the conduct which
We Now Have Xanax, Valium, Levitra, and Faster Acting Viagr@ SoftTabs
revived in a modified form. [30] The only warfare Sun Tzu knows
From US Pharmacies, not Mexico or Pakistan
thronged the amphitheatre, and watched exultingly while man slew
with more fitness than to him who had given me life?
Discreet and Fast Next-Day Shipments
physiologists. He made also a careful and fairly accurate study
Prescriptions written by US Doctors
The oars, the mast, and the sail are in the canoe. I have even succeeded
Browse our Selection
though distributed into distinct and mutually independent States,

All of the non-pharmaceutical text came from some corpus or other text readily available on the Web. Leaving aside the “Lordofthemuppets” (the dummy email account to which this message came), the first line was from Frankenstein. The other quotations came from the annotations to the letters of Charles Darwin; the Lionel Giles translation of Sun Tzu’s Art of War; Henry Smith Williams’s A History of Science; Frankenstein again; A History of Science again; Twenty Thousand Leagues Under the Sea; and Orestes Brownson’s The American Republic. Most of the texts seem to have been taken from Project Gutenberg’s free public-domain e-books, right down to having the exact same line divisions.

From ripping off public-domain works, it was only a short step to ripping off works still in copyright. The technique of scraping the Web for content, in particular, proved attractive for the related spam practice of creating fake Web pages and fake blogs—as one might want to do if trying to trick search engines. Sometimes this content is arguably licensed, as in the case of the many, many Wikipedia clones floating around (new! improved! with added advertising!). Often it is not.

Is Spam Fair Use?

To restate the original question in near-lawyerese, is it copyright infringement to use part or all of someone else’s copyrighted works in unsolicited bulk email for the purpose of confusing anti-spam filters into allowing the messages through? (If this were real lawyerese, I would probably have had to use longer synonyms for “email” and “anti-spam filters,” but my brain rebelled at the prospect.) It may be apparent that the same principles will apply, more or less, to spam blogs and other non-email forms of spam.

In U.S. copyright law, a finding of “fair use” is a defense to a claim of copyright infringement. In determining which uses count as fair, courts are instructed to take into account four factors: 1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; 2. the nature of the copyrighted work; 3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and 4. the effect of the use upon the potential market for or value of the copyrighted work. Applying these four factors to spam is instructive.

The first factor tests, roughly, whether the allegedly-infringing use is a good thing or a bad thing. Nonprofit uses have an aura of blessedness about them, in the same way that nonprofit organizations get tax breaks. Commercial use isn’t necessarily bad; it just doesn’t activate the automatic sympathy that nonprofit use does. This factor is also used to favor parodies, education, and critical commentary, while disfavoring rote repetition.

I think it safe to say that most spam is (a) commercial and (b) bad. Spam is a social ill. We would almost all be better off if there were almost none of it. What is more, the use of copyrighted texts (without permission) in spam is doubly unpleasant, in that it doesn’t just promote spam; it promotes spam fraudulently. The purpose of quoting the copyrighted work is to deceive spam filters into thinking that the message is other than it is. Putting it all together, I would say that spam flunks the first factor about as badly as it is possible to flunk it.

The second factor asks about the “nature of the copyrighted work.” This one is a bit of an odd duck; courts differ in how they approach it. A lot of different questions have been crammed in here. Is the original artistic or commercial? Is the original clearly copyrightable, or is it so close to the border of unoriginality that it supports only a “thin” copyright? Is the original famous or obscure? How much effort and expense went into the original?

It’s hard to judge the question of spam’s fairness on this factor in the abstract. That’s because when we ask the question this way, the one thing we absolutely don’t have before us is the original. I’ve posed the question in a form that treats the original as a free variable; it could be anything. Given that, I’m prepared to assume that the spammer ripped off the wrong dude, and that the spam quotes from the most famous, most important, most original text in the Western world. After all, the first plaintiff to bring such a suit will probably be one with a pretty strong case on this factor, out of simple logic of self-selection. So give spam low marks on the second factor, as well.

The third factor asks how much of the copyrighted work is used in each allegedly-infringing copy. Quoting a paragraph in a review is more favored than quoting an entire book. It’s traditionally thought that uses of a few words will almost always be fair, since short phrases are very close to being uncopyrightable anyway.

Here, I think that any individual spam actually comes out pretty well. The spam I quoted above uses one line each from a few works, two lines tops. That is not much, particularly in the context of a huge multi-volume work. A spammer who quotes a few words here and a few words there hasn’t taken much from any individual author. Things are a little different in the spam blog, where one may quote an entire post or an entire blog. But the blender-job that classic spammers perform looks superficially a lot like the collage/remix work that some copyright scholars claim we ought to be encouraging. Perhaps the great bulk of spam could cut against it here. The spammer above may well have quoted all of Frankenstein, one individual line at a time. Still, given that the spams were scattered to the four winds, I’m not sure that the spam really uses “more than necessary” in the way contemplated by this factor. You can’t easily reassemble the work from the many many spams.

The fourth factor, the effect on the market for the original, is where things get truly surprising. It makes economic sense that we should discourage uses that are likely to cut into the market for the original, since they will reduce the creator’s income and undermine some of the incentive to create in the first place. Similarly, new uses that don’t undermine any market the creator could have exploited don’t cause the creator monetary harm, and so one can ask why the creator should be allowed to object. Some scholars think that this factor is really the only factor; the others may provide guidance in assessing the effect, but at the end of the day, mere competitors lose while those who add something new win.

Spammers, however, aren’t even competitors. The dude who scrapes a few lines from a blog post of mine to use as filler in his spams isn’t stealing any readers from me. He’s also not exploiting a derivative market that I would have any interest in exploiting, or, indeed, could profitably exploit if I wanted to. (Even with perfect enforcement by objecting copyright holders, I don’t see how willing copyright holders could make one red cent out of licensing a corpus of spam texts.) It is true that the collective volume of spam may be harmful to me, in that it annoys me and makes me pay slightly more for useful Internet service, but that’s hardly copyright injury to me as an author—it doesn’t affect the market for my work.

Even if you think that the less useful Internet means I have a harder time marketing my work, that consequence is still an indirect harm of the sort that copyright scholars regularly proclaim shouldn’t count in the fair use inquiry. Thus, for example, it is frequently argued that a vicious parody may reduce demand for the original by making it an object of ridicule, but that reduced demand due to disrepute isn’t the right sort of harm. The parody doesn’t directly substitute for consumption of the original. Neither does spam. (The Wikipedia clones, which duplicate the whole encyclopedia and divert Google traffic from it, might be another case, but garden variety bricolage spam isn’t it.) Thus, mark the fourth factor a win for the spammer.

Where does this leave us? Spam loses resoundingly on the first factor (its own nature) and by default on the second (the nature of the original). It wins, at least in the core case, in the third factor (the amount copied), and wins decisively on the fourth (effect on the market for the original). So how do we aggregate these split findings into an overall decision?

I think the answer is easy: the spammer will lose under current doctrine, and moreover, the spammer should lose. The spammer is not using the work for anything socially worthwhile, nor is she using in her spams any of the creative attributes of the work itself. She’s an enormously unsympathetic defendant adding little to society, and the prima facie case of copyright infringement (the actual copying of the copyrighted text) is unarguable. Protecting deceitful spammers is not why we have a fair use defense.

A Few Observations

The above analysis has a few interesting consequences I would like to highlight. First, the Supreme Court has said that the fourth factor—the effect on the market—is the most significant. It is sometimes considered to be the only significant factor, either in that the others fade into irrelevance next to a strong finding of harm or no harm to the market, or that they simply help a court evaluate the effect on the market. But it cannot be the only factor. Spam provides a convincing counterexample, one showing that even a compelling showing of no harm on the market-effect factor cannot be determinative.

This observation has implications for economic analyses of copyright. Most notably, it indicates that a story of fair use simply in terms of authorial incentives must be wrong. There must be a term somewhere in the analysis that can take account of how wasteful spam is. Trying to identify that waste as a form of disincentive to potential authors requires doing violence to the facts. I may be ticked off that spammers are copying me, but it’s exceedingly unlikely to hurt my sales. If one is committed to an economic balancing, one can point to the overall social harm caused by spam, or the negative value of deceitful reuses, but these costs must enter the balancing process somewhere other than in the author’s market for her work.

Second, this analysis serves as a reminder of the oft-emphasized (and oft-forgotten) point that the four factors are only an aid to analysis, not a mechanical checklist. In the spam example, the effect on the market factor, while strongly favoring the spammer, is also beside the point. (The amount-copied factor, which favors the spammer more weakly, is actually more persuasive as an argument in her favor.) Here, it is the purpose and character of the use factor that really seems to capture what is the problem with spam copying. Not all of the relevant questions are captured in the four factors; nor are they all relevant in any given case. Pointing out that judges must also assess how salient the various factors are in a fair use case may do little to aid in the predictability of the fair use analysis. It gives them yet another route to do whatever they want in the case at bar, almost regardless of the four-factor analysis. But the alternative—ignoring the reality that some of the factors sometimes make little sense in context—seems worse.

And third, it seems worth noting that this case is almost trivially easy from a moral rights perspective. The spammer has stripped the original author’s name from the work—a violation of the right of attribution. She has also distorted the presentation of the work (by chopping it up and dropping it in the spam), and by using the work in this way, associated the author’s work with spam, a “derogatory action” that supplies the necessary “prejudic[e] to [her] honor or reputation” to complete a violation of the right to integrity. These two complaints, particularly the second, seem to get at the problem with spam copying quite directly. The work is being used for sleazy and harmful purposes, in a manner that mutilates the work, gives it unfair connotations, and enlists the author against her will in a dishonorable cause. Whether or not you think that moral rights make for good law, you have to admit that the moral rights approach gets at what’s wrong with spam filler from an author’s point of view.

In conclusion then, spam is not fair use. Also, spam is bad. The end.

January 4, 2007 at 4:58 PM

Steven

Invoking spam to make a general point about fair use might risk creating bad law out of extremely ugly facts. If you think IP regimes should only sacrifice free access to promote greater content (an assumption that, I grant, you’re challenging), it’s hard to see why there should be IP restrictions on publications that don’t appreciably alter an author’s incentives. I’m especially troubled by your emphasis on the worthiness of the allegedly infringing publication. Isn’t this the factor that threatens to attach liability solely due to a judge’s personal distaste about an art form?

On a more basic level, I’m not sure why there’s any need to say that spam isn’t fair use. Labeling something as fair use doesn’t immunize it from all liability (in the way that labeling it as “speech” might). So even if spam is fair use, you can make it illegal for other reasons, as spam-specific legislation does today. That, to me, is a better way to deal with spam than extending copyright liability. Just as the doctrine of fair use says that not all copyright violations are bad, this approach would say that not all fair use is good.

Moreover, saying that spam isn’t fair use doesn’t capture what’s wrong with spam. Suppose (and this may be a bad example) that somebody sends you a menacing letter that includes excerpts from copyrighted works describing grisly tortures. The letter is bad because it’s an illegal threat, not because it violates copyright law. I’d say a similar thing about spam.

January 4, 2007 at 6:14 PM

James Grimmelmann

I agree that discussing the worthiness of the allegedly infringing publication creates lots of room for unpleasant judicial discretion. But to flip the question, why does spam bulking count as an activity that ought to receive any solicitude from the copyright law? While there may be some minor originality in the arrangement and selection, it’s not of the sort we should protect for copyright policy reasons. Or, to take up your discussion of “speech,” why is there anything in garden-variety spam that deserves substantial First Amendment solicitude? It’s commercial speech at the best; it’s unprotected fraud at the worst.

I’m not, by the way, invoking spam to make a general point about fair use. I’m making a general point about spam, and the application to spam of fair use law. I think a finding of no fair use is the right result here; I don’t see why that result should unreasonably threaten correct findings of fair use elsewhere. If a holding that spam is not fair use happens to knock out a false generalization on which other findings of fair use are alleged to rest, the response should be to search for a better grounding, not to cling to the generalization in the face of an unfortunate and incorrect result.

January 4, 2007 at 9:24 PM

Steven

I don’t mean to suggest that spam should be copyrightable; I just don’t think that spammers should be liable for using copyrighted excerpts that, in another context, would be fair use. That is as much solicitude as they should get under copyright law.

And let me be clear: while spammers would be beneficiaries of such a rule, they’re not the beneficiaries I care about! My concern with your contrary argument is that a test for fair use designed to exclude excerpts in spam might narrow the scope of fair use for more deserving non-spammers, especially if the worthiness of a publication becomes a much more important factor. (“Designed” might be the wrong word here; I’m referring to the fact that you appear to be suggesting a modification or rebalancing of the traditional test for fair use to support your conclusion.)

As to how a finding of no fair use could threaten correct findings of fair use elsewhere, I’m relying here on your suggestion that spam can be knocked out of fair use only by de-emphasizing the fourth factor (effect on the market) and emphasizing the first (purpose and character of the use). It is my uninformed sense that fair use becomes more lenient the more strictly you require that an infringing publication be a substitute for the copyrighted work in the marketplace; and that fair use narrows (or, at least, becomes more capricious) when the focus turns to the quality of the allegedly infringing use.

Finally, I still don’t see what ill consequences would flow from concluding that excerpts in spam can be fair use. When a piece of spam includes a copyrighted passage, the harm due to that copying seems to me extremely minimal: the pirated passage doesn’t affect the author’s market at all, and it is hardly the most significant contributor to a recipient’s aggravation. Moreover, acknowledging that spam could be fair use wouldn’t unduly warp the prevailing test for fair use either (at least not any more than the opposite conclusion), although I defer to your superior understanding of the law on this point.

I’ll admit that including spam in fair use requires acknowledging that some fair use may be socially undesirable, and even illegal (under non-copyright laws). But I don’t see any need to think of fair use as a warranty of social or artistic value; it should be seen as nothing more than, effectively, a de minimis violation of the copyright laws alone. Perhaps this, though, is where the prevailing doctrine and I separate.

January 5, 2007 at 10:54 AM

KYL

James, I’d like to probe your analysis of the second and fourth factors a bit more.

As for the second factor, you note that one aspect of the analysis is whether the original work is famous or obscure, and I read your post as suggesting that a famous original would deserve more protection. I’m not sure that is right. My understanding is that part of the analysis here is “whether the work is published or unpublished, with the scope for fair use involving unpublished works being considerably narrower.” 2 Howard B. Abrams, The Law of Copyright, § 15:52 (2006). By analogy, it would seem to me that more famous works should have wider scope for fair use than obscure works, i.e., uses of passages from Dan Brown’s works might be fair while uses of similar-sized passages from an obscure author might not be fair, all other things being equal.

As for the fourth factor, I’m not convinced that spammers are not in fact sometimes competitors for the original work. (I am thinking more of spam blogs than of spam messages.) Assume that some obscure author has written a blog entry describing a method for making interesting crafts projects out of old vinyl records, and that this author publishes the blog entry on his web site, supported by advertising. Spammers appropriate sections of his entry into spam blogs such that when you search for “vinyl crafts projects” in a search engine, the original author’s entry is buried deep beneath the spam blog results. Readers who search for this phrase would then overwhelmingly click on the spam blogs, and the original author would be economically harmed, because the spammer has in fact behaved as a competitor and diverted advertising revenue that the original author would otherwise have received. The spammer has stolen readers, especially if the spam blog copies enough of the original entry to allow a reader to figure out the gist of how to make such crafts projects.

In a world in which much of a work’s presence and value (e.g., as a valuable advertising venue) depends on how readers can find it through an on-line search engine, it seems that any time spam blogs, through appropriation of original text, dilute the presence of the original work and divert readers from it, they are behaving in some sense as competitors to the original work.

My point here may be simply that the fourth factor is very malleable and flexible, and can be analyzed to show that use as spam filler or search-engine ranking booster can directly compete with and harm the original author.