Is spam fair use?
A Brief History of Spam
To see how this four-word question arises, it may help to review a little of the history of spam filtering. Here’s a (slightly ahistorical) sketch: The earliest spams—basically email versions of the Green Card Lottery—consisted of the exact same message, sent to thousands or millions of recipients. Users annoyed by these spams, and ISPs annoyed at having to deliver them, took to comparing messages, senders, and message headers. If an incoming message was identical to a previously-seen message known to be spam, it would be dropped in the ætherial electronic wastebasket.
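For the programmers in the audience, here is a minimal sketch of that exact-match approach. It is an illustration only: the class name, the SHA-256 fingerprint, and the whitespace normalization are my own assumptions, not a description of what any real ISP ran.

```python
import hashlib


def fingerprint(message: str) -> str:
    """Hash a message after collapsing runs of whitespace (an assumed normalization)."""
    normalized = " ".join(message.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ExactMatchFilter:
    """Hypothetical filter: drops any message identical to one already reported as spam."""

    def __init__(self):
        self.known_spam = set()

    def report_spam(self, message: str) -> None:
        self.known_spam.add(fingerprint(message))

    def is_spam(self, message: str) -> bool:
        return fingerprint(message) in self.known_spam


spam_filter = ExactMatchFilter()
spam_filter.report_spam("Win a Green Card!  Reply now.")

# Same text apart from spacing: caught.
print(spam_filter.is_spam("Win a Green Card! Reply now."))
# One word changed: sails straight through.
print(spam_filter.is_spam("Win a Green Card! Respond now."))
```

The second check hints at the weakness: a filter like this knows nothing about a message until someone has reported an identical copy.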
In response, the spammers realized they needed to start varying the content of their messages. The putative senders would be both forged and randomized; the subject line would be drawn from a large and combinatorial pool (e.g. “buy,” “purchase,” “obtain,” “acquire,” “get” and so on would be plugged into the verb slot, and a synonym for “cheap” into the adjective slot, and some drug name into the noun slot, leading to many many possible combinations); the message itself would be partly randomly generated. The filterers then dealt with this trick by using a group of techniques generally known as “Bayesian filtering,” some of which even are Bayesian filtering. The way these techniques work is that you scan a message assessing every word in it against a (specially-built) dictionary that tells you how much more likely seeing that word makes it that the underlying message is a spam. “Viagra” increases the spam score a bunch; “the” not so much. Note that this kind of filtering also helps with the original problem of deciding that the first message of a new type is actually spam (and +1 for you if you spotted that issue in the last paragraph).
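The word-scoring idea can be sketched in a few lines. This is a toy naive-Bayes-style scorer under my own assumptions: the six training messages are invented, the add-one smoothing is one common choice among several, and the function names are hypothetical. Real filters are considerably more elaborate.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(spam_msgs, ham_msgs):
    """Build the 'dictionary': one log-likelihood ratio per word.

    Positive weight means the word is more typical of spam,
    negative means more typical of legitimate mail.
    """
    spam_counts = Counter(w for m in spam_msgs for w in tokenize(m))
    ham_counts = Counter(w for m in ham_msgs for w in tokenize(m))
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    weights = {}
    for word in vocab:
        # Add-one smoothing so a word unseen on one side never divides by zero.
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[word] + 1) / (ham_total + len(vocab))
        weights[word] = math.log(p_spam / p_ham)
    return weights


def spam_score(message, weights):
    """Sum the per-word weights; words not in the dictionary contribute nothing."""
    return sum(weights.get(w, 0.0) for w in tokenize(message))


# Invented training data, for illustration only.
spam = ["buy cheap viagra now", "get the cheap viagra", "purchase the cheap pills now"]
ham = ["see you at the meeting", "the report is attached", "lunch at the office now"]
weights = train(spam, ham)

print(weights["viagra"], weights["the"])        # large positive vs. close to zero
print(spam_score("cheap viagra", weights))      # positive: looks like spam
print(spam_score("see you at the meeting", weights))  # negative: looks legitimate
```

Because the score depends on the words rather than on the whole message, a never-before-seen spam built from the usual vocabulary still scores high.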
The spam innovations that responded to statistical filtering were ugly, both figuratively and literally. Spammers started interpolating random text into their messages. At first, it was just a line of random gibberish pasted on to the end; then it was a line of random words here and there. This randomization both disrupted attempts to compare spam messages with each other for strict identity and also tried to trip up word-counting filters. The filters, however, proved reasonably good at recognizing that randomly-plucked dictionary words such as “xenograft” were astronomically unlikely to appear in legitimate messages and could therefore be all but ignored. Moreover, by expanding the filtering window to multiple words, the anti-spam tests could recognize that certain juxtapositions (e.g. “theogony xenograft”) simply did not appear in normal text.
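The multi-word check might look something like the following sketch. Everything here is an assumption made for illustration: the tiny reference text stands in for a large body of legitimate mail, and the "fraction of unseen word pairs" measure is just one simple way to flag odd juxtapositions.

```python
def word_pairs(text):
    """Return the list of adjacent word pairs (bigrams) in the text."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def unseen_pair_fraction(message, reference_pairs):
    """Fraction of adjacent word pairs in the message never seen in normal text."""
    pairs = word_pairs(message)
    if not pairs:
        return 0.0
    unseen = sum(1 for pair in pairs if pair not in reference_pairs)
    return unseen / len(pairs)


# Stand-in for a large corpus of legitimate mail (invented for this example).
reference_text = (
    "the meeting is at noon and the report is attached "
    "please see the report before the meeting"
)
reference_pairs = set(word_pairs(reference_text))

print(unseen_pair_fraction("see the report before the meeting", reference_pairs))
print(unseen_pair_fraction("theogony xenograft buy pills", reference_pairs))
```

Ordinary phrasing reuses pairs the filter has seen before; a string of randomly plucked dictionary words almost never does.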
Therefore, spammers took to plundering readily available texts not just for individual words but for entire lines and paragraphs of plausible verbiage. I first noted this phenomenon in 2003, when I got a spam whose text read:
that nothing can shake my confidence in you for a moment, Lordofthemuppets,
Our US Pharmacy is Open to You!
distrust of Owen; and to omit altogether a reference to the conduct which
We Now Have Xanax, Valium, Levitra, and Faster Acting Viagr@ SoftTabs
revived in a modified form. [30] The only warfare Sun Tzu knows
From US Pharmacies, not Mexico or Pakistan
thronged the amphitheatre, and watched exultingly while man slew
with more fitness than to him who had given me life?
Discreet and Fast Next-Day Shipments
physiologists. He made also a careful and fairly accurate study
Prescriptions written by US Doctors
The oars, the mast, and the sail are in the canoe. I have even succeeded
Browse our Selection
though distributed into distinct and mutually independent States,
All of the non-pharmaceutical text came from some corpus or other text readily available on the Web. Leaving aside the “Lordofthemuppets” (the dummy email account to which this message came), the first line was from Frankenstein. The other quotations came from the annotations to the letters of Charles Darwin; the Lionel Giles translation of Sun Tzu’s Art of War; Henry Smith Williams’s A History of Science; Frankenstein again; A History of Science again; Twenty Thousand Leagues Under the Sea; and Orestes Brownson’s The American Republic. Most of the texts seem to have been taken from Project Gutenberg’s free public-domain e-books, right down to having the exact same line divisions.
From ripping off public-domain works, it was only a short step to ripping off works still in copyright. The technique of scraping the Web for content, in particular, proved attractive for the related spam practice of creating fake Web pages and fake blogs—as one might want to do if trying to trick search engines. Sometimes this content is arguably licensed, as in the case of the many, many Wikipedia clones floating around (new! improved! with added advertising!). Often it is not.
Is Spam Fair Use?
To restate the original question in near-lawyerese, is it copyright infringement to use part or all of someone else’s copyrighted works in unsolicited bulk email for the purpose of confusing anti-spam filters into allowing the messages through? (If this were real lawyerese, I would probably have had to use longer synonyms for “email” and “anti-spam filters,” but my brain rebelled at the prospect.) It may be apparent that the same principles will apply, more or less, to spam blogs and other non-email forms of spam.
In U.S. copyright law, a finding of “fair use” is a defense to a claim of copyright infringement. In determining which uses count as fair, courts are instructed to take into account four factors: 1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; 2. the nature of the copyrighted work; 3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and 4. the effect of the use upon the potential market for or value of the copyrighted work. Applying these four factors to spam is instructive.
The first factor tests, roughly, whether the allegedly-infringing use is a good thing or a bad thing. Nonprofit uses have an aura of blessedness about them, in the same way that nonprofit organizations get tax breaks. Commercial use isn’t necessarily bad; it just doesn’t activate the automatic sympathy that nonprofit use does. This factor is also used to favor parodies, education, and critical commentary, while disfavoring rote repetition.
I think it safe to say that most spam is (a) commercial and (b) bad. Spam is a social ill. We would almost all be better off if there were almost none of it. What is more, the use of copyrighted texts (without permission) in spam is doubly unpleasant, in that it doesn’t just promote spam; it promotes spam fraudulently. The purpose of quoting the copyrighted work is to deceive spam filters into thinking that the message is other than it is. Putting it all together, I would say that spam flunks the first factor about as badly as it is possible to flunk it.
The second factor asks about the “nature of the copyrighted work.” This one is a bit of an odd duck; courts differ in how they approach it. A lot of different questions have been crammed in here. Is the original artistic or commercial? Is the original clearly copyrightable, or is it so close to the border of unoriginality that it supports only a “thin” copyright? Is the original famous or obscure? How much effort and expense went into the original?
It’s hard to judge the question of spam’s fairness on this factor in the abstract. That’s because when we ask the question this way, the one thing we absolutely don’t have before us is the original. I’ve posed the question in a form that treats the original as a free variable; it could be anything. Given that, I’m prepared to assume that the spammer ripped off the wrong dude, and that the spam quotes from the most famous, most important, most original text in the Western world. After all, the first plaintiff to bring such a suit will probably be one with a pretty strong case on this factor, out of simple logic of self-selection. So give spam low marks on the second factor, as well.
The third factor asks how much of the copyrighted work is used in each allegedly-infringing copy. Quoting a paragraph in a review is more favored than quoting an entire book. It’s traditionally thought that uses of a few words will almost always be fair, since short phrases are very close to being uncopyrightable anyway.
Here, I think that any individual spam actually comes out pretty well. The spam I quoted above uses one line each from a few works, two lines tops. That is not much, particularly in the context of a huge multi-volume work. A spammer who quotes a few words here and a few words there hasn’t taken much from any individual author. Things are a little different in the spam blog, where one may quote an entire post or an entire blog. But the blender-job that classic spammers perform looks superficially a lot like the collage/remix work that some copyright scholars claim we ought to be encouraging. Perhaps the great bulk of spam could cut against it here. The spammer above may well have quoted all of Frankenstein, one individual line at a time. Still, given that the spams were scattered to the four winds, I’m not sure that the spam really uses “more than necessary” in the way contemplated by this factor. You can’t easily reassemble the work from the many many spams.
The fourth factor, the effect on the market for the original, is where things get truly surprising. It makes economic sense that we should discourage uses that are likely to cut into the market for the original, since they will reduce the creator’s income and undermine some of the incentive to create in the first place. Similarly, new uses that don’t undermine any market the creator could have exploited don’t cause the creator monetary harm, and so one can ask why the creator should be allowed to object. Some scholars think that this factor is really the only factor; the others may provide guidance in assessing the effect, but at the end of the day, mere competitors lose while those who add something new win.
Spammers, however, aren’t even competitors. The dude who scrapes a few lines from a blog post of mine to use as filler in his spams isn’t stealing any readers from me. He’s also not exploiting a derivative market that I would have any interest in exploiting, or, indeed, could profitably exploit if I wanted to. (Even with perfect enforcement by objecting copyright holders, I don’t see how willing copyright holders could make one red cent out of licensing a corpus of spam texts.) It is true that the collective volume of spam may be harmful to me, in that it annoys me and makes me pay slightly more for useful Internet service, but that’s hardly copyright injury to me as an author—it doesn’t affect the market for my work.
Even if you think that the less useful Internet means I have a harder time marketing my work, that consequence is still an indirect harm of the sort that copyright scholars regularly proclaim shouldn’t count in the fair use inquiry. Thus, for example, it is frequently argued that a vicious parody may reduce demand for the original by making it an object of ridicule, but that reduced demand due to disrepute isn’t the right sort of harm. The parody doesn’t directly substitute for consumption of the original. Neither does spam. (The Wikipedia clones, which duplicate the whole encyclopedia and divert Google traffic from it, might be another case, but garden-variety bricolage spam isn’t it.) Thus, mark the fourth factor a win for the spammer.
Where does this leave us? Spam loses resoundingly on the first factor (its own nature) and by default on the second (the nature of the original). It wins, at least in the core case, in the third factor (the amount copied), and wins decisively on the fourth (effect on the market for the original). So how do we aggregate these split findings into an overall decision?
I think the answer is easy: the spammer will lose under current doctrine, and moreover, the spammer should lose. The spammer is not using the work for anything socially worthwhile, nor is she using in her spams any of the creative attributes of the work itself. She’s an enormously unsympathetic defendant adding little to society, and the prima facie case of copyright infringement (the actual copying of the copyrighted text) is unarguable. Protecting deceitful spammers is not why we have a fair use defense.
A Few Observations
The above analysis has a few interesting consequences I would like to highlight. First, the Supreme Court has said that the fourth factor—the effect on the market—is the most significant. It is sometimes considered to be the only significant factor, either in that the others fade into irrelevance next to a strong finding of harm or no harm to the market, or that they simply help a court evaluate the effect on the market. But it cannot be the only factor. Spam provides a convincing counterexample, one showing that even a compelling showing of no harm on the market-effect factor cannot be determinative.
This observation has implications for economic analyses of copyright. Most notably, it indicates that a story of fair use simply in terms of authorial incentives must be wrong. There must be a term somewhere in the analysis that can take account of how wasteful spam is. Trying to identify that waste as a form of disincentive to potential authors requires doing violence to the facts. I may be ticked off that spammers are copying me, but it’s exceedingly unlikely to hurt my sales. If one is committed to an economic balancing, one can point to the overall social harm caused by spam, or the negative value of deceitful reuses, but these costs must enter the balancing process somewhere other than in the author’s market for her work.
Second, this analysis serves as a reminder of the oft-emphasized (and oft-forgotten) point that the four factors are only an aid to analysis, not a mechanical checklist. In the spam example, the effect on the market factor, while strongly favoring the spammer, is also beside the point. (The amount-copied factor, which favors the spammer more weakly, is actually more persuasive as an argument in her favor.) Here, it is the purpose and character of the use factor that really seems to capture what is the problem with spam copying. Not all of the relevant questions are captured in the four factors; nor are they all relevant in any given case. Pointing out that judges must also assess how salient the various factors are in a fair use case may do little to aid in the predictability of the fair use analysis. It gives them yet another route to do whatever they want in the case at bar, almost regardless of the four-factor analysis. But the alternative—ignoring the reality that some of the factors sometimes make little sense in context—seems worse.
And third, it seems worth noting that this case is almost trivially easy from a moral rights perspective. The spammer has stripped the original author’s name from the work—a violation of the right of attribution. She has also distorted the presentation of the work (by chopping it up and dropping it in the spam), and by using the work in this way, associated the author’s work with spam, a “derogatory action” that supplies the necessary “prejudic[e] to [her] honor or reputation” to complete a violation of the right to integrity. These two complaints, particularly the second, seem to get at the problem with spam copying quite directly. The work is being used for sleazy and harmful purposes, in a manner that mutilates the work, gives it unfair connotations, and enlists the author against her will in a dishonorable cause. Whether or not you think that moral rights make for good law, you have to admit that the moral rights approach gets at what’s wrong with spam filler from an author’s point of view.
In conclusion then, spam is not fair use. Also, spam is bad. The end.