Information Loss


Today’s computer science blooper comes from Perfect 10 v. Google, Inc., 416 F. Supp. 2d 828, 847 n.13 (C.D. Cal. 2006):

For example, a typical full-size image might be 1024 pixels wide by 768 pixels high, for a total of 786,432 pixels worth of data. A typical thumbnail might be 150 pixels wide by 112 pixels high, for a total of only 16,800 pixels. This represents an information loss of 97.9% between the full-size image and the thumbnail.

The court here seems to believe that if a full-size image has N times as many pixels as a thumbnail, then the full-size image necessarily has N times as much information. False. Consider an all-black image (Kazimir Malevich as digital artist, perhaps). From a 1000x1000 all-black image, we can make a 100x100 all-black thumbnail. If we then blow up the thumbnail by a factor of 10, we get back not a blurred approximation of the original image, but the original image itself!
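For the skeptical, here is a minimal sketch of the all-black example in Python. It assumes the simplest possible scaling (keep every tenth pixel on the way down, repeat each pixel ten times on the way up); the helper names `downsample` and `upsample` are made up for illustration, and a real image tool would use fancier filters.

```python
def downsample(image, factor):
    """Keep every `factor`-th pixel in each direction (nearest-neighbor)."""
    return [row[::factor] for row in image[::factor]]


def upsample(image, factor):
    """Repeat each pixel `factor` times in each direction."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


BLACK = 0  # assumed encoding: one grayscale value per pixel, 0 = black

original = [[BLACK] * 1000 for _ in range(1000)]
thumbnail = downsample(original, 10)   # 100x100
restored = upsample(thumbnail, 10)     # back to 1000x1000

pixels_kept = (len(thumbnail) * len(thumbnail[0])) / (1000 * 1000)
print(f"thumbnail keeps {pixels_kept:.0%} of the pixels")
print("restored image identical to original:", restored == original)
```

By pixel count the thumbnail has thrown away 99% of the image, yet the round trip reproduces every pixel. The black square is a degenerate case, of course, but it is enough to show that the pixel ratio and the information ratio are different things.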

In general, making a thumbnail doesn’t lose as much information as a raw pixel count would suggest, because most of the images of which we might make thumbnails have substantial regularities. Some of the fine detail may be lost, but it’s not as though every last pixel is equally important. Unless the thumbnail is particularly small, it’s likely to capture much of the essence of the original. Calculating the exact degree of information loss depends on the encoding and the particular image involved, but dividing pixel counts is quite likely to vastly overestimate the loss.
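One rough way to see the dependence on encoding is to compare compressed sizes instead of pixel counts. The sketch below is only an illustration under loud assumptions: the image is a synthetic top-to-bottom grayscale gradient (a stand-in for something regular, like a clear sky), the thumbnail is made by nearest-neighbor sampling, and zlib's compressed size is used as a crude proxy for information content. None of this comes from the court record, and a different image or a different encoder would give different numbers.

```python
import zlib


def gradient(width, height):
    """Synthetic grayscale image, one byte per pixel, black at top to white at bottom."""
    return [bytes([y * 255 // (height - 1)]) * width for y in range(height)]


def make_thumbnail(rows, new_w, new_h):
    """Nearest-neighbor downscale: sample one source pixel per output pixel."""
    old_h, old_w = len(rows), len(rows[0])
    return [
        bytes(rows[y * old_h // new_h][x * old_w // new_w] for x in range(new_w))
        for y in range(new_h)
    ]


def compressed_size(rows):
    """Size in bytes after zlib compression; a crude proxy for information content."""
    return len(zlib.compress(b"".join(rows)))


# Dimensions taken from the court's footnote.
full = gradient(1024, 768)
thumb = make_thumbnail(full, 150, 112)

pixel_loss = 1 - (150 * 112) / (1024 * 768)
size_loss = 1 - compressed_size(thumb) / compressed_size(full)

print(f"loss by pixel count:     {pixel_loss:.1%}")
print(f"loss by compressed size: {size_loss:.1%}")
```

The first figure is the court's 97.9%. The second depends entirely on the image and the compressor chosen, which is rather the point: there is no single "information loss" number to be read off the dimensions alone.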

This was actually part of Perfect 10’s point in the litigation: that the thumbnail was a reasonable substitute for the original because it captured some of the original’s gestalt. This is also the reason that Google offers thumbnails. If they really were only 2% as information-rich as the original, you can bet that Google Image Search would be painful to use.