May 1998

The Refine Button

Consider, if you will, how much computing time goes into recreation, what fraction of the world's silicon and optic-cable computational biomass is given over to one form or another of "fun". Start with the amount of time you've spent playing solitaire or Minesweeper (still the best Microsoft program of all time), and multiply by the number of bored Windows users out there. Remember the glazed expression on the face of your local DooM-clone junkie, recall that his disheveled state is a consequence of his neglecting his personal hygiene to spend more time feeding his habit, keep in mind that he's getting his adrenaline poisoning fragging and being fragged by his buddies hunched over their own computers, and that's a heckuva lot of hours. A lot of CPU cycles, too, the way these things keep pushing the technological envelope on your graphics subsystem.

The Web, of course, never content to do anything by half measures, is mind-boggling when it comes to the vast realms of mindless entertainment it proffers. Given that it probably owed much of its early existence to the sudden availability of one-click porn it engendered, it has had its feet planted firmly in the realm of the recreational from the outset. But that's old hat these days -- what floats my boat are the entirely new forms of entertainment now available, forms never before available, from the dawn of time down to this blessed day. You all understand random-walk web-surfing, you understand the voyeuristic fun of reading the drivel people decide they need to present to the world, and I'm sure some of you understand the exhibitionistic thrill of writing the drivel you decide you need to present to the world -- I know I do. Here on the Internet, the very state of having no life becomes the raw material from which the lifeless carve their amusement. Well, a couple of weeks ago, some friends of mine and I, sitting around flecking bits of metaphorical mud at each other, accidentally sculpted our own little AltaVista de Milo.

See that little button on the AltaVista page labelled "refine"? That button is my new best friend. Type in something promising, hit "refine," and prepare to laugh your head off. It's uncanny how wonderful the lists of synonyms and related topics the engine spits back are.

Sometimes, Digital's servers display a brilliant ability to boil an idea down to the purest nuggets of meaning. "Titanic" brings back the suggestion "Titanic, sinking, disaster." Three words. I challenge any of you to do better. The number one possibility AltaVista gives for "Nixon" is "Nixon, president, vice." I don't think it was thinking of the double meaning of "vice," but maybe, just maybe, the database is having a little fun. And "Bobbitt" brings back "Bobbitt, lorena, she, wayne, women, men, rape, feminists, acquitted." That one still cracks me up. If I were trying to explain the whole Bobbitt saga to aliens from Mars, I think I'd start by showing them AltaVista's take.

Other times, AltaVista displays a sort of naive free-associating charm. About halfway down the list of suggestions for "Buffy" is the gem "Xena, warrior, hercules, gabrielle, callisto, subtext, dilbert." Now, the Buffy-Xena connection sort of makes sense, as does the word cluster around "Xena". It's the sudden appearance of "subtext" that does it for me. "Dilbert," in turn, brings back "Adams, dogbert, scott, cubicle, creator, bosses, fads, ratbert." Yes folks, that's right. "fads." AltaVista is swift but merciless in its judgements.

And finally, sometimes, it just stops making sense at all. The number-two word-nexus for "Barney dinosaur" is "Goofy, spasms, popularity, elvis, cobweb, screwed, upstream." Uh-huh. "Cantor and Siegal" brings back "pimping" (with a very skeptical score of 1% in quality). And how about your favorite and mine, "median strip"? "Honer, undercurrent, lauwers, macko, drowned, barger" along with "Coathanger, winny, stepladder, humidifier, shakespeares, silencer, ouija, talkie, periscope" and "Posthumously, cairo, darwin, accidentally."

The question on my mind after a few hours of this mindless fun was "Why is a search engine so damn funny?" And after a little reflection, I think I have an answer that says as much about those of us who see the humor as it does about the web site actually cracking the jokes.

On the most basic level, the individual web pages that AltaVista indexes are the creations of individual people creating sites to say what they feel the need to say. Whether it's "buy this!" or "the Eastern Ontario Crocheting Circle's Stitch of the Week this week is the CROSS STITCH" or "why Pamela Anderson is such a babe," web pages exist because someone decided to open their electronic trap. The participants vary, but every web page is part of a conversation between living, breathing, thinking human beings. Every page implicitly carries with it the claim that "I think this is interesting, and I think you will too."

The web as a whole is nothing more than the agglomeration of these zillion different strands. They're not really very much in conversation with each other, which is part of the reason the web is so incredibly chaotic. I mean, really. Trying to index the web is like trying to summarize every conversation taking place across the country. So when your local search engine decides that it's going to go one better than just noting what pages contain what words how often, you kind of have to expect its results to be rather heavily influenced by the connections that a lot of people have made, whether independently or not. It's not clear that there's a better solution, even in theory, than noting correlated frequency of keywords.

Sometimes, these connections, however unlikely, acquire their own force through direct repetition. The words "median strip," for example, found their way into one of the many versions of the Darwin Awards floating around the Net. They also show up in a Steven Wright joke. Given that the epidemiology of the spread of this kind of material around the web is similar to its proliferation through email, there are a lot of independently hosted but otherwise identical copies of them floating around. Perhaps unsurprisingly, these are also two documents whose contents tend to jump around a lot (since they consist of a bunch of distinct segments united more by tone than anything else) and which contain a rather interesting assortment of nouns. High frequency plus unlikely juxtapositions -- well, that's the kind of stuff that AltaVista's "refine" command just starts drooling over.

On the other hand, sometimes these connections are emergent properties of a large number of pages -- the statistically significant overlaps among a larger population of pages. What AltaVista picks up on are the clusters of words that recur across different people's widely varying web sites. People who put up a still from Buffy have the right cultural mindset, by and large, to appreciate Xena -- and also to pick up on the various intriguing undercurrents in these shows. The site can generate the perfect nine-word description of the Bobbitt case because those nine words, being the most appropriate descriptors the human mind can imagine, are the words human minds, again and again, have settled upon in describing that case.
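For the curious, here's a minimal sketch of the kind of keyword co-occurrence counting I'm describing. To be loud about it: this is my own toy illustration, not AltaVista's actual method, which I don't know. The function name `related_terms` and the sample pages are made up for the example, and a real engine would presumably weigh each word against how common it is across the whole index rather than use raw counts.

```python
from collections import Counter


def related_terms(pages, query, top_n=5):
    """Rank words by how many pages containing `query` they also appear in.

    Hypothetical helper for illustration only; raw co-occurrence counts,
    with no correction for how common each word is overall.
    """
    query = query.lower()
    counts = Counter()
    for text in pages:
        # Each page contributes a word at most once, however often it repeats.
        words = set(text.lower().split())
        if query in words:
            counts.update(words - {query})
    return [word for word, _ in counts.most_common(top_n)]


# Invented sample "pages", just enough to show the counting at work.
pages = [
    "titanic sinking disaster iceberg",
    "titanic sinking movie",
    "titanic disaster sinking",
    "nixon president vice",
]

print(related_terms(pages, "titanic", top_n=2))  # ['sinking', 'disaster']
```

Notice that every identical copy of a page counts again, which is exactly how a much-forwarded joke list could push its odd assortment of nouns to the top.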

I don't claim to know for certain what humor is or what makes it work, but I think a large part of it must be the recognition of double meanings and hidden connections -- the pleasant shock of seeing an unexpected pattern or an unlikely reinterpretation. And I think this is what lies at the heart of my ability to entertain myself by playing with Digital's interface. When I enter my terms, it comes back at me with connections that I would have been hard-pressed to come up with myself, but can nonetheless recognize on sight.

Why are these connections so sensible to us? Because they're connections derived by minds exposed to the same cultural context as ours. AltaVista is slicing through layers of mediation -- the mediation imposed by the electronic formatting constraints of the Web and by the need for individual documents to conform to a standard descriptive format within the linguistic conventions of "printed" e-text. The "refine" command, by averaging across the umpty-skeezix pages out there, largely filters out the effects of any individual voices, removing particular personalities from its suggestions but leaving behind the traces of what they had in common. Forty million Frenchmen might well be wrong, but there may well be something recognizably French about the correlations and juxtapositions on their home pages.

I see this little detail as evidence for a broader theory of aesthetics rooted in the structure of human thought processes. The reason that we, collectively, find different branches of art and expression more or less equally appealing, in the end, is that in everything humans put their minds to, they ultimately wind up pressing against the limits of what those minds are creatively capable of. And then along come other people, with the same rough mental constraints -- so that the mind's differing aptitudes for different forms of creation are properly accounted for, because they more or less correspond to our abilities to appreciate those creations. It's a tight circle, but a beautiful one. What we're really appreciating when we consider something we find beautiful -- a subtle melodic interweaving, an elegant theorem, the brilliant arrangement of words on the page -- is the mental effort involved in creating it. We respond to the sympathetic echoes of the creative process it stimulates in our own minds, the communication between artist and audience, the joy of recognizing something mentally which is at once alien to us and also marvelously familiar.

What makes AltaVista so wonderful in this regard is that it allows us to laugh at "jokes" told by several million people. Perhaps this is the beginning of the collective mind, the fusion of humanity's disparate thoughts into one larger mental being. Perhaps. If so, then, if these are its first rumblings of sentience, its first emergent properties . . . . well, then it's a far cheerier and more personable beast than most prognosticators give it credit for. And it makes me wonder, then, why Star Trek insists on portraying the Borg as so damn serious all the time. Anything larger built from the musings and interests of a billion disparate people is going to wind up reflecting those musings and interests -- or it won't wind up existing at all. For those of you who say otherwise, guess what?

The joke's on you.

Originally published 10 May 1998 as "the refine button on altavista and how it affects our future."