The Laboratorium: May 2007 Archives

Why I Have Not Blogged Yesterday or Today, a Partial Explanation

Is the headline “China Crafts Cyberweapons” from

(A) A Terranova post about cut-rate Chinese power-levelers cornering the market for exotic loot in MMORPGs;

(B) A Slashdot post about Chinese hackers developing viruses for use against enemy computer systems; or

Answer here.

May 24, 2007

Riddle me this: why did the mofo who took out a Kohl’s charge card using my name (misspelled) and bought $500 of housewares also sign up for “Account Ease” protection, which would cancel his (read: my) balance if he (read: I) were hit by a bus?

Mathematicians’ Revenge

May 22, 2007

Scan Them All and Let Google Sort Them Out

It’s Search Engine Week here at the Batcave. Now, to be honest, since I started writing on the subject, pretty much every week is Search Engine Week. But this week is Search Engine Week in the sense that I’m reading multiple books on search engines. After the Jeanneney, I’m now into Google’s PageRank and Beyond, a moderately rigorous mathematical discussion of PageRank and Kleinberg’s HITS. The question that it brings to mind is:

Until a few years ago, would you ever in your wildest dreams have thought that one of the most lucrative businesses in the world would be based on computing the eigenvectors of linear operators on billion-dimensional vector spaces?
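
For anyone who has not seen the computation in question, here is a minimal sketch of power iteration on a toy four-page web. It is an illustration under stated assumptions, not Google's actual method: the damping value of 0.85 is the commonly quoted one, and the `pagerank` function, the `toy_web` graph, and the handling of dead-end pages are made up for this example. The vector it converges to is the dominant eigenvector of the damped link matrix.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank by power iteration.

    links maps every page to the list of pages it links to; every link
    target must also appear as a key. damping=0.85 is an assumed value.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform vector
    for _ in range(max_iter):
        # Rank sitting on pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank along each outgoing link.
        for p in pages:
            for q in links[p]:
                new[q] += damping * rank[p] / len(links[p])
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the vector has stopped moving
            break
    return rank


# Hypothetical toy web: "c" is linked from everywhere, "d" from nowhere.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

The real thing does the same multiply-and-repeat loop, only with a sparse matrix that has billions of rows instead of four.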

May 22, 2007

The Yiddish Policemen’s Union I give it 5 stars

Towards the end of Jean-Noël Jeanneney’s Google and the Myth of Universal Knowledge, he writes:

In practical terms, what criteria will govern the decision to digitize certain works? With respect to the vast legacy of works now in the public domain … we should favor the great founding texts of our civilization, drawing from each of our countries: encyclopedias; journals of scholarly societies; major writings that have contributed to the rise of democracy, to human rights, and to the recent unification of the Continent; writings that have fostered the development of literary, scientific, legal, and economic knowledge, as well as artistic creation. We should add to these, as I’ve already suggested, works that have appeared in numerous translations, thus attesting to their influence. The same guidelines, probably with less rigid specifications, can be followed for the more recent period. (p. 78)

A little earlier, he explains why this rigorous process of selection is necessary:

A final observation: maybe one of the reasons that the top managers of Google never seriously broach the question of how works are to be digitized is that they maintain the conviction—or rather the illusion—that they can digitize all the books that have ever been printed since the time of Gutenberg. In this fantasy world, there would be no need to worry about selection, and the performance of the digital library would depend only on the quality of the search engine (or engines). But since this perspective is beyond what we can reasonably envision (Is this a bad thing?), we must find the means not only to furnish Internet users with organized knowledge but to indicate its limitations. (p. 73)

This is, in a word, oldthink. Jeanneney repeatedly assumes that comprehensive digitization of our print archives is a pipe dream, from which it follows that the selection processes governing digitization acquire enormous cultural and political importance. I certainly agree with him that the selection is critical, particularly in the shorter run. It is, for example, of great importance that Google’s Book Search scanning project be accompanied by equally ambitious projects for non-Anglophone collections. But that’s as far as this particular observation ought to go.

We are going to have the capability of scanning everything, and we should. Scan it, OCR it, check it, stick it online, and open it up to lots of search tools. (Not just one: he and I agree on this, as well.) The initial Google announcement involved some fifteen million books. The total number of titles printed in the West since Gutenberg is somewhere upward of a hundred million. Google’s proposal is ambitious but clearly realizable; aiming for a hundred million is somewhat more ambitious but not unreasonably so. It will seem more and more plausible with time, as scanning and indexing technology continue to improve.

Things will be chaotic, certainly. There will be duplicated scans, scans of different editions of the same book, scans of translations and pirated foreign editions, scans of books missing pages, and so on and so forth. But these are not dealbreakers. These are exactly the sort of gnarly semistructured data analysis problems that drove the last few rounds of stunning innovation in Web search. Get the corpus out there and search algorithms will arrive to take advantage of it. Call it a Say’s law of data: put an interesting dataset online and someone will find something interesting to do with it. The point is that massive scanning helps create raw material on which the complexity-increasing dynamism of the Internet feeds. Committees of experts can help us decide what to scan first, but they should not have to decide what to scan at all.

Jeanneney, president of the Bibliothèque nationale de France, clearly loves the potential of digital archiving and appreciates the value of search. But he doesn’t get search. Again and again he complains: “An indeterminate, disorganized, unclassified, uninventoried profusion is of little interest.” (p. 7) “Under these conditions, an undertaking of this kind, attractive as it appears, can hardly be pursued effectively other than within a restricted community capable of ensuring quality under cooperative control.” (p. 51) “The fantasy of exhaustiveness dissipates in the need for choices.” (p. 71) “Hasty classification of a list, following obscure criteria of classification, must be replaced by a whole range of modes, classification modes for responses and presentation modes for results, to allow for many different uses.” (p. 72) “And we must help their teachers by protecting them from disorganized information.” (p. 87)

Exactly, I would say. That’s exactly what good search does. It turns a profusion of scattered information into accessible, organized forms. Jeanneney is right to demand diversity both in the information accessible and in the tools used to access it. But he doesn’t seem to get the idea that the best way to create useful order online is to embrace the chaos. Wikipedia’s lack of a “restricted community” helps it produce more reliable information, not less. Google works because it indexes everything, rather than picking and indexing a subset of high-quality sites. Jeanneney sees a cluttered desk and assumes a disorganized mind.

There is much else to say about this fascinating, baffling, brilliant, confused, maddening, and thoroughly Gallic sliver of a book, but this is the thought that stuck most in my mind as I read it.

May 19, 2007

Wanted: A Bookmark Button Standard

Some years ago, after reading and loving Michael Chabon’s The Amazing Adventures of Kavalier and Clay, I poked around his web site and came across a remarkable essay by the name of “Say it in Yiddish.” It starts as a set of wistful reflections on a Yiddish phrasebook for travelers, and then spins out into a fantasia on the idea of a Yiddish-speaking Jewish homeland in Alaska.

Chabon came through Seattle for a reading later that month. When I got to the head of the receiving line, I used my few seconds of author time to tell him how much I had liked “Say it in Yiddish.” He was visibly surprised. Only after a moment of you-didn’t-really-just-say-that bafflement did his expression turn to gratitude. It was as though his natural emotional reaction had been filtered through a disbelief filter. He thanked me, saying that he didn’t often hear that. Then he signed my book with a flourish and a sketch of a key (the symbol of the Escapist from the comics in Kavalier and Clay.)

At the time, I chalked his reaction up to the obscurity of the essay. Only later did I learn that the essay had attracted a fair amount of controversy. Several leading scholars of Yiddish thought that it was an attempt to make fun of them or of the language. I think now that what surprised him about my praise for the essay was not that I was mentioning it at all, but that I was praising it.

Well, Chabon has now turned the essay into a novel, for which my praise knows no bounds. The Yiddish Policemen’s Union turns his conceit of an Alaskan Jewish homeland into the setting for a detective story. In this alternate universe, the Federal District of Sitka was carved out of Alaska as a temporary resettlement area for European Jews during the Second World War; now, sixty years later, it is a few months away from “Reversion,” and everyone is nervously awaiting the unknown next stage in the ongoing exile of the Jews. Against this backdrop, down-on-his-luck homicide detective Meyer Landsman washes up in a cheap residence hotel, where one of his neighbors turns up dead. More out of a sense of personal affront than anything else, Landsman starts poking his nose around, discovers that powerful unknowns want him off the case, turns up some unexpected connections to an insular messianic Hasidic sect, and deals with the usual assortment of beatings and surprises any detective protagonist must endure.

Chabon’s Sitka is a gloomy, cantankerous place. An atmosphere of decay and depression pervades the novel, a sense of desperation as this dark and cold homeland is running out its days. He has a talent for tossing off scene-setting details casually, as though they are simply a part of the background knowledge that everyone shares: a snack of pickles dipped in sour cream, the Big Macher department store, a leftover landmark from the 1977 Sitka World’s Fair now locally known as the “Safety Pin.” It is such a perfectly realized place that both the characters and the plot grow naturally in its frigidly alien soil.

The writing is also spectacular. Of course there are Yiddishisms everywhere, from colorful words like noz and shtarker to phrases that are clearly English renderings of Yiddish originals: “sweetness” (from bubeleh) and “bang me a kettle” (from hak mir nisht ken tshaynik). The dialogue is florid and insult-laden, and Chabon is good enough at the rhythms of Yiddish complaint that you can tell the genuine invective from the disgruntled banter that his characters speak as a matter of idiom. He intends for the whole novel to read as though it were a loose translation from a Yiddish original, and it does.

Add to these virtues of atmosphere and language the usual qualities one expects from a Chabon novel: a compelling plot, sympathy for all of his characters, moving reflections interlaced here and there, a memorable sentence at least once a page, an instinct for universal human weaknesses and surprising strengths. The Yiddish Policemen’s Union is neither better nor worse than Kavalier and Clay. Both are as good as one could hope for in a novel, each in its own way. I didn’t read this one in one sitting, but I wish that I could have.

May 18, 2007

Chase posted a ridiculous screenshot of “share this post” buttons. You know how some blog posts end with a little icon inviting you to “Digg this” post, or to bookmark it with Del.icio.us? Chase found a blog that provides buttons for some twenty-three different bookmarking, link-sharing, and other Web 2.0 services. (It reminds me a little of Jason Kottke’s Metadazzle overfizzle.)

I can understand how displaying these buttons can be in the karma-whoring interests of bloggers. Being easily Digg-able helps your chances of being Digged. The same goes for being easily Furl-able, and so on down the line to the more obscure forms of popularity: Scuttle-ishness, DZone-hood, Fleck-itude, and so forth. You might have some readers using Jim-Bob’s Bookmark Service, so why not add a Jim-Bob-This button? The trouble is that the Jim-Bob Bookmark users don’t use SquidShare, and vice-versa. Each individual reader might care about one or three of the buttons, but not the rest.

The problem, however, would be easy to fix with a bookmark button standard. A simple three-step dance among bloggers, bookmark services, and browsers would allow readers to see those and only those bookmarking buttons of interest to them. Here’s a quick sketch of how it would work.

In step one, bloggers would add a bit of metadata to their blog posts. Each entry displayed on a page would include a little bit of extra data indicating the permalink of the entry. This would be easy to automate with blog software.

In step two, users would tell their browsers what their favorite bookmark services are. This would involve a Firefox plugin; I’m sorry that it would be harder to do in IE, but if you’re using IE, your browser sucks. The telling mechanism could be almost completely automated. Once you had the plugin installed, it could be configured to recognize “add this bookmarking service” buttons from the appropriate bookmarking service sites. All the sites would need to do would be to create a small XML payload of their own telling the plugin what the format of their bookmarking buttons was: an icon, a name, and instructions for consing up a URL from the metadata supplied by bloggers.
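
To make the payload idea concrete, here is a rough sketch of what a service's descriptor and the URL-building step might look like. Everything in it is assumed for illustration: the element names (`bookmark-service`, `submit-template`), the `{url}` and `{title}` placeholders, and the `jimbob.example` addresses are all invented, since no such standard exists yet.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical descriptor a bookmarking service might publish.
# Element names and placeholder syntax are invented for this sketch.
DESCRIPTOR = """
<bookmark-service>
  <name>Jim-Bob's Bookmark Service</name>
  <icon>https://jimbob.example/icon.png</icon>
  <submit-template>https://jimbob.example/add?u={url}&amp;t={title}</submit-template>
</bookmark-service>
"""


def parse_descriptor(xml_text):
    """Pull the name, icon, and URL template out of a service descriptor."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("name"),
        "icon": root.findtext("icon"),
        "template": root.findtext("submit-template"),
    }


def build_submit_url(service, permalink, title):
    """Fill the template with the permalink and title supplied by the blogger."""
    return (service["template"]
            .replace("{url}", quote(permalink, safe=""))
            .replace("{title}", quote(title, safe="")))


service = parse_descriptor(DESCRIPTOR)
submit_url = build_submit_url(service, "http://blog.example/2007/05/post", "A Post")
```

The permalink and title are percent-encoded before substitution, so a post URL containing `&` or `?` cannot corrupt the service's query string.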

In step three, browsers would recognize the bookmarklet metadata and automagically replace it with the appropriate button for the user’s bookmark service of choice. If you’ve told your browser that you’re a Digg and SquidShare user, you see Digg and SquidShare buttons. If you’re a Jim-Bob Bookmark fan, you see a Jim-Bob Bookmark button.
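
The selection logic in step three is small enough to sketch. This is only an illustration of the idea, not a plugin: the `buttons_for` function, the shape of the `entry` record, and the `.example` URL templates are all hypothetical stand-ins for whatever the real services and blog metadata would supply.

```python
from urllib.parse import quote

# Hypothetical URL templates; real services would publish their own.
KNOWN_SERVICES = {
    "Digg": "https://digg.example/submit?url={url}&title={title}",
    "SquidShare": "https://squidshare.example/share?u={url}",
    "Jim-Bob Bookmark": "https://jimbob.example/add?u={url}&t={title}",
}


def buttons_for(entry, user_services, known=KNOWN_SERVICES):
    """Return (service name, link) pairs for only the services this user picked."""
    buttons = []
    for name in user_services:
        template = known.get(name)
        if template is None:
            continue  # the browser has no descriptor for this service yet
        link = (template
                .replace("{url}", quote(entry["permalink"], safe=""))
                .replace("{title}", quote(entry["title"], safe="")))
        buttons.append((name, link))
    return buttons


# The blogger's metadata for one post (assumed shape: permalink plus title).
entry = {"permalink": "http://blog.example/2007/05/buttons", "title": "Button Soup"}
digg_and_squid = buttons_for(entry, ["Digg", "SquidShare"])
jim_bob_only = buttons_for(entry, ["Jim-Bob Bookmark"])
```

Note that the post itself carries nothing service-specific: the same `entry` yields two buttons for one reader and a single, different button for another.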

The great thing about such a system, in addition to the decrease in screen clutter, is that bloggers wouldn’t need to know about each and every bookmarking service out there. Merely by exposing a little data in a standard format, they would enable any bookmarking service, present or future, to interoperate with their posts. It’s a more Semantic-Web-y way of doing things, and it better respects modularity.

What about the users who haven’t installed the plugin, you may ask? Aha! I have an answer for you. (Surprise, surprise.) It’s okay to leave the current button soup in place. Just wrap it in another piece of metadata that tells the browser where to find the current welter of buttons. Users who don’t have the plugin just see the existing mess. But for users who do, the plugin hides the mess of buttons at the same time as it displays the specific buttons the user wants to see.

This staging strategy makes the plugin the essential piece of technology. Once the plugin exists, it creates a de facto standard. Bloggers code their pages to match what the plugin expects; so do bookmarking services. Neither bloggers nor bookmarking services need to abandon their current techniques; they can just add support for the plugin. It might or might not catch on like wildfire, but I don’t see it doing any harm.

Any coders out there interested in cleaning up some Web 2.0 litter?

ICANN HAS CHEEZBURGER.PL

Expiration Date I give it 4 stars

This joke will be funny to an extremely small number of people. I would say possibly zero, except that I find it hilarious.

Click the image for a larger version.

You Don’t Love Me Yet I give it 2 stars

This is, unquestionably, a Tim Powers novel. That means that it features a moderately large cast, ranged along a continuum from the mostly heroic to the wholly villainous, a slightly under-motivated romance, a lead character who makes some serious early mistakes that nearly get him killed and leave him in a difficult predicament, and some stunning reinterpretations of the world as we know it in terms of the supernatural.

This time around, the conceit is that the bums who wander the streets of Los Angeles are in fact ghosts who have taken on substance, and that the real drug scene in L.A. involves inhaling ghosts for a rush of their memories. Here as in his other novels, Powers takes his conceits seriously, spinning out a wealth of subplots and details in a demented and yet utterly believable fashion. Addicts attract ghosts with palindromes and bottle them for later use? Sure. Thomas Edison invented a device for talking to ghosts? Of course. My only caveat is that the ending is a bit more of a rolling stop than a bang.

Quite a step down from Lethem’s spectacular Fortress of Solitude. I suppose it’s a novel about the lives and loves of an ultra-indie band in L.A. But there’s also some bizarre installation art and a kidnapped kangaroo. The farce feels forced, and characters’ attraction to each other should be motivated by something more convincing than the narrator’s say-so.

PrawfsBlawg And Me

The Virtues of Moderation, Version 0.1

Another reason things have been a little quiet here is that I’m guest-blogging over at PrawfsBlawg, the most frequently misspelled blog in all of legal academia.

The Children of Húrin I give it 4 stars

I’ve put my slides from my presentation at the Commons Theory Workshop online. This was my first serious experiment in giving a presentation without bullet points. I was strongly influenced by Matt Haughey’s stunning Making Money Blogging presentation. (He cites Beyond Bullet Points as the source of his style, but I found it unhelpfully rigid. Better just to look at Matt’s presentation and ask yourself why it works.) For art, I used pictures from Flickr, mostly those under Creative Commons licenses. I hope soon to write up my experiences in clearing the photo permissions.

Be warned first that the file is 7.8 megabytes because of all the pretty pictures, and second that due to my being a blockhead and leaving my video dongle at home, I wound up not having my notes in front of me as I gave the presentation, so that the words on the screen bear only a distant familial resemblance to what I actually said.

Revising the draft paper to which the presentation pertains is one of my projects for the summer. The paper itself is still only in private alpha release, but if you’re intrigued and would like to be added to the alpha test group, please squirt me an email and I’d be happy to send it along.

May 16, 2007

Planes, Trains, und Automobiles

The tale of Túrin Turambar, told more briefly in The Silmarillion, is a tragic epic in the old-fashioned Germanic tradition. Think of the Siegfried components of Wagner’s Ring Cycle, but with the gods well offstage. As with much of the Silmarillion, it doesn’t read well if you expect a narrative with modern pacing, economy of plot, or dialogue. It succeeds quite well as Tolkien probably intended it: a convincing imitation of a fragment from an enormous and partially lost corpus of myth and history. Someone ought to turn it into an opera.

May 11, 2007

From the Annals of Bad Architecture

I’ve just returned from a Commons Theory Workshop at the Max Planck Institute for the Study of Collective Goods. It was great, and so was Bonn, but I just flew in from Germany, and boy are my arms tired. Yesterday, I traveled on two trains, a monorail, two taxis, two subways, an airplane, and a bus. I woke up just now with no clear idea of what time it was or where I was. Realizing that it was “morning” at “home” was a very pleasant surprise.

May 2, 2007

I saw the following design travesties while looking at apartments this week:

Of a carpeted apartment, “The current occupant is a smoker. Don’t worry; we fully clean all the apartments before new tenants move in.”
A ninth-floor apartment with eastern exposure, overlooking a train yard, and beyond that, the Hudson. The apartment has large glass exterior windows in the living room. The bedroom, further inside, opens onto the living room with a pair of French (i.e. mostly glass) doors.
Being told not to worry about the guy sleeping on a futon in the living room.
A duplex with the living room and kitchen upstairs on the entry level, and the bedroom downstairs on the garden level. There’s a bathroom on each floor. The bathtub is in the upstairs one.
A building-wide wireless network; no Ethernet jacks.
Of an apartment listed as two-bedroom, “There’s a second door there because the state of New Jersey says that it can’t be a bedroom unless it has a window, so the door is there to make it a den.”
French doors opening out onto a two-foot-wide strip of grass surrounding the building. The strip is fenced about with a mostly-open ironwork lattice; the sidewalk is about a foot or two below the strip.

If you can see what’s wrong with these pictures, then you’re doing better than the owners.

A Cheer for KSR?

May 1, 2007