Russ Cox: Regular Expression Matching Can Be Simple And Fast

This short paper by Russ Cox exemplifies everything I believe about theory. It takes a tool that programmers use all the time—regular expressions, which are the basis of search and replace—and shows how the standard implementation of them can fail badly at many tasks. It then walks through some basic computer science theory to derive a different implementation of regular expressions, one that performs well all the time.

This is a theoretical pearl: the mathematics are simple, well-understood, and provably correct. But it’s also profoundly practical: once you write out the theory carefully, the program follows as a matter of course. The formalism is the program. Sadly, people who ought to know better still get this one wrong.

I write law review articles and legal documents now, rather than programs and computer science papers. But the lesson is the same. Good theory is useful.

For more information, here’s Russ’s overview page, and here’s his follow-up with equally elegant implementation details. I had the pleasure of working with him years ago; he’s also an all-round nice guy and the only Free Electron I know.

The article is very interesting. I will have to take it in slowly, it is a long time since I had any direct involvement in anything of this sort. I am an artist but GEB 25 years ago was and still is very important to me.

“The formalism is the program” is very very true in all languages. Also most people do not get that the reverse us equally true: the program is the form. In art, this failure of understanding has resulted in a lot of art that is either invisible or, if visible, mindless. The internet and computers reminds me a lot of a story by Calvino in a volume entitled Cosmicomics - ‘The Spiral’. Search engines also look increasingly like a character called Ant Hilary in GEB.