Parsing with Haskell

ziggy on 2005-02-25T16:41:40

At $WORK, we've been talking about the language-of-the-year [1] since about October, if not earlier. I had planned on looking into OCaml in 2005, while one of my colleagues decided to look into Haskell instead.

In January, I was pretty busy with $LIFE, so I didn't get much started. Hearing how wonderful Haskell is, and watching Autrijus working on Pugs from afar, I decided to switch tracks and get into Haskell.

As far as programming languages go, Haskell is very different from anything else I've seen or used. Approaching it with a Lisp/Scheme mindset helps, but isn't sufficient -- there's still a period of adjustment.

Interestingly, we're looking at writing a parser at $WORK, and we're actively avoiding the standard lex / yacc two-step for a couple of reasons: (1) it needs to be fast, (2) it needs to be threadsafe, (3) it needs to be fast, (4) it needs to produce helpful error messages, (5) we need the option to load more than one parser at a time (some yaccs don't do that very easily), and (6) did I mention it needs to be both fast and threadsafe? ;-)

ML / OCaml / Haskell are good languages for writing parsers and interpreters, so that's an obvious path to examine. The engineering on the Glasgow Haskell Compiler is downright impressive, and battle-scarred enough that we're not squeamish about adding it to the toolchain. (Among other things, it can compile to .c / .o files on the standard platforms.) And if you're going to be doing parsing in Haskell, your first stop is probably going to be Parsec.
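
To give a flavor of what that looks like, here is a minimal sketch of a Parsec parser for a comma-separated list of integers. It is not the parser we're writing at $WORK; the grammar and the names number, numbers, and parseNumbers are made up for illustration, and it assumes the parsec package (Text.Parsec) is installed.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more digits, read as an Int, with trailing whitespace skipped.
-- The "number" label is what shows up in error messages.
number :: Parser Int
number = read <$> many1 digit <* spaces <?> "number"

-- A non-empty, comma-separated list of numbers, consuming the whole input.
numbers :: Parser [Int]
numbers = spaces *> (number `sepBy1` (char ',' <* spaces)) <* eof

-- Hypothetical entry point: run the parser over a String.
-- "<input>" is just the source name used in error messages.
parseNumbers :: String -> Either ParseError [Int]
parseNumbers = parse numbers "<input>"

main :: IO ()
main = do
  print (parseNumbers "1, 2, 3")  -- succeeds: Right [1,2,3]
  print (parseNumbers "1, x, 3")  -- fails: Left with a position and what was expected
```

The parser is an ordinary Haskell value built from smaller parsers, so there is no separate lexer or generator step. On bad input, the Left case carries a ParseError with the line and column where parsing stopped, which speaks to point (4) above.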

Sure, learning Haskell takes some effort and adjustment. But it's worthwhile. Looking over the paper on Parsec, it condenses a year's undergraduate curriculum on parsing and compiler theory down to about 50 pages (including index). And you only need to read about the first half of it to get a working understanding of compiler theory -- which is much better than the alternatives.

[1]: In The Pragmatic Programmer, Dave and Andy recommend learning a new programming language every year, just to stay fresh. Best case, you'll learn a great new tool. Worst case, you'll gain a new perspective on coding practices to use and avoid.