Core rule calculus

luqui on 2005-12-20T04:26:22

Lambda calculus is a beautiful compiler backend. When I wrote my Toy language a month or so ago, I used this core data structure:

data AST
    = Lambda  VarName AST
    | Apply   AST AST
    | Var     VarName
    | Literal Value


That's it. Those four nodes and I could do anything. I wrote an interpreter for this language in half an hour. I wrote a type inferencer in an hour, scrapped it, and took another hour to write another type inferencer. That's the beauty of a small core calculus. The optimial calculus is hard to come up with, but once you have it, everything else is cake. That's why PIL is so important.

Recently I've been porting PGE to Perl 6, so we can have rules in the Javascript and Perl 5 backends. Perl 6 rules are another fairly simple language. However, Patrick Michaud's calculus has 15 nodes. That's huge! I mean, it's not huge if you're taking the approach of executing the parse tree, as is done by Perl 5, P6C (the initial take of Perl 6 written in Perl 5 for Parrot), and Pugs's first runtime. But my toy language was a powerful experience for me, so I'd like to come up with a simple calculus for rules.

The following three nodes get you to context-free (displayed in Haskell for its consice description of data structures, even though I'm actually writing it in Perl 6):

data Rule
    = Literal String
    | Union   Rule Rule
    | Concat  Rule Rule


Anyone with experience in formal languages will say "That's not context-free! That's not even regular!" Oh, but I neglected to mention: I allow infinite, recursive structures. It's easy to see how to build an inifite structure for the Kleene star. A similar technique will allow you to model every context-free language.

After playing with this structure a little bit, I have concluded that allowing infinite structures is a bad idea. It means that in order to do any processing that isn't straight-up execution, you have to use the graph variants of whatever tree algorithms you were going to use. Those can get complicated, and some algorithms even become impossible. It's like the waterbed theory: I'm pushing too much complexity out of the core calculus.

If we disallow recursive structures, then we need to add Kleene star to get back to regular. To get to context free from here, we basically need symbolic reference. But I'm not going to do that. I have an objection to symbolic reference, and that is that you have to add all sorts of machinery to get it to work in a powerful way: Lexical pads, full qualification, etc. What I'm looking for is a lambda-calculusesque fix, which allows you to build recursion without symbolic lookup or infinite structures.

Maybe that means that I just mimic the lambda calculus and do pretty much what Parsec does: build a value out of functions that can be used to match strings. That gives you the full power of the lambda calculus, and all the research that has been done on it, to match strings. And Parsec can do predictive parsing, which is something I've wanted. The only thing it can't do is bottom-up parsing, which is also something I've wanted. But bottom-up parsing is more restricted than Perl 6 rules, so maybe that belongs at a earlier stage in processing.