Bootstrapping Grammars? "oh, I had that early on" - pmichaud

mugwumpjism on 2006-04-18T22:13:55

Yesterday I ended up pestering(full log) Patrick Michaud.

After a small clarification about why PGE is so cool, and what it means to be a shift/reduce parser (summary: switch between state machine / regular parsing and recursive descent with backtracking opportunistically), we started talking about the relationship between PGE (the parrot-based rules engine) and lrep (the Perl 6 to Perl 5 compiler).

Recently Flavio Glock has been working on a set of "bootstrapping" rules for generating a rules parser. That is, a set of Perl 6 rules that define the grammar for Perl 6 rules. That might sound strange, but consider that gcc is written in C, just as Haskell is written in Haskell. When the Perl 6 rules base is augmented with code blocks on each rule that generate an AST for the code being parsed, and something catches the generated AST at the end and converts it to some kind of bytecode or even machine code, you have yourself a compiler. This is what Flavio and others are aiming for with lrep.

It turns out that Patrick already had something like this a long time ago - there's a handcrafted parser (in PIR) that can parse a set of grammar rules and a operator precedence table and compile to a similar (but less efficient) generated parser - though it is missing the AST generation and subsequent stages to make it fully bootstrapping.

Those that are familiar with the pugs source will notice a striking structural, but not syntactical, similarity between the Haskell source to pugs and the Perl 6 grammars themselves. For example, compare src/Pugs/Prim.hs with the operator precedence table rules listed above.

The large remaining task for not one, but two releasable Perl 6's, one based on Perl 5 and the other on parrot, is writing the grammars and emitters for lrep and the PGE-based perl 6.

Fortunately, this task has alread been prototyped, almost to the extent of the completion of the specification of the Perl 6 language, by the pugs project. But pugs' grammars are written in Haskell and parsec - not Perl 6 grammars. Once converted to Perl 6 rules, they are in principle portable across all three Perl 6 ports.

All the pieces are there - we now just need (human) translators to translate the rules we already have into Perl 6. I see light at the end of the tunnel!