Java parser in Parrot/PGE

ChrisDolan on 2009-01-09T05:02:20

My favorite part of Perl 6 is the new grammar syntax. Over the last couple of days, I translated a Java source code grammar from antlr to PGE. After about 4-5 hours of work, I now have a Perl 6 grammar that can parse all of the .java files in the OpenJDK (the Java 7 source code). Well, that may be a lie. It's still crunching at about 5-10 seconds per file, so it will be a while before I know if it's really true.

Admittedly most of the credit goes to the authors of the antlr grammar I adapted, but this also says good things about the Perl 6 regex implementation in Parrot.

The things that bit me hardest were:

  1. negated classes (PGE doesn't understand "<-[abc]>" so I had to make the inner part a separate token)
  2. antlr allows character ranges outside of any character class syntax (antlr: "'0'..'9'", perl: "<[0..9]>")
  3. longest token on integer vs. float (I had to change the antlr grammar to put float ahead of integer)
  4. whitespace (I cribbed from the Pipp implementation)
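
To illustrate item 3, here is a minimal sketch in Python rather than PGE. It assumes the alternatives are tried in order and the first one that matches wins, which is how Python's re module behaves and what the float/integer fix above suggests. The token patterns and the first_token helper are hypothetical and far simpler than real Java literals.

```python
import re

# Hypothetical, simplified token patterns. Real Java literals also allow
# hex, octal, exponents and type suffixes.
INTEGER = r"\d+"
FLOAT = r"\d+\.\d+"

# Ordered alternation: the first alternative that matches wins,
# even if a later alternative would have matched more text.
integer_first = re.compile(rf"(?P<int>{INTEGER})|(?P<float>{FLOAT})")
float_first = re.compile(rf"(?P<float>{FLOAT})|(?P<int>{INTEGER})")


def first_token(pattern, text):
    """Return (kind, lexeme) for the token at the start of text, or None."""
    m = pattern.match(text)
    return (m.lastgroup, m.group()) if m else None


print(first_token(integer_first, "3.14"))  # integer wins and stops at "3"
print(first_token(float_first, "3.14"))    # float gets the whole literal
print(first_token(float_first, "42"))      # plain integers still match
```

With integer listed first, "3.14" is cut off after the "3" and the rest of the input no longer parses. Listing float first gives the longer match without breaking plain integers.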


JonathanWorthington on 2009-01-10T01:41:25

So now I'm tempted to take this and write a bunch of actions to produce PAST and see how quickly I can make it compile some Java programs... Yes, I'm a compiler-writing junkie.


ChrisDolan on 2009-01-10T04:44:28

My thought exactly, but I'm a couple years behind you in familiarity with Parrot. I'm happy to give you commit to my SVN, or move the grammar to another repository.

It turns out I still have some grammar issues to work out. I'm failing on about 3% of the JDK source code, mostly due to longest-token assumptions in the antlr grammar. But I expect I'll be at 100% by next week. Plus, the grammar could use some refactoring to use the optable. I'm hoping to look at that.


Aristotle on 2009-01-11T21:57:14

Is there a 12-step program?


chromatic on 2009-01-12T06:49:00

If it takes twelve steps, we need to make PCT even easier to use.