Java parser in Parrot/PGE

ChrisDolan on 2009-01-09T05:02:20

My favorite part of Perl 6 is the new grammar syntax. Over the last couple of days, I translated a Java source code grammar from antlr to PGE. After about 4-5 hours of work, I now have a Perl 6 grammar that can parse all of the .java files in the OpenJDK (the Java 7 source code). Well, that may be a lie. It's still crunching at about 5-10 seconds per file, so it will be a while before I know if it's really true.

Admittedly, most of the credit goes to the authors of the antlr grammar I adapted, but this also says good things about the Perl 6 regex implementation in Parrot.

The things that bit me hardest were:

  1. negated classes (PGE doesn't understand "<-[abc]>", so I had to make the inner part a separate token)
  2. antlr allows character ranges outside of any character-class syntax (antlr: "'0'..'9'", perl: "<[0..9]>")
  3. longest-token matching on integer vs. float literals (I had to change the antlr grammar to put float ahead of integer)
  4. whitespace (I cribbed from the Pipp implementation)
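Item 3 is easiest to see with a plain regex engine that tries alternatives in order. The sketch below uses Python's `re` module, not PGE, and the pattern names are hypothetical; it only shows why, when alternation is ordered rather than longest-match, an integer rule listed first will claim the "3" of "3.14" and leave ".14" behind, while listing float first gets the whole literal.

```python
import re

# Ordered alternation: the first alternative that matches wins,
# even if a later alternative would consume more text.
INTEGER_FIRST = re.compile(r"\d+|\d+\.\d+")
FLOAT_FIRST = re.compile(r"\d+\.\d+|\d+")


def first_token(pattern, text):
    """Return the text matched at the start of `text`, or None if nothing matches."""
    m = pattern.match(text)
    return m.group() if m else None


if __name__ == "__main__":
    print(first_token(INTEGER_FIRST, "3.14"))  # 3     (integer wins, ".14" is left over)
    print(first_token(FLOAT_FIRST, "3.14"))    # 3.14  (float tried first)
    print(first_token(FLOAT_FIRST, "42"))      # 42    (float fails, falls back to integer)
```

An engine with true longest-token semantics would pick "3.14" regardless of order, which is what the antlr lexer rules were written to expect.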


Gah!!!

JonathanWorthington on 2009-01-10T01:41:25

So now I'm tempted to take this and write a bunch of actions to produce PAST and see how quickly I can make it compile some Java programs... Yes, I'm a compiler-writing junkie.

Re:Gah!!!

ChrisDolan on 2009-01-10T04:44:28

My thought exactly, but I'm a couple of years behind you in familiarity with Parrot. I'm happy to give you commit access to my SVN repository, or move the grammar to another repository.

It turns out I still have some grammar issues to work out. I'm failing on about 3% of the JDK source code, mostly due to longest-token assumptions in the antlr grammar. But I expect I'll be at 100% by next week. Plus, the grammar could use some refactoring to use the optable. I'm hoping to look at that.

Re:Gah!!!

Aristotle on 2009-01-11T21:57:14

Is there a 12-step program?

Re:Gah!!!

chromatic on 2009-01-12T06:49:00

If it takes twelve steps, we need to make PCT even easier to use.