More parrot languages - java

rurban on 2008-09-07T09:16:08

Working on the plan to make parrot and its languages installable (and to do a make with an already installed parrot), I fixed and tested all of the included languages. Looking deeper at dotnet, which converts a .NET .exe or .dll assembly to a parrot library (pir or pbc), I saw the similarities to java. See http://www.jnthn.net/papers/2006-cam-net2pir-dissertation.pdf for Jonathan Worthington's paper describing it.
So I thought, why not try to rewrite (i.e. copy & paste + tags-query-replace) dotnet to jvm?

Both bytecodes look very similar; the .NET bytecode just has a few extra specialities. Both can be converted from the stack-based vm to a register vm via a perl5 SRM compiler, which is currently used in dotnet and WMLScript.
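To make the stack-to-register idea concrete, here is a minimal sketch in Python (not the actual perl5 SRM compiler). It assumes a made-up three-op stack bytecode ("push", "add", "mul") and simulates the operand stack at translation time with register names instead of values. The function name and the "$I" register naming are only illustrative.

```python
def stack_to_registers(ops):
    """Translate stack ops into register assignments.

    The 'stack' here holds register names, not runtime values, so every
    stack slot ends up as a named register in the output.
    """
    stack = []      # register names currently "on the stack"
    out = []        # emitted register-style instructions
    next_reg = 0

    def fresh():
        nonlocal next_reg
        name = f"$I{next_reg}"
        next_reg += 1
        return name

    for op, *args in ops:
        if op == "push":                      # push a constant
            reg = fresh()
            out.append(f"{reg} = {args[0]}")
            stack.append(reg)
        elif op in ("add", "mul"):            # binary op: pop two, push one
            right = stack.pop()
            left = stack.pop()
            reg = fresh()
            sym = "+" if op == "add" else "*"
            out.append(f"{reg} = {left} {sym} {right}")
            stack.append(reg)
        else:
            raise ValueError(f"unknown op: {op}")
    return out, stack


# (2 + 3) * 4 as stack code
code, rest = stack_to_registers(
    [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
)
print("\n".join(code))
```

The result of the whole expression ends up in the single register left on the simulated stack.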

Sun's Hotspot compiler source, which is available at http://openjdk.java.net, shows that Sun took a similar path with the bytecode table description. In the perl5 bytecode compiler we have an opcode table with references to C and perl code for special ops and types. In parrot we have a simple ini-style list of ops, with argument and return type descriptions in the target format (which is PIR), and some simple source templates to expand the intermediate stack and temp. locations. With Hotspot, Sun invented an ADL format ("Architecture Description Language") to describe the ops. It also has a cost attribute for each op, which enables an optimizing compiler, whether static or JIT. See "hotspot\src\share\vm\adlc\Doc\Syntax.doc" and "hotspot\src\cpu\i486\vm\i486.ad".
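As a rough illustration of the table-driven approach, the following Python sketch expands an op from a small table of templates. The table layout, the op entries and the expand() helper are all hypothetical; they mirror neither the real parrot ops list nor the Hotspot ADL syntax, and there is no cost attribute here.

```python
# Hypothetical op table: name -> (number of stack inputs, PIR-like template).
OP_TABLE = {
    "iadd": (2, "{dest} = {0} + {1}"),
    "isub": (2, "{dest} = {0} - {1}"),
    "ineg": (1, "{dest} = neg {0}"),
}


def expand(op, stack, dest):
    """Pop the op's inputs from the simulated stack, push dest, and
    return the template filled in with the register names."""
    pops, template = OP_TABLE[op]
    # Popping yields the operands last-first, so reverse to push order.
    args = [stack.pop() for _ in range(pops)][::-1]
    stack.append(dest)
    return template.format(*args, dest=dest)


stack = ["$I0", "$I1"]
print(expand("isub", stack, "$I2"))
```

Adding a new op then only means adding a table row, as long as it fits the pop-N, push-one pattern.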

With a class2pbc (JVM to Parrot) converter we could use all the existing java libraries out there. However, Jonathan's net2pbc dotnet converter currently works only for about 50% of the .NET assemblies.

Currently with jvm I am stuck at opcode "iinc" 0x84, which increments a local integer variable in the current thread-local frame within the stack-based vm ("increment an int lexical"). Our SRM "compiler" takes stack arguments and converts them to our registers. However, I'm not sure how it deals with stack temporaries, the so-called stack frame variables. And besides those stack frame vars, the jvm also uses temporary int variables heavily, which are usually stored in registers if possible.
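One possible way to handle this, sketched in Python under my own assumptions (this is not what the SRM compiler does today): give every local variable slot its own register ("locN", a made-up name) separate from the stack temporaries ("tmpN"). Then iinc, which takes a slot index and a constant and never touches the operand stack, becomes a plain in-place add. Note that iload has to copy the local into a fresh temporary, otherwise a later iinc would change a value that is already on the stack.

```python
def translate_locals(ops):
    """Translate a few JVM-style int ops, keeping locals and stack
    temporaries in separate (hypothetical) register namespaces."""
    out, stack, temps = [], [], 0

    def fresh():
        nonlocal temps
        name = f"tmp{temps}"
        temps += 1
        return name

    for op, *args in ops:
        if op == "iconst":                    # push an int constant
            tmp = fresh()
            out.append(f"{tmp} = {args[0]}")
            stack.append(tmp)
        elif op == "iload":                   # copy local into a temp
            tmp = fresh()
            out.append(f"{tmp} = loc{args[0]}")
            stack.append(tmp)
        elif op == "istore":                  # pop temp into local
            out.append(f"loc{args[0]} = {stack.pop()}")
        elif op == "iadd":
            right = stack.pop()
            left = stack.pop()
            tmp = fresh()
            out.append(f"{tmp} = {left} + {right}")
            stack.append(tmp)
        elif op == "iinc":                    # local += const, stack untouched
            index, delta = args
            out.append(f"loc{index} = loc{index} + {delta}")
        else:
            raise ValueError(f"unknown op: {op}")
    return out


# j = i++  with i in slot 1 and j in slot 2
print("\n".join(translate_locals([("iload", 1), ("iinc", 1, 1), ("istore", 2)])))
```

With the copy in iload, j receives the old value of i, which is what the post-increment is supposed to do.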

Note that usually closures and class methods store their lexical vars on the C stack right above their code, so that a return to an uplevel function/method automatically cleans up the stack with its code and vars. This is much faster than the perl5 pad layout, where the lexical vars are kept in separate arrays. It could be that the java vm keeps its stack frame lexicals on the so-called "C stack", as C and lisp do, or on the heap, as perl5 does with its PAD arrays.