Bratislava, 2008-11-08 16:05 MET
=head1 NAME
Need help with the perl compiler, emit C or JIT, blabla
=head1 Preparation
open L
=head1 Contents
I'll very briefly present about 5% of the internals, debug a
real-life problem, and leave you enough time to ask deeper
questions.
* History
* Current Status
* Compilation
* The op tree
* B::Bytecode
* B::C
* Using the compiler
* Debugging a problem
* Ideas, pro and contra B
-----------
=head1 History
The "perl compiler" modules B
written 1996-97 by Malcolm Beattie, Oxford.
In core since Alpha 5, perl5.002.
Out of core since 5.9.5 (5.10).
Now on CPAN as B::C, maintained by me, rurban,
at F
--------
With alpha3 (1997) it could compile almost the
whole perl test suite with all 3 compilers.
With 5.10 it cannot even compile a regexp (due to the 5.10
rewrite and SV changes), nor its own 21 internal tests
(due to advanced features added since 5.002: AUTOLOAD, our, ...).
  perl -MO=C,-otest.c test.pl
  => test.c
  gcc test.c
--------
=head1 Current Status
5.002 up to 5.8.9 works "ok", most tests pass, but not all.
For 5.10, properly calling and saving a lexical context needs some
help, most likely from eastern hackers. I have had the best
experiences with Russians.
For 5.10/5.11, calling and constructing a regexp for B::C
needs help, from demerphq in Frankfurt or someone else who
knows how to call PM_SETRE and CALLREGCOMP in XS in 5.10.
When these two problems are solved, I can release it as
B-C-1.05 and replace the B::C from 5.8.
Once the test suite, including some advanced tests, passes, we can
start using the compiler and bytecode features. We should probably
put the bytecode stuff back into core, because we need .plc/.pmc
support and the ByteLoader part built in.
5.6 and earlier will keep using the core B::C modules,
as their internal structures differ too much.
--------
=head1 "Compilation"
perl has an internal compiler, i.e. a parser (perly.y) reads the
source lines and compiles them to a so-called op tree, a tree of
simple operators (ops), which are implemented by internal pp_()
functions.
See opcode.pl or perloptreeguts.pod
As with XS, all internal perl pp_ functions take no arguments.
All arguments are expected to be on the "perl stack", which
is a special heap area, not the CPU stack. (pp for "push/pop")
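The stack convention described above can be sketched with a toy model. This is not perl's real implementation: the names toy_stack, toy_push, toy_pop, toy_pp_add, toy_pp_multiply and toy_eval are hypothetical, the stack holds plain longs instead of SV pointers, and a fixed array stands in for perl's growable heap area. It only shows the idea that a pp-style function takes no C arguments and finds its operands on a separate stack.

```c
#include <assert.h>

/* Toy model only: a plain array standing in for the "perl stack".
   Real perl stores SV pointers and grows the stack on demand. */
#define TOY_STACK_MAX 16

static long toy_stack[TOY_STACK_MAX];
static int  toy_sp = 0;              /* index of the next free slot */

static void toy_push(long v) {
    assert(toy_sp < TOY_STACK_MAX);  /* no overflow in this toy */
    toy_stack[toy_sp++] = v;
}

static long toy_pop(void) {
    assert(toy_sp > 0);              /* no underflow in this toy */
    return toy_stack[--toy_sp];
}

/* pp-style functions: no C arguments. Operands are popped from the
   toy stack and the result is pushed back. */
static void toy_pp_add(void) {
    long right = toy_pop();
    long left  = toy_pop();
    toy_push(left + right);
}

static void toy_pp_multiply(void) {
    long right = toy_pop();
    long left  = toy_pop();
    toy_push(left * right);
}

/* Evaluate a + b * c purely through the stack. */
static long toy_eval(long a, long b, long c) {
    toy_push(a);
    toy_push(b);
    toy_push(c);
    toy_pp_multiply();               /* b * c     */
    toy_pp_add();                    /* a + (b*c) */
    return toy_pop();
}
```

Calling toy_eval(2, 3, 4) returns 14 and leaves the toy stack empty again.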
The op tree represents the program code, but a program also needs
the data, the SV's, AV's and HV's. The arguments for the ops are
typically pointers to those SV's (SVREF) or lexicals (on PADs)
or direct SV's.
perl is not very functional, so pointers to ops are seldom
used as args to ops; the args are mostly lexicals and SV's.
In L
When executing a program, perl compiles ("parses and constructs")
the optree and then simply runs linearly through the optree (a
linear list now) from the beginning to the end.
In the "perl compiler", the B backend is just the XS
representation of the optree as perl objects; you can use perl
methods to read from the various OP structs.
The perl compiler consists of various B modules to convert from
those B objects, representing the ops, to bytecode or C code.
----------
=head1 The op tree
See L
Similar to the perl internal variables, the SV's, the B
"$a + $b * $c"
is compiled to (in C syntax, but really in memory)
  newBINOP(OP_ADD, flags,
    newSVREF($a),
    newBINOP(OP_MULTIPLY, flags,
      newSVREF($b),
      newSVREF($c)
    )
  )
The two BINOP's, ADD and MULTIPLY, each take two args (hence
BINOP). The args of ADD are the op for the SVREF of $a (a pointer
to its SV) and the OP_MULTIPLY; the args of MULTIPLY are the ops
for the SVREF's of $b and $c.
This parse-tree is recursive and looks like nested LISP code.
The internal compiler (not B::C) runs three passes over the
perl code. The passes also include a "peephole"
optimizer, which optimizes this recursive op tree, and in the end
it is ensured that we can linearly run through the tree by simply
stepping through the C
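How a recursive tree becomes a linear execution order can be sketched with a post-order walk. This is a toy under stated assumptions, not perl's actual code: toy_node, toy_thread and toy_order are hypothetical names, and the fields first, last and next only loosely mirror the child and next-op links of perl's real OP structs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tree node; not perl's OP struct. */
typedef struct toy_node toy_node;
struct toy_node {
    char      name;   /* '+', '*', or a variable letter        */
    toy_node *first;  /* left child, NULL for leaves           */
    toy_node *last;   /* right child, NULL for leaves          */
    toy_node *next;   /* execution order, set by toy_thread()  */
};

/* Post-order walk: thread the children first, then the op itself.
   *tail always points at the most recently threaded node. */
static void toy_thread(toy_node *n, toy_node **tail) {
    if (n == NULL)
        return;
    toy_thread(n->first, tail);
    toy_thread(n->last, tail);
    if (*tail != NULL)
        (*tail)->next = n;
    n->next = NULL;
    *tail = n;
}

/* Build the tree for $a + $b * $c, thread it, and write the
   execution order into out (needs room for 6 chars). */
static void toy_order(char *out) {
    toy_node a   = { 'a', NULL, NULL, NULL };
    toy_node b   = { 'b', NULL, NULL, NULL };
    toy_node c   = { 'c', NULL, NULL, NULL };
    toy_node mul = { '*', &b,   &c,   NULL };
    toy_node add = { '+', &a,   &mul, NULL };
    toy_node *tail = NULL;
    toy_node *op;
    size_t i = 0;

    toy_thread(&add, &tail);

    /* The first op to execute is the leftmost leaf. */
    op = &add;
    while (op->first != NULL)
        op = op->first;

    /* Now the tree can be run as a simple linked list. */
    for (; op != NULL; op = op->next)
        out[i++] = op->name;
    out[i] = '\0';
}
```

For this expression the threaded order is a, b, c, *, +: the operands come first and each operator follows its arguments, which is what a stack-based runner needs.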
The Walker:
  int
  Perl_runops_standard(pTHX)
  {
      dVAR;
      while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
          PERL_ASYNC_CHECK();
      }
      TAINT_NOT;
      return 0;
  }
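To make the walker loop concrete, here is a self-contained toy that can be compiled without the perl headers. It is a sketch, not perl's code: toy_op, toy_PL_op, toy_runops and toy_run_expr are hypothetical names, values are plain longs rather than SV's, and only the loop shape (call the current op's function, which returns the next op, until it returns NULL) is taken from the function above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy op; field names imitate perl's but this is not its OP struct. */
typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*op_ppaddr)(void);  /* function implementing this op  */
    toy_op  *op_next;            /* next op in execution order     */
    long     value;              /* operand, used by const ops only */
};

static toy_op *toy_PL_op;        /* current op, in the role of PL_op */
static long    toy_stack[16];
static int     toy_sp;           /* index of the next free slot */

static toy_op *toy_pp_const(void) {
    assert(toy_sp < 16);
    toy_stack[toy_sp++] = toy_PL_op->value;
    return toy_PL_op->op_next;
}

static toy_op *toy_pp_multiply(void) {
    long right;
    assert(toy_sp >= 2);
    right = toy_stack[--toy_sp];
    toy_stack[toy_sp - 1] *= right;
    return toy_PL_op->op_next;
}

static toy_op *toy_pp_add(void) {
    long right;
    assert(toy_sp >= 2);
    right = toy_stack[--toy_sp];
    toy_stack[toy_sp - 1] += right;
    return toy_PL_op->op_next;
}

/* Same shape as the walker: each op function returns the next op,
   and the loop ends when an op returns NULL. */
static int toy_runops(toy_op *start) {
    toy_PL_op = start;
    while ((toy_PL_op = toy_PL_op->op_ppaddr()) != NULL) {
        /* the real loop runs PERL_ASYNC_CHECK() here */
    }
    return 0;
}

/* $a + $b * $c as an already linearized list of five toy ops. */
static long toy_run_expr(long a, long b, long c) {
    toy_op add  = { toy_pp_add,      NULL,  0 };
    toy_op mul  = { toy_pp_multiply, &add,  0 };
    toy_op op_c = { toy_pp_const,    &mul,  c };
    toy_op op_b = { toy_pp_const,    &op_c, b };
    toy_op op_a = { toy_pp_const,    &op_b, a };

    toy_sp = 0;
    toy_runops(&op_a);
    return toy_stack[toy_sp - 1];  /* result is left on top */
}
```

toy_run_expr(2, 3, 4) returns 14. The runner itself knows nothing about addition or multiplication; all behaviour lives in the op functions.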
=head1 B::Bytecode
Generate the optree from a binary .plc/.pmc file,
platform-compatible.
CROSS-PLATFORM PORTABILITY
For differing endianness, ByteLoader converters are in effect.
Header entry: byteorder.
64int - 64all - 32int is portable. Header entry: ivsize
ITHREADS are unportable.
Needs far fewer opcodes (~100) than perl's opcode.pl with
all the pp_ functions (~400).
Just enough for every op, all the op flags (the struct fields),
and every sv/av/hv type.
Assembler and disassembler roundtrips.
=head1 B::C
Similar to bytecode it generates the whole optree ("code") and
data in memory with XS functions, and then jumps into ENTER
via the main walker C
But it generates C code, which is statically compiled and linked
to libperl. Dynamic perl features are still dynamic, but
guaranteed static decisions can be optimized. => L
=head1 Using the compiler
perlcc test.pl
t/testplc.sh (see the .plc, .asm, .disasm files and
the roundtrips)
t/testc.sh 2
=head1 Debugging a problem
See STATUS
t/testc.sh 02
Debug a failure in the PREGCOMP call
Expand the preprocessor C macros to find
the actual failing calls.
gcc -E => .cee
Fix the line number from main() on for the gdb stepper.
Our main() is perl_init_aaaa() here.
Step to the problem and inspect it.
  gdb
  b perl_init_aaaa
  p
=head1 Ideas
The B modules can be used to read, change or transform the
perl optree - a perl program in its internal representation.
We might want to convert perl5 to various other formats, such as
native code (JIT), perl6 or PIR, but maybe also to java, LISP or
scheme, and then compile this to fast, optimized native code.
Other possible advanced ways are:
1. B
2. B
3. undump() and unexec
Some cool B modules are L
And as advanced modules L
--
rurban
__END__
Local Variables:
fill-column:65
End: