Day 142: Hackathon day #3

autrijus on 2005-06-26T07:58:03

The Subversion repository is finally back, and I'm committing the things we've been hacking on over the past two days. luqui and nothingmuch hacked on reifying the PIL AST as Perl 6 objects (so we get true macro support for free), and on a codegen written in Perl 6 that takes PIL and emits Perl 5. Stevan worked on a Perl 5 metamodel that runs Perl 6 roles and classes from Perl 5 space, so we can use Perl 5 as our host VM (in addition to Parrot and Haskell). nothingmuch further joked about a PIL codegen that targets Forth (but it turns out he was entirely not serious). Patrick implemented builtin rules for PGE, and put qualified and inheritable+lexical grammar/subrules support in place. Ingy worked on his new Perldoc framework, which provides unified, comment-driven, haddock/javadoc-like metadata and a macro-friendly way to generate Kwid and POD and to translate between them. lwall continued to hack through mad segfaults on PDD/MAD_SKILLS, refactoring Perl 5's toke.c to be more informative.

Allison arrived today and we debated the implementation strategy for nearly an hour. The Pugs Hackathon work has focused on having the Perl 6 compiler written in Perl 6: it translates the parse tree into PIL, then from PIL to either a Parrot syntax tree that emits PIR, or to Perl 5, Mono or even JavaScript. However, that means that to compile a newer version of the Perl 6 compiler, we need to have an older version of Perl 6 around, first via the non-Parrot side of Pugs, and then in the form of a Perl6.pbc runtime. This is just like how the Mono C# compiler is written in C#, how GCC itself is written in C, or how PyPy is written in Python.

Allison would instead prefer to see the production Perl 6 compiler and related tools written entirely in PIR or other non-Perl6 Parrot languages, so that we can compile any version of Perl 6 without having access to previous versions of Perl 6. She also suspects that a parser/compiler/emitter written in PIR would be easier to write and maintain than the same toolchain written in Perl 6. The parser and emitter tools would then be reusable by other languages (e.g. PHP) that want to target Parrot, because PHP folks would rather use a compiler suite written in PIR than one written in either Perl 6 or PHP. That ties Perl 6 to PIR, but one can presumably link libparrot into Mono or the JVM and run the PIR code via their foreign call interfaces. This is similar to how the Perl 5 runtime is written in XS/C, and just like a (hypothetical) Mono-targeting compiler written in the .NET/IL high-level assembly language.

Thus, we have two possible implementation strategies that will evolve separately. People will, well, use whichever one actually works. :-)

Now... revelation time!

  • To use a Perl 5 module, put perl5: in front of the module name:
      use perl5:DBI;
    

    Extending this metaphor, to use a Python module:

      use python:Zope;
    
  • When calling a function, the unary splat * takes an aggregate, or a reference to an aggregate, and flattens it into the invocation list. Unary splat on a hash argument flattens it into pairs for named bindings; splat on a scalar dereferences it to find an array/hash reference; for code and non-reference-to-aggregate scalars it's a no-op.
  • &prefix:<int> now always means the same thing as &int. In the symbol table it's all stored in the ``prefix'' category; &int is just a short name for looking it up -- it's just sugar, so you can't rebind it differently.
  • Here is a clarified role hierarchy:
      Any | Object  | Item | ...pretty much everything else goes here...
                    | Pair
                    | Junction
          | int
          | str
          | num
    
  • Constrained types in MMD position, as well as value-based MMDs, are not resolved in the type-distance phase, but compile into a huge given/when loop that accepts the first alternative. So this:
      multi sub foo (3) { ... }
      multi sub foo (2..10) { ... }
    

    really means:

      multi sub foo ($x where { $_ ~~ 3 }) { ... }
      multi sub foo ($x where { $_ ~~ 2..10 }) { ... }
    

    which compiles to two different long names:

      # use introspection to get the constraints
      &foo
      &foo
    

    which in turn really means this, occurring after the type-based MMD tiebreaking phase:

      given $x {
          when 3 { &foo.goto }
          when 2..10 { &foo.goto }
      }
    

    In the type-based phase, any duplicates in MMD are rejected as ambiguous; but in the value-based phase, the first conforming one wins.

  • Closure composers like anonymous sub, class and module always trump hash dereferences:
        sub{...}
        module{...}
        class{...}
    
  • The do form now takes a single statement (which may still be a block); it turns the statement into an expression, evaluating it immediately when the left-hand side demands a value.
      my $val = do use CGI;   # same as
      my $val = BEGIN { use CGI };
    
      # This assigns 4 to $foo
      my $foo = do given 3 {
          when 3 { 4 }
      };
    
  • A closure form of but is desugared into a do given block, which eliminates the need to return $_ explicitly. So these two forms are equivalent:
      my $foo = Cls.new but {
          .attr = 1;
      };
    
      my $foo = do given Cls.new {
          .attr = 1;
          $_;
      };
    
  • An anonymous class is allowed to both is and does any classes/roles; it will be composed at compile time into an anonymous (but unique) class, the same way an anonymous closure remembers its original place of definition:
      role Foo {...}
      class Boo is Baz {...}
    
      (class{ does Foo; is Boo }).new(1);
    
  • The is Foo and does Bar declarations inside a class body are always lifted up as class traits and executed at class composition time.
  • There's another pseudopackage, OUR::, with the symbol table form %OUR::, that contains the symbols in your current package namespace.
  • trust is lexical: it controls accessor generation for all my, our and has forms in its scope. Inside a class body, my $.x and our $.x always generate public accessors on the spot:
      class Foo {
          trusts Bar; # sees the accessor methods in the scope
          { # some inner scope
              my $.x; # This creates accessors by inserting these two lines
              # die "Duplicate accessor" if %OUR::<&x>;
              # our &x := method () is rw { $.x };
    
              my $:y; # This creates accessors by inserting these two lines
              # my &:y := method () is rw { $:y };
              # $?SELF.trust_access.push(&:y); # adds to Bar -- $?SELF is class object
          }
          trusts Baz; # this is always a no-op because there's nothing below
      }
    
      class Pie is Foo { }
    
      Pie.x = 5;      # lvalue method - writes back to lexical $.x
      say Bar.x;      # 5 - shared by Bar and Pie
    

    The twigils . and : control the generated accessor's scope (our and my respectively). The scope of the variable itself is orthogonal to the accessor.

  • In order to pass by read-only reference, we need a way to bind a container into a new name, but say that it is read-only while the original is read-write. This is a contradiction, since binding by definition binds a container to two names. Still, we have to do it.

    In parameter lists, the default is constant trait on parameter variables does not really act on their containers; it creates a transparent container on top of an existing container. It is ``transparent'' because it autodereferences for everything except the write-type STORE/PUSH/SPLICE methods; all read methods like FETCH are passed to the underlying container. Even .ref and .does calls are passed through to the underlying container, but the .tied call does get you the wrapped implementation object.

    The upshot is that these are now errors:

      sub foo ($x) is rw { $x }
      my $a;
      foo($a) = 4;    # runtime error - assign to constant
    
      sub constref ($x) { \$x }
      my $a;
      my $r = constref($a);
      $$r = 4;        # runtime error - assign to constant
    

    To get a normal reference, use the is rw trait on the parameters.
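
The transparent read-only container can be modeled in a few lines. This is only a rough sketch, written in Python rather than Perl 6, and it is not how Pugs implements it: the class names Container and ReadOnlyView are invented for illustration, and the fetch/store methods merely stand in for FETCH and STORE.

```python
class Container:
    """Hypothetical read-write scalar container."""

    def __init__(self, value=None):
        self._value = value

    def fetch(self):
        return self._value

    def store(self, value):
        self._value = value


class ReadOnlyView:
    """Hypothetical transparent wrapper: reads pass through, writes fail."""

    def __init__(self, inner):
        self._inner = inner

    def fetch(self):
        # Read-type access is forwarded to the underlying container.
        return self._inner.fetch()

    def store(self, value):
        # Write-type access is rejected by the wrapper itself.
        raise TypeError("assign to constant")


a = Container(1)
x = ReadOnlyView(a)   # what a parameter without "is rw" would see
a.store(2)            # the original name stays read-write
print(x.fetch())      # the view observes the change
```

Both names refer to the same underlying storage, so no copy is made; only the write path differs, which is how the apparent contradiction between binding and read-only access is sidestepped in this model.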

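The value-based phase of multi dispatch described earlier (first conforming alternative wins, in declaration order) can also be sketched. Again this is a Python model under loud assumptions: make_multi is a made-up helper, the predicates stand in for the where constraints, and the type-based tiebreaking phase that would run first is left out entirely.

```python
def make_multi(candidates):
    """Build a dispatcher from (constraint, body) pairs in declaration order.

    Hypothetical helper: models only the value-based phase, where the
    first candidate whose constraint accepts the argument wins.
    """
    def dispatch(x):
        for accepts, body in candidates:
            if accepts(x):
                return body(x)
        raise TypeError(f"no matching candidate for {x!r}")
    return dispatch


# Mirrors: multi sub foo (3) {...} and multi sub foo (2..10) {...}
foo = make_multi([
    (lambda x: x == 3, lambda x: "matched 3"),
    (lambda x: 2 <= x <= 10, lambda x: "matched 2..10"),
])

print(foo(3))   # both constraints accept 3; the first declared wins
print(foo(7))   # only the range constraint accepts 7
```

Note that 3 satisfies both constraints, yet nothing is reported as ambiguous: in this phase order alone decides, unlike the type-based phase where duplicates are rejected.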

    Refactoring the unfactorable?

    brentdax on 2005-06-26T08:46:15

    lwall continued to hack through mad segfaults on PDD/MAD_SKILLS, refactoring Perl 5's toke.c to be more informative.
    Larry is probably the only man on earth who can even think about changing toke.c without breaking half the test suite.

    I say...

    pdcawley on 2005-06-26T14:27:14

    Working macros? Symbol table introspection?

    I think this is the week I start working on the debugger.