Day 233: Reports from ICFP!

autrijus on 2005-09-28T21:28:18

The main ICFP conference session concluded today, so I'm finally getting a tiny slice of time to write down what I've been doing.

I enjoyed handing out Perl6/Timeline T-shirts to various lambdafolks, including bpierce (author of TaPL), spj (author of GHC), as well as several cool hackers from #haskell. At any time in the conference hall, there was always someone with a Pugs shirt in sight, leading to many interesting discussions and opportunities to join forces.

I spent much time hacking with the GHC core team (spj and simonmar, also known as "Simon and Simon"), to get my current #1 feature request -- GADT with record types -- into GHC. Among Haskell people, GADTs are all the rage this year; see luqui's notes for a short introduction from Perl6's perspective.

During the YAPC::NA 2005 Toronto hackathon, I presented the then-brand-new PIL1 data structure to @Larry, using GHC's then-brand-new GADT syntax:

data PIL a where
    PCode       :: SubType -> [TParam] -> PIL [Stmt] -> PIL Expression
    PBind       :: [PIL LValue] -> PIL Expression -> PIL LValue
    PAssign     :: [PIL LValue] -> PIL Expression -> PIL LValue
    -- ...etc...

luqui looked at it and asked quite innocently: "Why are the arguments not labelled, such as 'lhs' for 'PAssign'?" -- To which lwall said, half-jokingly: "Because autrijus is a lazy bastard."

However, the real reason was because GHC's extended Haskell syntax only allowed positional argument types for GADT. To name them with the record syntax, we'd need to go back to the vanilla data types, and forgo the ability to encode type information using the constructor's return type (as demonstrated in PIL Expression and PIL LValue above).

A month later, I diligently re-encoded the structure using vanilla record types, so we can export them into hash-based Perl (and JSON) data structures, instead of array-based ones. This significantly improved the readability of iblech's PIL2JS code, but the Haskell data declaration became 3 times longer, because each GADT variant needs to be re-encoded into vanilla data types.

Ever since then, I was hoping that GHC may one day remedy this by allowing GADTs and record syntax to play together. When I met spj in ICFP, he was gladly surprised that I'm eager to hack GHC to make it happen. After bouncing several design ideas, we settled on this syntax:

data PIL a where
    PCode   { typ :: SubType, params :: [TParam], body :: PIL [Stmt] }
        :: PIL Expression
    PBind   { lhs :: [PIL LValue], rhs :: PIL Expression }
        :: PIL LValue
    PAssign { lhs :: [PIL LValue], rhs :: PIL Expression }
        :: PIL LValue
    -- ...etc...

The neat thing is that PBind and PAssign can share the lhs and rhs labels, as long as their return value PIL LValue stays the same.

During the reception, spj was kind enough to guide me through the GHC core to unify the two data type definitions together back. The next day, simonmar walked me through the Happy parser to get the new syntax recognized, and today spj showed me how the GHC typechecker works. All there's left is installing the labels as record selectors, and the upcoming GHC 6.6 will have a nice feature to fit Pugs's needs. Hacking GHC is evidently not as difficult as I feared! :-)

Talking about GADTs, I was also involved with the crazy project of encoding the Darcs patch algebra into the GADT type system, letting the compiler catch entire classes of common logic errors in darcs, bringing us dagerously close to the ideal of proof-carrying code.

Two days ago, when droundy told me this idea, I was very excited, started prototyping it, then promptly ran into errors on existential types. Heffalump took my laptop and shuffled it with Igloo, arriving at something that seemed to compile, which evolved into an embedded DSL for patches. After today's conference sessions, a dozen darcs hackers gathered together, learned about the recent developments of the patch theory, then worked on the GADT encoding. The hacking session was quite successful -- we didn't run into any showstoppers, much to everybody's surprise -- and I visiolized the whiteboard into this diagram.

On the Pugs backend front, I learned from simonmar the existence of the ILX emitter library, hiding in a forgotten corner of GHC source tree. This enables us to emit pre-CLR-2.0 MSIL code that could run on Mono, under a 3-clause BSD license. Targetting CLR suddenly looks more attractive...

I also learned how to generate native machine code straight from C--, using the GHC API distributed as part of Lemmih's ghc-api Cabal package.

Speaking of Cabal, I learned from SyntaxNinja about the design of the Cabal/Hackage system, which is the Module::Build/CPAN equivalent for Haskell. We shared our experiences, and he helped me setting up the basic Cabal build rules for Pugs. With Cabal support, Pugs's Makefile.PL would only need to care about the Perl parts in the source tree, and we can finally get a libPugs to link with pugscc and other programs -- Inline::GHC is definitely on the radar, and the idea of using Perl6 to script the Haskell IDE (hIDE) is also gaining some traction.

Compiling with continuations, another pet interest of mine, also got a boost from ICFP. The Generalized Stack Inspection talk outlined how to use vanilla try/catch mechanism to implement full continuations, which is much more efficient than the regular trampolining approach as used in Pugs's JavaScript backend, since only people who actually use full continuations need to pay the cost for catch. Their technique operates on the ANF form, which seems to be a good low-level intermediate form for Pugs.

Interestingly, the paper went as far as saying that Parrot might have made a suboptimal choice in paying the full cost of going full-CPS, since we could emulate full continuations using exception handling efficiently. They gave concrete implementations in CLR and JVM -- we'll see if this result can carry over to Perl5 and JavaScript runtimes.

The invited talk of the first day tackled impredicative type inferencing, a long-standing problem in inferring types for a rich system such as Perl6. I had much trouble grokking the Wobbly Type idea of spj's, so I was much delighted to see that fpottier gave a very intuitive treatment of splitting the user-annotation-driven phase and the complete inferencing phase apart.

Oh, the ICFP contest. Haskell won again, which came at no surprise, but the Dylan Hacker team made another strong showing. From the brief investigation, I think Dylan is even closer to the "Perl 6 without advertisement" mark than Ruby, largely due to its incremental soft typing system, where one can start rapid prototyping with full dynamism, then generate more efficient and robust C code by adding type annotations.

Tomorrow is the Curry workshop, then the Haskell workshop on the day after. After that I'll get two weeks of Pugs hacking time. I can barely wait to apply those new techniques, tools and thoughts into Pugs!

Wow.....i mean.....wow

eric256 on 2005-09-28T23:03:19

I would just like to say thanks. I'm glad that there are people out there who understand this stuff AND want to spend there time building pugs and perl6. I generaly consider myself fairly smart but when i read this compiler theory stuff my eyes just glaze over and my head spins.

So thanks for taking your time to build pugs and hears hoping you continue to use your power for good! ;)

Re: Wow.....i mean.....wow

grantm on 2005-09-29T00:15:26

Ditto.