The main ICFP conference session concluded today, so I'm finally getting a tiny slice of time to write down what I've been doing.
I enjoyed handing out Perl6/Timeline T-shirts to various lambdafolks, including bpierce (author of TaPL), spj (author of GHC), as well as several cool hackers from #haskell. At any time in the conference hall, there was always someone with a Pugs shirt in sight, leading to many interesting discussions and opportunities to join forces.
I spent much time hacking with the GHC core team (spj and simonmar, also known as "Simon and Simon"), to get my current #1 feature request -- GADT with record types -- into GHC. Among Haskell people, GADTs are all the rage this year; see luqui's notes for a short introduction from Perl6's perspective.
During the YAPC::NA 2005 Toronto hackathon, I presented the then-brand-new PIL1 data structure to @Larry, using GHC's then-brand-new GADT syntax:
data PIL a where PCode :: SubType -> [TParam] -> PIL [Stmt] -> PIL Expression PBind :: [PIL LValue] -> PIL Expression -> PIL LValue PAssign :: [PIL LValue] -> PIL Expression -> PIL LValue -- ...etc...luqui looked at it and asked quite innocently: "Why are the arguments not labelled, such as 'lhs' for 'PAssign'?" -- To which lwall said, half-jokingly: "Because autrijus is a lazy bastard."
However, the real reason was because GHC's extended Haskell syntax only allowed
positional argument types for GADT. To name them with the record
syntax, we'd need to go back to the vanilla data types, and forgo the
ability to encode type information using the constructor's return type (as
demonstrated in PIL Expression
and PIL LValue
above).
A month later, I diligently re-encoded the structure using vanilla record types, so we can export them into hash-based Perl (and JSON) data structures, instead of array-based ones. This significantly improved the readability of iblech's PIL2JS code, but the Haskell data declaration became 3 times longer, because each GADT variant needs to be re-encoded into vanilla data types.
Ever since then, I was hoping that GHC may one day remedy this by allowing GADTs and record syntax to play together. When I met spj in ICFP, he was gladly surprised that I'm eager to hack GHC to make it happen. After bouncing several design ideas, we settled on this syntax:
data PIL a where PCode { typ :: SubType, params :: [TParam], body :: PIL [Stmt] } :: PIL Expression PBind { lhs :: [PIL LValue], rhs :: PIL Expression } :: PIL LValue PAssign { lhs :: [PIL LValue], rhs :: PIL Expression } :: PIL LValue -- ...etc...The neat thing is that
PBind
and PAssign
can share
the lhs
and rhs
labels, as long as their return
value PIL LValue
stays the same.
During the reception, spj was kind enough to guide me through the GHC core to unify the two data type definitions together back. The next day, simonmar walked me through the Happy parser to get the new syntax recognized, and today spj showed me how the GHC typechecker works. All there's left is installing the labels as record selectors, and the upcoming GHC 6.6 will have a nice feature to fit Pugs's needs. Hacking GHC is evidently not as difficult as I feared! :-)
Talking about GADTs, I was also involved with the crazy project of encoding the Darcs patch algebra into the GADT type system, letting the compiler catch entire classes of common logic errors in darcs, bringing us dagerously close to the ideal of proof-carrying code.
Two days ago, when droundy told me this idea, I was very excited, started prototyping it, then promptly ran into errors on existential types. Heffalump took my laptop and shuffled it with Igloo, arriving at something that seemed to compile, which evolved into an embedded DSL for patches. After today's conference sessions, a dozen darcs hackers gathered together, learned about the recent developments of the patch theory, then worked on the GADT encoding. The hacking session was quite successful -- we didn't run into any showstoppers, much to everybody's surprise -- and I visiolized the whiteboard into this diagram.
On the Pugs backend front, I learned from simonmar the existence of the ILX emitter library, hiding in a forgotten corner of GHC source tree. This enables us to emit pre-CLR-2.0 MSIL code that could run on Mono, under a 3-clause BSD license. Targetting CLR suddenly looks more attractive...
I also learned how to generate native machine code straight from C--, using the GHC API distributed as part of Lemmih's ghc-api Cabal package.
Speaking of Cabal, I learned from SyntaxNinja about the design of the Cabal/Hackage system, which is the Module::Build/CPAN equivalent for Haskell. We shared our experiences, and he helped me setting up the basic Cabal build rules for Pugs. With Cabal support, Pugs's Makefile.PL would only need to care about the Perl parts in the source tree, and we can finally get a libPugs to link with pugscc and other programs -- Inline::GHC is definitely on the radar, and the idea of using Perl6 to script the Haskell IDE (hIDE) is also gaining some traction.
Compiling with continuations, another pet interest of mine, also got a boost
from ICFP. The Generalized
Stack Inspection talk outlined how to use vanilla try/catch
mechanism to implement full continuations, which is much more efficient than
the regular trampolining approach as used in Pugs's JavaScript backend, since
only people who actually use full continuations need to pay the cost for
catch
. Their technique operates on the ANF form, which seems to
be a good low-level intermediate form for Pugs.
Interestingly, the paper went as far as saying that Parrot might have made a suboptimal choice in paying the full cost of going full-CPS, since we could emulate full continuations using exception handling efficiently. They gave concrete implementations in CLR and JVM -- we'll see if this result can carry over to Perl5 and JavaScript runtimes.
The invited talk of the first day tackled impredicative type inferencing, a long-standing problem in inferring types for a rich system such as Perl6. I had much trouble grokking the Wobbly Type idea of spj's, so I was much delighted to see that fpottier gave a very intuitive treatment of splitting the user-annotation-driven phase and the complete inferencing phase apart.
Oh, the ICFP contest. Haskell won again, which came at no surprise, but the Dylan Hacker team made another strong showing. From the brief investigation, I think Dylan is even closer to the "Perl 6 without advertisement" mark than Ruby, largely due to its incremental soft typing system, where one can start rapid prototyping with full dynamism, then generate more efficient and robust C code by adding type annotations.
Tomorrow is the Curry workshop, then the Haskell workshop on the day after. After that I'll get two weeks of Pugs hacking time. I can barely wait to apply those new techniques, tools and thoughts into Pugs!