Code generation and stone soup

masak on 2010-01-22T16:41:30

I don't know what kept me away from generating code for so long. Fear and prejudice, perhaps.

I've been trying it the last few days, and I have two things to say. First, it's like learning to program all over again. Remember that sense of power from the early days, when you were just picking up coding? "Hey, I can program this piece of code to do whatever I want." Well, guess what? That feeling comes back when one starts down the path to madn... erm, to code generation. Only it's stronger. "Hey, I can program this piece of code to program that piece of code to do whatever it wants!" I think I've just discovered meta-hubris. Most likely, I'm not the first to do so.

Second, there's a flip side to the feeling of power. That other feeling is the one you get when you knit your brows and wish that your neurons would line up a bit better so you could think more clearly about the problem at hand. Who would have thought it: that feeling is also stronger when you're suddenly writing two different, entwined programs at the same time, in the same file. In my case, the knitted brows turn into an empty stare and a jaw left slackly agape, as I sit there wishing that I was better at context-switching between runloops.

Honestly, I think I expected eval to be mostly a source of programmer confusion, but I have to confess I underestimated the vistas it opens up when you buy into the idea of generating exactly the piece of code you need for the task (from an AST, say) and then eval'ing it into a closure. That's what the back end of a compiler ends up doing, so maybe I shouldn't be so surprised that it's a versatile technique.
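For the curious, here's roughly what that looks like in today's Raku spelling (current Rakudo wants the EVAL builtin plus a pragma; the Rakudo of 2010 spelled it eval, and the doubling example is mine, not from GGE):

    use MONKEY-SEE-NO-EVAL;   # current Raku requires this pragma for string EVAL

    # Build a small piece of code as a string, then eval it into a closure.
    my $source  = 'sub ($x) { $x * 2 }';
    my &doubler = EVAL $source;    # the string becomes a real, callable Sub
    say doubler(21);               # 42

In practice the generated string comes out of an AST traversal rather than a literal, but the principle is the same.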

Lately, I've been in the business of squeezing every drop of juice out of the control flow constructs already implemented in Rakudo. I'm writing a p6regex→p6 compiler, you see. (Yes, that's a rather crazy notion; thanks for asking.) Along the way, I've often felt the need for not-yet-implemented control flow. This has led me to this hope-inducing maxim:

Every type of control flow in programming languages is just convenient sugar for if statements and while loops.

ifs and whiles are the stone soup to which all the rest of our control flow can be added as seasoning. ifs let you conditionally skip ahead in code, and whiles allow you to conditionally skip back. That's all you need.

Here are some examples.

  • Switch statements are just sugar for chained if/elsif/else statements. Even Perl 6's given/when constructs.
  • The variants next, last and redo, either with or without a label to affect a less-than-innermost loop, can be desugared to sad boolean-ish variables, plus some if statements to appropriately regulate the execution of the code inside the loop. (Yes, go ahead and twitch just thinking of it. That sugar is there for a reason; a minimal sketch follows right after this list.)
  • Subroutines and subroutine calls can be simulated with the appropriate use of a switch statement and an explicit call stack stored in an array variable.
  • Even exceptions, or more generally continuations, can be desugared in this way. As soon as you have total control of the call stack, you're free to save and rewind to some previous state to your heart's content.
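To make the second bullet concrete, here's a minimal sketch of the desugaring; the flag names ($keep-going, $skip-rest) are mine, not anything Rakudo provides:

    # sugared:
    #     for 1..10 -> $i {
    #         next if $i %% 2;       # skip even numbers
    #         last if $i > 7;        # stop for good after 7
    #         say $i;
    #     }

    # desugared, using nothing but while and if:
    my $i = 0;
    my $keep-going = True;
    while $keep-going && $i < 10 {
        $i++;
        my $skip-rest = False;         # plays the role of 'next'
        if $i % 2 == 0 {
            $skip-rest = True;
        }
        if !$skip-rest {
            if $i > 7 {
                $keep-going = False;   # plays the role of 'last'
            }
            if $keep-going {
                say $i;
            }
        }
    }
    # both versions print 1, 3, 5, 7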

Aside from the switch statements and unlabeled next etc., which already work very well in Rakudo, I've been doing the whole list of desugarings in GGE (the regex compiler). The part with the continuations was especially fun. I needed them for backtracking, at least as long as the compiler was only an interpreter.

But then, during a fruitful discussion with diakopter++, I was told how to emulate (delimited) gotos with a switch and a loop. The idea is quite obvious in retrospect: just keep the current 'label' in a variable, and switch on it in each iteration. Presto! I should have thought of that. I don't even need to flee to PIR any more.
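For concreteness, here's a minimal sketch of the trick; the labels are made up, and the code GGE generates is of course more involved:

    my $label = 'START';
    while $label ne 'DONE' {
        given $label {
            when 'START'  { say 'at START';  $label = 'MIDDLE' }   # "goto MIDDLE"
            when 'MIDDLE' { say 'at MIDDLE'; $label = 'END'    }   # "goto END"
            when 'END'    { say 'at END';    $label = 'DONE'   }   # fall out of the loop
        }
    }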

I took the idea and generalized it to delimited gosubs: instead of keeping the current label in a scalar, keep it at the top of a stack. Define macro-like constructs to push to (local-branch) and pop from (local-return) the stack. Suddenly I don't need continuations as much.
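In the same made-up-label style as above, the macro-like constructs reduce to a push and a pop (a sketch only, not the code GGE actually emits):

    my @return-labels;
    my $label = 'MAIN';
    while $label ne 'DONE' {
        given $label {
            when 'MAIN' {
                @return-labels.push('AFTER-CALL');   # local-branch: remember where to return
                $label = 'THE-SUB';
            }
            when 'THE-SUB' {
                say 'inside the "subroutine"';
                $label = @return-labels.pop;         # local-return: jump back to the caller
            }
            when 'AFTER-CALL' {
                say 'back in the caller';
                $label = 'DONE';
            }
        }
    }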

Result: this. We send in the regex /<[a..b]> | <[b..e]>/ on the top line, along with the target string c to match on. The program generates an AST, an anonymous subroutine which executes the regex in atomic Perl 6 operations, and finally a match object which indeed finds c to be a match.

Here's a similar but slightly more involved example. And here's one doing captures and backreferences inside a quantified non-capturing group. Isn't that exquisite? (Ok, bad choice of word. Sorry.)

As I said, I wrote most of it with a feeling of being not just in over my head, but of being in over my head twice. I'm still a bit surprised it works. The runtime compilation seems to introduce a bit of a speed penalty, but (1) it's a one-time cost, since you can re-use the regex object, and (2) I told you it would be slow.

The code-generating work still resides only in a local branch on my computer. I'll push it to master as soon as I'm done bringing GGE back to its former capabilities. (Update 2010-01-24: Done, and done.)

Code writing code. What a concept!


the functional version

hdp on 2010-01-23T02:28:08

while is just sugar for recursion.

Re:the functional version

masak on 2010-01-23T11:00:59

Aye. Had I needed to emulate while, I might have gone down that route. Better have that tail-call optimizer ready, though, or longer loops will blow the stack.
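Something like this, I imagine (a sketch only; loop-while is a made-up name):

    # a while loop emulated with recursion; without tail-call optimization,
    # a long enough loop will eventually blow the stack
    sub loop-while(&condition, &body) {
        if condition() {
            body();
            loop-while(&condition, &body);
        }
    }

    my $i = 0;
    loop-while({ $i < 3 }, { say $i++ });   # prints 0, 1, 2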

copy and paste

ChrisDolan on 2010-01-23T02:30:30

Code generation is OK if it's transient -- passed straight to a compiler and run. If instead it's written to disk as a template, then it's just formalized copy-and-paste, with all of the maintenance downsides.

The biggest drawback to runtime code generation is obscurity. It's opaque to static analysis and smart editors, and to most humans too.

Re:copy and paste

masak on 2010-01-23T11:06:16

The biggest drawback to runtime code generation is obscurity. It's opaque to static analysis and smart editors, and to most humans too.

Indeed. A static analyzer would have to be really smart to figure out that I'm collecting together little pieces of code as I traverse the AST...

Oh, and there seems to be another cardinal law involved in code generation: whatever happens, don't ever expect both your programs to look nicely indented anymore.

Re:copy and paste

ChrisDolan on 2010-01-23T14:42:26

In the worst case, you can use templates for the generated code. That pulls the odd indentation out of the main code.
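Something along these lines, say (a toy sketch; the %PLACEHOLDER% convention is invented):

    use MONKEY-SEE-NO-EVAL;       # for string EVAL in current Raku

    # the generated code lives in a separate, nicely indented template...
    my $template = q:to/CODE/;
        sub ($x) {
            return $x %OP% %RHS%;
        }
        CODE

    # ...and the generator only fills in the blanks
    my $source = $template.subst('%OP%', '*').subst('%RHS%', '2');
    my &f = EVAL $source;
    say f(21);                    # 42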

Re:copy and paste

masak on 2010-01-23T18:28:45

I might give it a try. It does solve the indentation issue, but it introduces distance between the templates and the place(s) where they're used. It's a bit of a lesser-of-two-evils situation.

Re:copy and paste

ChrisDolan on 2010-01-23T20:48:45

True. Java is actually lucky in this area: it's trivial to embed an arbitrary file in a Jar file right next to the .class that uses it.

Congratulations:

Aristotle on 2010-01-23T12:00:51

You have discovered the criteria for Turing completeness. :-)

Re:Congratulations:

masak on 2010-01-23T18:24:42

Well yes, but not just that. I'm using the property of universality to bend a burgeoning language to my will. It's taking the general stance described in my blog post "Attitude" (that most bugs or missing features have a workaround), and applying it to control flow.

Maybe the point I'm proving is mainly for myself, but if I can make GGE behave like PGE (only slower), I'll have shown something about the capabilities of Rakudo.