Phasers are a blast: FIRST and LAST

masak on 2010-07-15T16:13:40

I started thinking about the FIRST and LAST phasers the other day, thanks to moritz++. My attention was on how to implement them in Yapsi, and my conclusions were mostly SIC, but they can be converted to Perl 6 for public view.

For those who haven't kept up with the latest Perl 6 terminology, "phasers" are what we call those all-caps blocks which fire at different phases during program execution. Perl 5's perldoc perlmod simply calls them "specially named code blocks", but in Perl 6 it's been decided to call them "phasers".

So much for phasers. What do the FIRST and LAST phasers do? They don't exist in Perl 5. S04 describes them thus:

FIRST {...}       at loop initialization time, before any ENTER
 LAST {...}       at loop termination time, after any LEAVE

(There's a NEXT phasers too, which I'm not going to tackle today. The ENTER and LEAVE phasers are what they sound like; they trigger at block entrance and exit, respectively.)

Here's some code using these.

my @a = 1, 2, 3;
for @a -> $item {
    FIRST { say "OH HAI" }
    say $item;
    LAST { say "LOL DONE" }
}

The code, when run, should print the following:

OH HAI
1
2
3
LOL DONE

(At the time of writing, no Perl 6 implementation implements the FIRST and LAST phasers yet.)

The goal of this post is transforming the phasers into code using more primitive constructs, but which still produces the above results. Oh, and it should work not only in this case, but in general.

Here's a first attempt. (Phaser-ful code to the left, rewritten code to the right.) It doesn't work.

my @a = 1, 2, 3;              my @a = 1, 2, 3;
                              say "OH HAI";
for @a -> $item {             for @a -> $item {
    FIRST { say "OH HAI" }
    say $item;                    say $item;
    LAST { say "LOL DONE" }
}                             }
                              say "LOL DONE";

More exactly, it does produce the desired output, but it doesn't work in general; it fails when @a is empty:

my @a;                        my @a;
                              say "OH HAI";
for @a -> $item {             for @a -> $item {
    FIRST { say "OH HAI" }
    say $item;                    say $item;
    LAST { say "LOL DONE" }
}                             }
                              say "LOL DONE";

This code would still produce "OH HAI\nLOL DONE\n", which is wrong, because there is no first and last iteration for the empty @a array.

Ok, we say. No worries; a bit more ad hoc, but we can detect for emptiness. No problem.

my @a;                        my @a;
                              my $HAS_ELEMS = ?@a;
                              if $HAS_ELEMS {
                                  say "OH HAI";
                              }
for @a -> $item {             for @a -> $item {
    FIRST { say "OH HAI" }
    say $item;                    say $item;
    LAST { say "LOL DONE" }
}                             }
                              if $HAS_ELEMS {
                                  say "LOL DONE";
                              }

That works for an empty list, but it fails to work when the FIRST block accesses variables that only exist within the for loop:

my @a = 1, 2, 3;              my @a = 1, 2, 3;
                              my $HAS_ELEMS = ?@a;
                              if $HAS_ELEMS {
                                  $x # BZZT PARSE ERROR
for @a -> $item {
    my $x;
    FIRST { $x = 42 }
    say $item, $x;
}

So. Back to the drawing-board. Two seemingly opposing forces constrain our problem: we need to put the rewritten FIRST block outside the for loop, because we only want it to execute once; but we also need to put it inside the for loop, so that it can have access to the same lexical environment. Is there a compromise somewhere in there?

Yes. We put the FIRST block inside the for loop, but then we keep track of whether we've already executed it once, with a special variable hidden in the surrounding scope:

my @a = 1, 2, 3;              my @a = 1, 2, 3;
                              my $FIRST_PHASER_HAS_RUN = False;
for @a -> $item {             for @a -> $item {
    my $x;                        my $x;
                                  unless $FIRST_PHASER_HAS_RUN {
    FIRST { $x = 42 }                 $x = 42;
                                      $FIRST_PHASER_HAS_RUN = True;
                                  }
    say $item, $x;                say $item, $x;
}                             }

Now it all works. This is the general way to make the FIRST behave according to spec. In the presence of several loops within the same block, one can re-use the same variable for all of the loops, just resetting it before each one. Explicitly setting to False even the first time is quite important, in case someone ever implements the goto statement.

With the LAST phaser, we encounter exactly the same dilemma as with the FIRST loop. The LAST phaser has to be both inside and outside the block; inside because it has to have access to the loop block's variables, and outside because... well, because in general one doesn't know which iteration was the last one until it has already run.

At one point I had the idea to put the LAST block at the end of the loop block, checking the loop condition just before the placement of the LAST block, possibly saving it somewhere so it doesn't have to be re-evaluated. But the sad truth there's no realistic way to evaluate the loop condition from within the loop block; what if the expression contains a variable which is shadowed by another variable inside the loop block? There's just no way to make that fly.

The whole situation with the LAST block really looks hopeless... until one remembers about closures:

my @a = 1, 2, 3;              my @a = 1, 2, 3;
                              my $LAST_PHASER;
                              my $LOOP_HAS_RUN = False;
for @a -> $item {             for @a -> $item {
    my $x = "LOL DONE";           my $x = "LOL DONE";
    LAST { say $x }               $LAST_PHASER = { say $x };
                                  $LOOP_HAS_RUN = True;
}                             }
                              if $LOOP_HAS_RUN {
                                  $LAST_PHASER();
                              }

So in every iteration, we save away a closure just in case that particular iteration turns out to be the last one. Then we execute the last value assigned to the closure, provided the loop ever run. Sneaky, huh?

So that works in the general case. Of course, a clever optimizer which can detect with certainty that the loop will run at least once and that neither phaser uses loop-specific lexicals is perfectly entitled to rewrite the FIRST and LAST phasers to our first attempt. But the above rewritings work in the general case.

In explaining this to a colleague, a case of possible confusion involving the FIRST phaser was uncovered:

for 1, 2, 3 {
    my $x = 42;
    FIRST { say $x }
}

One might perhaps expect this code to print "42\n", but in fact it prints "Any()". The reason is simple: whereas the lexical $x is reachable throughout the whole for loop, the assignment of 42 to it won't occur until after the FIRST block has executed. That's what FIRST blocks do, they execute first. Nevertheless, some people might expect assignments to be treated specially in some way, not counting as "real code" or whatever. But they are, and thus that's the result. In general, reading from freshly declared lexical variables in a FIRST block won't do you much good.

Lastly, there's this wording in S04:

FIRST, NEXT, and LAST are meaningful only within the lexical scope of a loop, and may occur only at the top level of such a loop block.

I read that as saying that these kinds of blocks should be illegal if they are found in a block which isn't a loop block. STD.pm6 doesn't enforce this yet; it probably should.


minor nit

jmm on 2010-07-15T19:57:02

You don't need a separate boolean to handle LAST - if the loop never ran, then there has never been an assignment of the LAST sub, so: if it is defined call it.

Re:minor nit

masak on 2010-07-15T22:35:11

You are, of course, completely correct. Moritz pointed this out as well. Not sure why I left the boolean in anyway. Esthetics, maybe. Or perhaps just absent-mindedness. I also considered defining it as {;} before the loop, and then running it afterwards no matter what. Saves us an if statement.

Of course, implementing Perl 6 on top of Perl 6 means I'm pretty much screwed speed-wise no matter how many micro-optimizations I find. I'm left having to optimizing for other things, such as getting all the cool phasers before the other implementations. :)

Name of Phasers

Lasse on 2010-07-16T07:33:09

Hi Carl, I do not know if this is the right forum for this. I like the phasers, they are really programmer friendly as many other novel Perl6 constructs. But I find the name of the phasers a bit missleading. FIRST and LAST are executed BEFORE and AFTER the first resp last iteration of the loop and is not directly associated with the logic of a loop iteration. Long time ago (in 360 assembler macros) I did similar loop constructs and used the words INITIALIZE, TERMINATE and NEXT and used Booleans FIRST and LAST flagging the first and last iteration in a loop. I always exceuted my INITIALIZE and TERMINATE routines nomatter if the loop was executed or not.

do FIRST not BEFORE

salva on 2010-07-16T08:12:34

Executing FIRST before anything in the loop makes it pretty uninteresting. It would be more useful if it were run conditionally but on its place.

About implementing it, how about maintaining a loop iteration counter in some internal variable as $?I.

The same applies for LAST, but how do you know you are running the last iteration until after the loop check has failed?

Re:do FIRST not BEFORE

Lasse on 2010-07-16T10:06:45

Hi Salva, I assume your post was to me, it made me look at SO4 abouth phasers WOW that's a lot. I done it again, talk/write first without thinking. My construct were more of documentary purpose and to tie code together. And since I did it mostly for myself I didn't have to take care of every edge case e.g. when an empty loop mattered I did the check in the 'phasers'. As for setting LAST just be one step ahead, it's not a big deal for finit resources. I had good use for my construct, but that was not even a pee in the pond compared with Perl6 phasers. Next time I have opinions about Perl6 I read the specs first. P.S. I have followed Perl6 development for the last 3-4 years and I'm VERY impressed with what you are creating. I hope to get some time to late this year to attempt to do something serious in Perl6.

Re:do FIRST not BEFORE

masak on 2010-07-16T10:54:17

Next time I have opinions about Perl6 I read the specs first.

Ka-ching! You've just earned 100 experience points, and may proceed to the next level of Perl 6. :)

Re:do FIRST not BEFORE

masak on 2010-07-16T10:52:08

Executing FIRST before anything in the loop makes it pretty uninteresting. It would be more useful if it were run conditionally but on its place.

...and that, to be clear, is how it's spec'd to work, and how I actually end up implementing it in the post.

The same applies for LAST, but how do you know you are running the last iteration until after the loop check has failed?

Also addressed in the blog post in whose comment thread you are commenting. :)

Re:do FIRST not BEFORE

salva on 2010-07-16T14:29:10

It seems I have not explained my opinion clearly, let me illustrate it with some code:

for 1..3 { say "enter"; FIRST { say "first run" } LAST { say "last run" } say "leave" }

I would consider FIRST and LAST to be interesting if that code printed enter, first run, leave, enter, leave, enter, last run, leave.

Re:do FIRST not BEFORE

masak on 2010-07-16T18:47:03

Then what you want is probably not a phaser. Happily, the "create this exact behavior with a home-made module" option is still available as a way out.

A simple way to do what you want without any syntax modifications to the language would be to iterate over .kv of the list, and then check the .value part in if statements in the loop block.

In general, it's not possible to know which iteration will become the last one until it's already finished. It is possible in your example, with a finite, statically introspectable literal list. But since looping works with iterators, the information required to answer the question "am I on the last iteration?" generally isn't available.

Re:do FIRST not BEFORE

Aristotle on 2010-07-18T20:47:53

for 1..3 {
    my $first; FIRST { $first = 1 }
    my  $last;  LAST {  $last = 1 }

    say "enter";
    if $first { say "first run" }
    if $last  { say "last run" }
    say "leave"
}

Re:do FIRST not BEFORE

Aristotle on 2010-07-18T20:49:07

Oops. That won’t work for LAST, though it does for FIRST. Hmm.

teach for a new trick

gfldex on 2010-07-16T14:51:35

If you asume that any block got FIRST and LAST but by default those fellows are empty function calls, you could move the condition to call FIRST and/or LAST out of the pointy block and into for.

The only condition you have left then would be if the list is empty. "for" has to check that anyway (at some point) so you don't really add anything.

I don't really like to have $LOOP_HAS_RUN = True; any time you call the pointy block.

I tried to implement it in javascript and ended up with the following.

window.super_for = function(it, code){
    var first = code.FIRST ? code.FIRST : function(){};
    var last = code.LAST ? code.LAST : function(){};

    var current = it();
    if (current !== iterable.done) {
    first();
    } else {
    return;
    }

    do {
    code(current);
    } while( (current = it()) !== iterable.done );

    last();
};

In perl6 I would try (if I would know how :) to transform:

for @list -> $a {
  FIRST { say "i haz a first!" }
  say "$a";
  LAST { say "i is done" }
}

into:

fancy_for(@list, $code, $code_first, $code_last);

You could default $code_first and $code_last to an empty block and let the optimizer remove those.

Re:teach for a new trick

masak on 2010-07-16T18:52:07

Your solution works fine when the FIRST and LAST does not access any lexical variables specific to the loop block. $a in your Perl 6 code at the end is such a variable. Sending $code_first and $code_last in as unrelated closures doesn't give those closures access to $a.

For a solution that does work for these cases, please re-read the blog post. :)