Perl 6 Design Minutes for 20 February 2008

brian_d_foy on 2008-02-24T00:36:00

The Perl 6 design team met by phone on 20 February 2008. Larry, Allison, Patrick, Jerry, Will, Jesse, and chromatic attended.

Patrick:

did a few small things in Rakudo over this week
nothing spectacular or amazing
will do the Parrot release later this evening
expect no problems
headed to FOSDEM in Belgium tomorrow to talk about Perl 6
I should have some hacking time on airplanes
very little chance of being interrupted

Larry:

other than having the flu...
doing various spec things
documented what identifier extensions are
straightened out the little mess in role composition as to the difference between role private attributes and the composed class's private attributes
something declared with has is intended to be used generically within the class
for a role private attribute, you declare it with my
this goes along with using my to give private attributes (or at least private methods) to classes
if you declare unary or zero-arity functions with a prototype in Perl 5, that changes the grammar
that's no longer the case in Perl 6
they're still parsed as list operators
you can define them that way if you want, but you have to do it explicitly

Patrick:

that's a tremendous amount of sanity

Larry:

some of us get there late

Patrick:

some of us never get there

Larry:

there's a big hole as to how backslash behaves in the interpolation spec
what if you backslash something that isn't going to interpolate?
the rules are similar to Perl 5's, but more generic, of course
single-quotes assume you want to leave backslashes in
double-quotes assume that all backslashes are meaningful, or they're in error
working on how sigspace interacts with ** on a separator
fixing up some of the spec there
came up in IRC
the big thing this week is working on a program called gimme5
chewing gum and baling wire
instead of spitting out Pugs code, it spits out Perl 5 code
can now translate STD.pm to Perl 5 and run it under strict and warnings
translated cursor.pm to Perl 5
have the longest token scanner working for small, uncomplicated rules -- just alternatives
need to revamp how longest-token matching works for more complicated rules
the match is essentially over a list of alternatives
given a rule that expects a term, it's zero or more prefix operators followed by a noun
zero-or-more implies that the longest token at that point has to be the union of what prefix and noun match
it doesn't handle embedded alternations yet either
the next step is doing what I could not make work under Pugs
take the current match state and extract the match object to show to the user
the current match state continuation has all of that information in it scattered among a bunch of backlinks
it's not the form in which the user wants to access it
unless the subscript lookups did some sort of tie-based behavior
I don't want to do that
we can just matchify things lazily when we know we need that for the user
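
The last few notes above (immutable match state linked by backlinks, with the user-visible match object built only on demand) can be illustrated with a minimal Python sketch. This is not gimme5 or STD.pm code; the names State, LazyMatch, extend, and _matchify are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    """One immutable step of match state; `prev` is the backlink (hypothetical)."""
    pos: int
    name: Optional[str] = None   # capture name recorded at this step, if any
    text: Optional[str] = None   # captured text, if any
    prev: Optional["State"] = None

    def extend(self, pos, name=None, text=None):
        # Never mutate: build a new state that points back at this one.
        return State(pos, name, text, self)


class LazyMatch:
    """User-facing view, assembled from the backlinks on first access only."""

    def __init__(self, state):
        self._state = state
        self._captures = None    # nothing built yet

    def _matchify(self):
        if self._captures is None:
            captures = {}
            node = self._state
            while node is not None:
                if node.name is not None:
                    # Walking backwards, so the first hit is the most recent one.
                    captures.setdefault(node.name, node.text)
                node = node.prev
            self._captures = captures
        return self._captures

    def __getitem__(self, name):
        return self._matchify()[name]


# A term matched as one prefix operator followed by a noun.
s = State(0).extend(1, "prefix", "-").extend(3, "noun", "42")
m = LazyMatch(s)
built_before_access = m._captures is not None
noun = m["noun"]
```

Plain subscripting triggers the one-time walk over the backlinks, so no tie-style magic is needed on every lookup.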

Allison:

finally done with my insane conference run
no travel until the first week of March

doing a lot of work closing bugs
working on open tickets
closed a lot of them
fixed a lot of little things for the release
think I fixed one Tcl segfault
working on the Lua GC bug now

Jerry:

doing more work on Rakudo
found a weird bug that Patrick thought was GC, but it's not
I did find a GC bug earlier today though and submitted a ticket
did some other fixups for Parrot while working on the release
hope to work on the PDD 17 branch soon
still have a Windows segfault there
building on Linux so I can help
started looking at gettext so we can internationalize Parrot
I'll check something in for the config after the release
after that, it's just a matter of putting in some macros and starting the conversion

Allison:

the Linux kernel has some good hooks for i18n
everything gets hidden behind a layer of abstraction, so you never directly print error text
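
The idea of never printing error text directly can be sketched with Python's standard gettext module. This is only an analogy: Parrot itself is written in C and would use macros, and the helper name report_error is hypothetical.

```python
import gettext

# NullTranslations returns each message unchanged; a real setup would load
# a message catalog for the user's locale instead.
_translations = gettext.NullTranslations()
_ = _translations.gettext


def report_error(template, **args):
    """Hypothetical helper: all error text goes through the translation layer."""
    return _(template).format(**args)


msg = report_error("cannot open file {name}", name="foo.pir")
```

Because callers only ever pass message templates to the helper, swapping in a real catalog later does not touch the call sites.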

Will:

did a bunch of non-user-visible commits
coding standard updates
have a new contributor, Stephen Weeks
he's contributed to several languages

Jerry:

it feels like it's time to optimize PGE
is that correct?

Patrick:

there are lots of little improvements
biggest improvement is probably the longest token match

can we avoid recursive descent with that?

Patrick:

not entirely
but we can avoid recursing into subrules that can't possibly match

I'm all for pruning trees

Patrick:

longest token matching isn't trivial
as Larry will tell you
PGE needs to be able to attach attributes to subroutines
I need to be able to ask a rule (or build a DFA) such that we can associate longest tokens with rules
ask a rule "What does your DFA look like?"
sounds like an attribute on a sub
I haven't been able to put together a clean way to do that in Parrot
we have subs with attributes or properties
but once you take them down to being PIR
you need a subroutine that attaches all of the properties once you load that subroutine

Larry:

I'm not doing it that way
I have an argument that's the state of the predetermination
there's an artificial state, ?, that's "tell me your DFA"
instead of caching that in the sub/method itself, there's a per-language cache
if two different languages call into that rule, they can keep separate caches of their longest token matchers

Patrick:

you have a way of calling a method that says "Give me your DFA" instead of "Do a match"?

Larry:

yes
you don't have to duplicate the dispatch
you get the right one
we don't try to duplicate the dispatch mechanism
we just tell it to do something different than do the match

Patrick:

the grammar holds that little bit of DFA
keeps track of how they combine together

Larry:

when you ask for the DFA for the particular rule, it asks its subrules
for each of those, the autolexer automatically knows whether it has already calculated that
it's already in the cache
it reuses that bit then
reincorporates that into a larger pattern
off it goes
the tricky thing is making sure that that cache is per-language
Pugs doesn't do that now
if you try to cache that in the sub as a property, you'd have to keep track of which language used it
they have different lexers
that's some of the motivation for simplifying the language tweaking things
don't want the language to tweak for every unary or zero-arity function
I didn't want to cache duplicate lexers
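
The scheme Larry describes (an artificial "?" state sent through the normal dispatch, with results cached per language rather than on the sub) might look roughly like the following Python sketch. It is not the STD.pm implementation: a frozenset of literal tokens stands in for a real DFA, and the Rule and Language classes are hypothetical.

```python
class Rule:
    """A rule with its own literal tokens plus subrules (hypothetical)."""

    def __init__(self, name, tokens=(), subrules=()):
        self.name = name
        self.tokens = tuple(tokens)
        self.subrules = tuple(subrules)

    def __call__(self, lang, state):
        if state == "?":
            # Artificial state: "tell me your DFA" instead of "do a match".
            return lang.lexer_for(self)
        raise NotImplementedError("real matching is omitted from this sketch")


class Language:
    """Owns the cache, so two languages never share lexer data."""

    def __init__(self, name):
        self.name = name
        self.cache = {}      # rule name -> frozenset of tokens (DFA stand-in)
        self.computed = 0    # how many rules were actually calculated

    def lexer_for(self, rule):
        if rule.name not in self.cache:
            tokens = set(rule.tokens)
            for sub in rule.subrules:
                # Same dispatch as a match, just a different request.
                tokens |= sub(self, "?")
            self.cache[rule.name] = frozenset(tokens)
            self.computed += 1
        return self.cache[rule.name]


prefix = Rule("prefix", tokens=["-", "!"])
noun = Rule("noun", tokens=["42"])
term = Rule("term", subrules=[prefix, noun])

perl6, other = Language("perl6"), Language("other")
first = term(perl6, "?")
second = term(perl6, "?")    # served from perl6's cache
```

Asking term for its lexer asks prefix and noun in turn, each result is reused on later requests, and the second language's cache stays empty until it makes its own request.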

Patrick:

part B of my answer is get some sort of profiling available in Parrot
see what subs get called and where we spend our time
part C is writing the Capture PMC in C
that's waiting on the PMC changes

Jerry:

we have a profile core in Parrot, -p

Will:

it's based on the C level

Jerry:

you want subroutines, not ops?

Patrick:

yes
making a native Capture PMC could be a big win
it just redispatches to arrays and hashes
that deserves to be its own beast

Larry:

do you maintain that match object on the fly as a mutable object, and back things out as you backtrack?
my passed-around match state is cloned, immutable objects
my continuation system just uses the cloned snapshot
it may or may not be faster
I know where I need to reconstruct that mutable match object that the user sees
that's another possible optimization/pessimization

Patrick:

the match object is a convenient place to keep track of backtrack state
backing out is basically a delete optimization
I could optimize the backtracking
function is more important than speed for what I've done so far
another big win is getting strings implemented under Parrot such that we're not using UTF-8
that's slow
because of ICU and Unicode support, we're stuck with UTF-8 at the moment

Larry:

I've done enough of that in my life

Patrick:

as far as the code it generates, I'm sure people can come up with improvements there
the code itself does a lot of optimizations
profiling seems to be the big win

I have some ideas there