Perl 6 Design Minutes for 20 February 2008

chromatic on 2008-02-22T23:14:15

The Perl 6 design team met by phone on 20 February 2008. Larry, Allison, Patrick, Jerry, Will, Jesse, and chromatic attended.

Patrick:

  • did a few small things in Rakudo over this week
  • nothing spectacular or amazing
  • will do the Parrot release later this evening
  • expect no problems
  • headed to FOSDEM in Belgium tomorrow to talk about Perl 6
  • I should have some hacking time on airplanes
  • very little chance of being interrupted

Larry:

  • other than having the flu...
  • doing various spec things
  • documented what identifier extensions are
  • straightened out the little mess in role composition as to the difference between role private attributes and the composed class's private attributes
  • something declared with has is intended to be used generically within the class
  • for a role private attribute, you declare it with my
  • goes along with the use of my to give private attributes (or methods at least) to classes
  • if you declare unary or zero-arity functions with a prototype in Perl 5, that changes the grammar
  • that's no longer the case in Perl 6
  • they're still parsed as list operators
  • you can define them that way if you want, but you have to do it explicitly

Patrick:

  • that's a tremendous amount of sanity

Larry:

  • some of us get there late

Patrick:

  • some of us never get there

Larry:

  • there's a big hole as to how backslash behaves in the interpolation spec
  • what if you backslash something that isn't going to interpolate?
  • the rules are similar to Perl 5, but more generic, of course
  • single-quotes assume you want to leave backslashes in
  • double-quotes assume that all backslashes are meaningful, or they're in error
  • working on how sigspace interacts with ** on a separator
  • fixing up some of the spec there
  • came up in IRC
  • the big thing this week is working on a program called gimme5
  • chewing gum and baling wire
  • instead of spitting out Pugs code, it spits out Perl 5 code
  • can now translate STD.pm to Perl 5 and run it under strict and warnings
  • translated cursor.pm to Perl 5
  • have the longest token scanner working for small, uncomplicated rules -- just alternatives
  • need to revamp how longest-token matching works for more complicated rules
  • the match is essentially over a list of alternatives
  • given a rule that expects a term, it's zero or more prefix operators followed by a noun
  • zero-or-more implies that the longest token at that point has to be the union of what prefix and noun matches
  • it doesn't handle embedded alternations yet either
  • the next step is doing what I could not make work under Pugs
  • take the current match state and extract the match object to show to the user
  • the current match state continuation has all of that information in it scattered among a bunch of backlinks
  • it's not the form in which the user wants to access it
  • unless the subscript lookups did some sort of tie-based behavior
  • I don't want to do that
  • we can just matchify things lazily when we know we need that for the user
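
To illustrate the longest-token idea Larry describes for a flat list of alternatives, here is a minimal Python sketch. It is not gimme5 or STD.pm code: the token table, its names, and the function longest_token are hypothetical. The table deliberately holds both prefix and noun alternatives, since a term that allows zero or more prefixes has to consider the union of the two at that point.

```python
import re

# Hypothetical token table; names and patterns are illustrative only.
# Prefix and noun alternatives sit in one list because a term may start
# with either one.
ALTERNATIVES = [
    ("prefix_minus", r"-"),
    ("prefix_decrement", r"--"),
    ("noun_ident", r"[A-Za-z_]\w*"),
    ("noun_number", r"\d+"),
]


def longest_token(text, pos=0):
    """Return (name, matched_text) for the alternative that matches the
    most characters at pos, or None if no alternative matches.
    On a tie, the earlier entry in the table wins."""
    best = None
    for name, pattern in ALTERNATIVES:
        m = re.compile(pattern).match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best
```

With this table, longest_token("--x") picks the two-character decrement operator rather than the one-character minus, regardless of the order in which the alternatives are listed.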

Allison:

  • finally done with my insane conference run
  • no travel until the first week of March

c:

  • doing a lot of work closing bugs
  • working on open tickets
  • closed a lot of them
  • fixed a lot of little things for the release
  • think I fixed one Tcl segfault
  • working on the Lua GC bug now

Jerry:

  • doing more work on Rakudo
  • found a weird bug that Patrick thought was GC, but it's not
  • I did find a GC bug earlier today though and submitted a ticket
  • did some other fixups for Parrot while working on the release
  • hope to work on the PDD 17 branch soon
  • still have a Windows segfault there
  • building on Linux so I can help
  • started looking at gettext so we can internationalize Parrot
  • I'll check something in for the config after the release
  • after that, it's just a matter of putting in some macros and starting the conversion

Allison:

  • the Linux kernel has some good hooks for i18n
  • everything gets hidden behind a layer of abstraction, so you never directly print error text
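
The layering Allison mentions can be sketched with Python's standard gettext module. This is only an illustration of the pattern, not Parrot's planned C macros: the function report_error and the message text are hypothetical, and NullTranslations stands in for a real message catalog, so messages come back untranslated.

```python
import gettext

# NullTranslations is the stdlib fallback catalog: with no fallback set,
# its gettext() method returns the message unchanged. A real setup would
# load a compiled catalog here instead (assumption for illustration).
_catalog = gettext.NullTranslations()
_ = _catalog.gettext


def report_error(message, **args):
    """Look the message up in the catalog, then fill in its arguments.
    Callers never format or print raw error text themselves."""
    return _(message).format(**args)
```

Because every message goes through one lookup function, swapping in a translated catalog later does not require touching the call sites.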

Will:

  • did a bunch of non-user-visible commits
  • coding standard updates
  • have a new contributor, Stephen Weeks
  • he's contributed to several languages

Jerry:

  • it feels like it's time to optimize PGE
  • is that correct?

Patrick:

  • there are lots of little improvements
  • biggest improvement is probably the longest token match

c:

  • can we avoid recursive descent with that?

Patrick:

  • not entirely
  • but we can avoid recursing into subrules that can't possibly match
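
One way to picture "avoid recursing into subrules that can't possibly match" is a check on the next character before entering a subrule. The Python sketch below is an assumption-laden illustration, not PGE's implementation: the subrules, their first-character sets, and match_term are all hypothetical.

```python
def match_number(text, pos):
    """Match one or more digits; return the end position or None."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return end if end > pos else None


def match_string(text, pos):
    """Match a double-quoted string; return the end position or None."""
    if pos < len(text) and text[pos] == '"':
        close = text.find('"', pos + 1)
        if close != -1:
            return close + 1
    return None


# Hypothetical subrule table: (name, possible first characters, matcher).
SUBRULES = [
    ("number", set("0123456789"), match_number),
    ("string", {'"'}, match_string),
]


def match_term(text, pos, trace):
    """Try subrules in order, skipping any that the next character rules
    out. trace records the subrules that were actually entered."""
    if pos >= len(text):
        return None
    for name, first_chars, matcher in SUBRULES:
        if text[pos] not in first_chars:
            continue  # cannot possibly match, so do not descend into it
        trace.append(name)
        end = matcher(text, pos)
        if end is not None:
            return name, end
    return None
```

The descent itself is still recursive in general; the saving comes from never calling matchers whose first-character set excludes the input.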

c:

  • I'm all for pruning trees

Patrick:

  • longest token matching isn't trivial
  • as Larry will tell you
  • PGE needs to be able to attach attributes to subroutines
  • I need to be able to ask a rule (or build a DFA) such that we can associate longest tokens with rules
  • ask a rule "What does your DFA look like?"
  • sounds like an attribute on a sub
  • I haven't been able to put together a clean way to do that in Parrot
  • we have subs with attributes or properties
  • but once you take them down to being PIR
  • you need a subroutine that attaches all of the properties once you load that subroutine

Larry:

  • I'm not doing it that way
  • I have an argument that's the state of the predetermination
  • there's an artificial state, ?, that's "tell me your DFA"
  • instead of caching that in the sub/method itself, there's a per-language cache
  • if two different languages call into that rule, they can keep separate caches of their longest token matchers

Patrick:

  • you have a way of calling a method that says "Give me your DFA" instead of "Do a match"?

Larry:

  • yes
  • you don't have to duplicate the dispatch
  • you get the right one
  • we don't try to duplicate the dispatch mechanism
  • we just tell it to do something different than do the match

Patrick:

  • the grammar holds that little bit of DFA
  • keeps track of how they combine together

Larry:

  • when you ask for the DFA for the particular rule, it asks its subrules
  • for each of those, the autolexer knows whether it has already calculated that
  • it's already in the cache
  • it reuses that bit then
  • reincorporates that into a larger pattern
  • off it goes
  • the tricky thing is making sure that that cache is per-language
  • Pugs doesn't do that now
  • if you try to cache that in the sub as a property, you'd have to keep track of which language used it
  • they have different lexers
  • that's some of the motivation for simplifying the language tweaking things
  • don't want the language to tweak for every unary or zero-arity function
  • I didn't want to cache duplicate lexers
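
The scheme Larry outlines (calling the rule with an artificial "?" state and caching the answer per language rather than on the rule) might look roughly like the Python sketch below. Everything here is hypothetical: the Rule and Language classes, and especially the use of a plain set of literal tokens as a stand-in for a real DFA. It also assumes the rules are not recursive.

```python
class Rule:
    """A hypothetical rule: some literal tokens plus named subrules."""

    def __init__(self, name, literals=(), subrules=()):
        self.name = name
        self.literals = tuple(literals)
        self.subrules = tuple(subrules)


class Language:
    """Holds a set of rules and its own cache of longest-token sets."""

    def __init__(self, rules):
        self.rules = {rule.name: rule for rule in rules}
        self.lexer_cache = {}  # per language, never stored on the Rule
        self.computed = []     # cache misses, recorded for illustration

    def call(self, rule_name, state):
        """Same dispatch for both uses: "?" asks for the token set,
        any other state is treated as text to match against."""
        tokens = self._tokens(self.rules[rule_name])
        if state == "?":
            return tokens
        candidates = (t for t in tokens if state.startswith(t))
        return max(candidates, key=len, default=None)

    def _tokens(self, rule):
        if rule.name not in self.lexer_cache:
            self.computed.append(rule.name)
            tokens = set(rule.literals)
            for sub in rule.subrules:
                tokens |= self.call(sub, "?")  # ask each subrule in turn
            self.lexer_cache[rule.name] = frozenset(tokens)
        return self.lexer_cache[rule.name]


prefix = Rule("prefix", literals=["-", "--", "!"])
noun = Rule("noun", literals=["1", "x"])
term = Rule("term", subrules=["prefix", "noun"])

base = Language([prefix, noun, term])
# A second language replaces prefix but shares the noun and term objects.
tweaked = Language([Rule("prefix", literals=["~"]), noun, term])
```

Because the cache lives on the Language, base and tweaked compute different token sets for the shared term rule without interfering with each other.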

Patrick:

  • part B of my answer is get some sort of profiling available in Parrot
  • see what subs get called and where we spend our time
  • part C is writing the Capture PMC in C
  • that's waiting on the PMC changes

Jerry:

  • we have a profile core in Parrot, -p

Will:

  • it's based on the C level

Jerry:

  • you want subroutines, not ops?

Patrick:

  • yes
  • making a native Capture PMC could be a big win
  • it just redispatches to arrays and hashes
  • that deserves to be its own beast

Larry:

  • do you maintain that match object on the fly as a mutable object, and back things out as you backtrack?
  • my passed-around match state is cloned, immutable objects
  • my continuation system just uses the cloned snapshot
  • it may or may not be faster
  • I know where I need to reconstruct that mutable match object that the user sees
  • that's another possible optimization/pessimization
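
The immutable-snapshot approach Larry describes can be sketched in Python as follows. This is an illustration under stated assumptions, not his Cursor code: MatchState, the combinators, and matchify are hypothetical names, and captures are kept only on backlinks until a user-facing view is requested.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MatchState:
    """Immutable snapshot: a position, at most one capture, a backlink."""
    pos: int
    capture: Optional[Tuple[str, str]] = None
    prev: Optional["MatchState"] = None


def literal(name, lit):
    def matcher(text, state):
        if text.startswith(lit, state.pos):
            # Build a new state; the old one is never modified.
            return MatchState(state.pos + len(lit), (name, lit), state)
        return None
    return matcher


def sequence(*matchers):
    def matcher(text, state):
        for m in matchers:
            state = m(text, state)
            if state is None:
                return None
        return state
    return matcher


def alternation(*matchers):
    def matcher(text, state):
        for m in matchers:
            result = m(text, state)  # a failed branch leaves state as it was
            if result is not None:
                return result
        return None
    return matcher


def matchify(state):
    """Walk the backlinks and build the mutable, user-facing dict.
    Called only when the user actually needs the match object."""
    captures = []
    while state is not None:
        if state.capture is not None:
            captures.append(state.capture)
        state = state.prev
    return dict(reversed(captures))
```

When the first branch of an alternation fails partway through, its partial states are simply dropped; there is nothing to back out of, which is the trade-off against Patrick's mutable match object.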

Patrick:

  • the match object is a convenient place to keep track of backtrack state
  • backing out is basically a delete optimization
  • I could optimize the backtracking
  • function is more important than speed for what I've done so far
  • another big win is getting strings implemented under Parrot such that we're not using UTF-8
  • that's slow
  • because of ICU and Unicode support, we're stuck with UTF-8 at the moment

Larry:

  • I've done enough of that in my life

Patrick:

  • as far as the code it generates, I'm sure people can come up with improvements there
  • the code itself does a lot of optimizations
  • profiling seems to be the big win

c:

  • I have some ideas there