The Perl 6 design team met by phone on 20 February 2008. Larry, Allison, Patrick, Jerry, Will, Jesse, and chromatic attended.
Patrick:
- did a few small things in Rakudo over this week
- nothing spectacular or amazing
- will do the Parrot release later this evening
- expect no problems
- headed to FOSDEM in Belgium tomorrow to talk about Perl 6
- I should have some hacking time on airplanes
- very little chance of being interrupted
Larry:
- other than having the flu...
- doing various spec things
- documented what identifier extensions are
- straightened out the little mess in role composition as to the difference between role private attributes and the composed class's private attributes
- something declared with has is intended to be used generically within the class
- for a role private attribute, you declare it with my (see the sketch after this list)
- that goes along with the use of my to give private attributes (or at least methods) to classes
- if you declare unary or zero-arity functions with a prototype in Perl 5, that changes the grammar
- that's no longer the case in Perl 6
- they're still parsed as list operators
- you can define them that way if you want, but you have to do it explicitly
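A minimal sketch of the has/my distinction described above, in present-day Raku syntax; the names (Counter, Widget, bump) are illustrative, and note that the my variable is lexical to the role body (shared across instances) rather than per-instance storage:

    role Counter {
        has $!count = 0;   # composed into the class: effectively the class's own private attribute
        my  $shared = 0;   # lexical to the role body: stays private to the role itself

        method bump   { $shared++; $!count++ }
        method report { "$!count of $shared total" }
    }

    class Widget does Counter { }
    my $w = Widget.new;
    $w.bump;
    say $w.report;    # 1 of 1 total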
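On the zero-arity point, a small Raku example of opting in explicitly to term-like parsing, which is otherwise no longer automatic (the sub names are made up):

    # With an ordinary zero-argument sub, a bare call still parses as a list operator,
    # so the + 1 below would be taken as an argument rather than added to the result:
    sub answer() { 42 }
    say answer;          # 42
    # say answer + 1;    # parsed as answer(+1), not answer() + 1

    # Declaring a term explicitly gets term-like parsing:
    sub term:<fortytwo> { 42 }
    say fortytwo + 1;    # 43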
Patrick:
- that's a tremendous amount of sanity
Larry:
- some of us get there late
Patrick:
- some of us never get there
Larry:
- there's a big hole as to how backslash behaves in the interpolation spec
- what if you backslash something that isn't going to interpolate?
- the rules are similar to Perl 5, but more generic, of course
- single-quotes assume you want to leave backslashes in
- double-quotes assume that all backslashes are meaningful, or they're in error (see the example after this list)
- working on how sig space interacts with ** on a separator
- fixing up some of the spec there
- came up in IRC
- the big thing this week is working on a program called gimme5
- chewing gum and baling wire
- instead of spitting out Pugs code, it spits out Perl 5 code
- can now translate STD.pm to Perl 5 and run it under strict and warnings
- translated cursor.pm to Perl 5
- have the longest token scanner working for small, uncomplicated rules -- just alternatives
- need to revamp how longest-token matching works for more complicated rules
- the match is essentially over a list of alternatives
- given a rule that expects a term, it's zero or more prefix operators followed by a noun
- zero-or-more implies that the longest token at that point has to be the union of what prefix and noun match (see the sketch after this list)
- it doesn't handle embedded alternations yet either
- the next step is doing what I could not make work under Pugs
- take the current match state and extract the match object to show to the user
- the current match state continuation has all of that information in it scattered among a bunch of backlinks
- it's not the form in which the user wants to access it
- unless the subscript lookups did some sort of tie-based behavior
- I don't want to do that
- we can just matchify things lazily when we know we need that for the user
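A quick Raku illustration of the single-quote versus double-quote backslash rules mentioned above (\y is just an arbitrary unrecognized sequence):

    say 'a\nb';     # a\nb -- single quotes leave the backslash in
    say "a\nb";     # a, newline, b -- \n is a meaningful escape
    # say "a\yb";   # compile-time error: unrecognized backslash sequence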
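And a toy sketch of the union point above (the token sets are made up): because the prefix part of a term is optional, the set of tokens that can begin a term is the union of what can begin a prefix and what can begin a noun.

    my %first =
        prefix => set('+', '-', '!', 'not'),
        noun   => set('42', 'foo', '"', '[');

    # A term is zero-or-more prefixes followed by a noun, so its first-token set
    # is the union of the two:
    my $term-first = %first<prefix> (|) %first<noun>;
    say $term-first.keys.sort;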
Allison:
- finally done with my insane conference run
- no travel until the first week of March
c:
- doing a lot of work closing bugs
- working on open tickets
- closed a lot of them
- fixed a lot of little things for the release
- think I fixed one Tcl segfault
- working on the Lua GC bug now
Jerry:
- doing more work on Rakudo
- found a weird bug that Patrick thought was GC, but it's not
- I did find a GC bug earlier today though and submitted a ticket
- did some other fixups for Parrot working on the release
- hope to work on the PDD 17 branch soon
- still have a Windows segfault there
- building on Linux so I can help
- started looking at gettext so we can internationalize Parrot
- I'll check something in for the config after the release
- after that, it's just a matter of putting in some macros and starting the conversion
Allison:
- the Linux kernel has some good hooks for i18n
- everything gets hidden behind a layer of abstraction, so you never directly print error text
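A minimal sketch of that kind of abstraction layer in Raku (not gettext itself, just the shape of the idea; the locales, keys, and messages are all made up): code asks for a message by key, a per-locale table supplies the text, and nothing prints error strings directly.

    my %messages =
        en => { 'file.missing' => 'cannot find file %s' },
        de => { 'file.missing' => 'Datei %s nicht gefunden' };

    sub msg(Str $locale, Str $key, *@args) {
        sprintf %messages{$locale}{$key}, |@args;
    }

    say msg('de', 'file.missing', 'config.ini');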
Will:
- did a bunch of non-user-visible commits
- coding standard updates
- have a new contributor, Stephen Weeks
- he's contributed to several languages
Jerry:
- it feels like it's time to optimize PGE
- is that correct?
Patrick:
- there are lots of little improvements
- biggest improvement is probably the longest token match
c:
- can we avoid recursive descent with that?
Patrick:
- not entirely
- but we can avoid recursing into subrules that can't possibly match
c:
- I'm all for pruning trees
Patrick:
- longest token matching isn't trivial
- as Larry will tell you
- PGE needs to be able to attach attributes to subroutines
- I need to be able to ask a rule (or build a DFA) such that we can associate longest tokens with rules
- ask a rule "What does your DFA look like?"
- sounds like an attribute on a sub
- I haven't been able to put together a clean way to do that in Parrot
- we have subs with attributes or properties
- but once you take them down to PIR
- you need a subroutine that attaches all of the properties once you load that subroutine
Larry:
- I'm not doing it that way
- I have an argument that's the state of the predetermination
- there's an artificial state, "?", that's "tell me your DFA"
- instead of caching that in the sub/method itself, there's a per-language cache
- if two different languages call into that rule, they can keep separate caches of their longest token matchers
Patrick:
- you have a way of calling a method that says "Give me your DFA" instead of "Do a match"?
Larry:
- yes
- you don't have to duplicate the dispatch
- you get the right one
- we don't try to duplicate the dispatch mechanism
- we just tell it to do something different than do the match
Patrick:
- the grammar holds that little bit of DFA
- keeps track of how they combine together
Larry:
- when you ask for the DFA for the particular rule, it asks its subrules
- for each of those, the autolexer automatically knows whether it's calculated that
- it's already in the cache
- it reuses that bit then
- reincorporates that into a larger pattern
- off it goes
- the tricky thing is making sure that that cache is per-language
- Pugs doesn't do that now
- if you try to cache that in the sub as a property, you'd have to keep track of which language used it
- they have different lexers
- that's some of the motivation for simplifying the language tweaking things
- don't want the language to tweak for every unary or zero-arity function
- I didn't want to cache duplicate lexers
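A rough sketch of the per-language cache idea (all names here are hypothetical, not the actual STD.pm/gimme5 code): the automaton for a rule is cached under the language that asked for it, so two languages calling into the same rule keep separate lexers.

    my %dfa-cache;    # keyed first by language, then by rule name

    sub dfa-for(Str $lang, Str $rule, &compute) {
        # compute (and cache) the automaton only on a miss for this language
        %dfa-cache{$lang}{$rule} //= compute();
    }

    # Asking a rule for its DFA recurses into its subrules the same way,
    # reusing whatever this language has already calculated.
    my $dfa = dfa-for('MainLang', 'term', { 'union of prefix and noun first-sets' });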
Patrick:
- part B of my answer is get some sort of profiling available in Parrot
- see what subs get called and where we spend our time
- part C is writing the Capture PMC in C
- that's waiting on the PMC changes
Jerry:
- we have a profile core in Parrot, -p
Will:
- it's based on the C level
Jerry:
- you want subroutines, not ops?
Patrick:
- yes
- making a native Capture PMC could be a big win
- it just redispatches to arrays and hashes
- that deserves to be its own beast
Larry:
- do you maintain that match object on the fly as a mutable object, and back things out as you backtrack?
- my passed-around match states are cloned, immutable objects
- my continuation system just uses the cloned snapshot
- it may or may not be faster
- I know where I need to reconstruct that mutable match object that the user sees
- that's another possible optimization/pessimization
Patrick:
- the match object is a convenient place to keep track of backtrack state
- backing out is basically a delete optimization
- I could optimize the backtracking
- function is more important than speed for what I've done so far
- another big win is getting strings implemented under Parrot such that we're not using UTF-8
- that's slow
- because of ICU and Unicode support, we're stuck with UTF-8 at the moment
Larry:
- I've done enough of that in my life
Patrick:
- as far as the code it generates, I'm sure people can come up with improvements there
- the code itself does a lot of optimizations
- profiling seems to be the big win
c: