Apocalypse 5 on Regular Expressions Posted

Simon on 2002-06-04T22:40:28

The Apocalypses are Larry's explanation of the design of Perl 6. Apocalypse 5 deals with - or should I say, redesigns - regular expressions. Hang onto your hats, because this one is 24 pages long and will break a lot of your expectations...


On breakage of various types

Damian on 2002-06-04T23:27:39

24 pages? It's only that long because the "pages" on perl.com are so very much shorter than they've ever been before. Perhaps you're responding to user requests for that or trying to distribute your server load differently? But the implication that Apocalypse 5 is unprecedentedly huge is just plain wrong: in line count A5 is within a few percent of A4.

As for breaking expectations...we sure did! Assuming people expected regexes to continue to be an ad hoc accretion of vaguely incompatible syntaxes, obscure stand-alone features, and special cases. ;-)

I mean, did people really expect Larry wouldn't take this unique opportunity to drag Perl regexes completely free from the syntactic and semantic mire into which they've slowly been sinking?

Apocalypse 5 is Perl 6 in microcosm: throwing away the prototype and using the lessons learned to design a system that's integrated, rather than aggregated.

Of course, there will almost certainly be issues and problems we've overlooked. That's why we'll be relying so heavily on the community's constructive feedback over the next few weeks. But fear, uncertainty, doubt, and xenophobia aren't likely to be helpful in that regard.

KEKEK I HAXOR YR APOCLIPS!

TorgoX on 2002-06-05T03:03:58

24 pages? It's only that long because the "pages" on perl.com are so very much shorter than they've ever been before.

I do wish they had a "printable version" link that'd dump the whole thing as simple HTML all on a single page. But http://www.perl.com/pub/a/2002/06/04/apo5.html?page=all works.

Re:KEKEK I HAXOR YR APOCLIPS!

TorgoX on 2002-06-05T04:05:42

It seems there is a "printable version" link, but it's rendered as a wee little icon that's off the side of my screen. HTML is hard!

Re:On breakage of various types

Simon on 2002-06-05T04:31:46

But the implication that Apocalypse 5 is unprecedentedly huge is just plain wrong: in line count A5 is within a few percent of A4.

% wc -l a4.pod a5.pod
2175 a4.pod
2811 a5.pod

In word count, about 30% more. In byte count, about 40% more. And A4 wasn't small. Still, it depends on your definition of "a few"...

Re:On breakage of various types

Damian on 2002-06-05T05:51:53

In my view, line counts on marked-up source aren't particularly meaningful. I was reckoning by the counts on the visible text itself:


% lynx -dump 'http://www.perl.com/pub/a/2002/06/04/apo5.html?page=all' > a5.txt
% lynx -dump 'http://www.perl.com/pub/a/2002/01/15/apo4.html?page=all' > a4.txt
% wc a[45].txt
2518 16682 104463 a4.txt
2768 18534 119403 a5.txt


So in terms of readable content, A5 is 10% bigger by lines, 11% bigger by words, and 14% bigger by bytes. No big deal.
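The percentages can be rechecked from the `wc` output quoted above. This is a minimal sketch in Python; the function name `percent_increase` is made up for illustration, and the only inputs are the six numbers from the post.

```python
# Sizes reported by `wc a[45].txt` in the post above: (lines, words, bytes).
a4 = (2518, 16682, 104463)
a5 = (2768, 18534, 119403)

def percent_increase(old: int, new: int) -> float:
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new - old) / old * 100

# Round to whole percentages, as the post does.
growth = [round(percent_increase(o, n)) for o, n in zip(a4, a5)]
for label, pct in zip(("lines", "words", "bytes"), growth):
    print(f"{label}: {pct}% bigger")
```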

Re:On breakage of various types

jdavidb on 2002-06-05T12:56:37

Where can I get a5 in pod?

Apocalypses in pod

pne on 2002-06-10T11:39:47

It probably helps if you're an oreilly.com editor :)

Re:On breakage of various types

m2 on 2002-06-05T21:52:08

24 pages? It's only that long because the "pages" on perl.com are so very much shorter than they've ever been before.

Yes, that's completely out of touch with reality. The one page, printable version is 33 pages long on A4 paper...

See the Dark One's influence

autarch on 2002-06-04T23:42:26

I hope it does not escape people that an awful lot of the syntax looks like something from Parse::RecDescent. Coincidence? I think not!

Seriously, it all looks very cool. Simple regexes remain simple, while grammars become possible. I like it.

First Reaction

samtregar on 2002-06-05T00:06:53

Nirvana! Parse::RecDescent is dead and re-born in a new and glorious form. For the first time I'm actually anxious to use Perl 6.

Sure, all the changes will take time to get used to. But the result will be both cleaner and more powerful than what we have now.

-sam

Re:First Reaction

tzz on 2002-06-05T14:41:26

Dead? I doubt it. Parse::RecDescent could benefit from the new Perl 6 regex engine (both in internal implementation and in features passed to the user), but for writing grammars it is very different from what Perl 6 will offer. At least, that's my hope. Damian may have already started work on the :p-rd directive

:)

Re:First Reaction

Damian on 2002-06-06T07:33:44

Actually, having seen A5, you've seen the first draft of the new Parse::FastDescent syntax! Of course, P::FD will provide a large number of additional assertions (replicating P::RD's numerous handy directives), but the syntax and semantics will almost certainly be as close to A5 as possible.

That way, P::FD can continue P::RD's role as a test-bed for Perl 6 regex/grammar features, as well as providing a migration path from Perl 5 to Perl 6.

Re:First Reaction

tzz on 2002-06-06T13:37:49

I certainly hope backward compatibility is not completely lost, or at least we're given a way to port P::RD grammars to P::FD semi-automatically.

Re:First Reaction

Damian on 2002-06-06T21:28:06

I certainly hope backward compatibility is not completely lost...

I'm afraid so.

...or at least we're given a way to port P::RD grammars to P::FD semi-automatically.

Yes. One of the big tests of P::FD is whether I can build a P::RD metagrammar with it.

Oh shee-it

LunaticLeo on 2002-06-05T05:32:15

That is a big one to swallow.

I like A5, but it is dramatic. It will piss off a lot of hum-buggers.

Mostly, it makes me wonder how long it will take to build perl6. Much of my apprehension is alleviated by the thought that perl6 will not show up in my lifetime :)

Re:Oh shee-it

Damian on 2002-06-05T06:02:43

I like A5, but it is dramatic. It will piss off a lot of hum-buggers.

You really think so? I can't think of any significant way in which Larry's proposal isn't a vast improvement on what we have now: more readable, more consistent, better optimized for the common cases, more powerful. What do you think they'll object to? (That's not a rhetorical question: I'd really like to know.)

Mostly, it makes me wonder how long it will take to build perl6. Much of my apprehension is alleviated by the thought that perl6 will not show up in my lifetime :)

Wow, I sincerely hope you're wrong about that! I'm expecting to see a usable beta some time next year, and I'd hate to think of you shuffling off this mortal coil so soon! ;-)

Re:Oh shee-it

pudge on 2002-06-05T11:17:00

Well, if I were inclined to object, I would object about the fact that it is too different. I don't, in general, like different.

OK, that's an oversimplification, but I just woke up. In any event, though, I am far more interested in the Perl syntax than the Perl regex syntax. My primary concern with regexes is the learning curve and speed of execution. It's a very dissimilar situation to Perl syntax itself, to my mind: I mean, how many people are really in love with Perl regex syntax, really?

Re:Oh shee-it

Smylers on 2002-06-05T13:52:58

What do you think they'll object to?

Backwards incompatibility with Perl 5 people can get used to, as they use less Perl 5. Incompatibility with egrep, vi, mod_rewrite, et al could be harder to overcome. They aren't all completely compatible at the moment, but at least things like character classes are similar. I can see people not liking having to learn a completely different regexp syntax just for Perl.

It also depends on how many other languages pick up Perl 6 regexes. It'd be bizarre to find Java and C++ coders disliking Perl because its regexes aren't what they're used to.

(Not me though — I think the new regexes are a great improvement.)

Smylers

Re:Oh shee-it

pne on 2002-06-06T08:35:51

Yeah... character classes were also number one on my list of "things that I'll miss knowing how to do in Perl6".

I know they'll still be there, but the syntax will be different -- and pretty much all other regex languages will still allow [aeiou], only Perl6 will require <[aeiou]> (or perhaps <vowel>?). So I won't be able to carry over my knowledge directly.
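For comparison, here is how a named character class can be approximated today in an engine that only has the bracket form. This is a sketch in Python, not Perl 6: the names `VOWEL`, `CONSONANT`, and `is_cvc` are hypothetical, and plain string composition is only a loose stand-in for a named subrule like `<vowel>`.

```python
import re

# Hypothetical named pieces, loosely analogous to a <vowel> subrule.
VOWEL = "[aeiou]"
CONSONANT = "[b-df-hj-np-tv-z]"

# Compose the named pieces into a larger pattern: consonant, vowel, consonant.
cvc = re.compile(f"^{CONSONANT}{VOWEL}{CONSONANT}$")

def is_cvc(word: str) -> bool:
    """True if `word` is exactly consonant + vowel + consonant (lowercase)."""
    return cvc.match(word) is not None
```

The bracket syntax itself carries over unchanged between most engines, which is the knowledge the post is worried about losing.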

Re:Oh shee-it

buckaduck on 2002-06-06T14:14:47

I actually think that
<vowel>
might be pretty cool. But how would you handle "sometimes Y"?

I suppose if any language could do it, Perl could...

Re:Oh shee-it

ct on 2002-06-05T15:36:05

I haven't read A5 yet as perl.com has just taken a nice nosedive.

My reaction to the first four (though I usually need the Exegesis to make sense out of some bits :) has been, well, not negative so much as fearful.

It may be perl, but it's clear that it's radically different. While I don't have a huge stack of legacy code I'm concerned about, I've done perl long enough that when I think of a problem, I think of it in perl. Changing the syntax of the language on me now would be like someone announcing "OK, new rules for the English Language, instead of adverbs ending in -ly, they now end in -o-rama. Quick-o-rama!".

So for the first few Apoc's I kept thinking "Oh, hell, don't do that." Because it seemed like the language ingrained in my psyche was being disassembled before my eyes.

But, reading a single article and evaluating it as a standalone entity is a bad thing. I'm slowly realizing that the language is being improved, and that to do that you need to touch a good sized chunk of the syntax and semantics of the language itself. While I'm not giddy at the thought of having to relearn the language, I'm now resigned to the fact that changes are coming. I'll be relearning SOMETHING, so we might as well change everything that needs fixing now.

Just like ripping off a band-aid.

Re:Oh shee-it

m2 on 2002-06-05T23:05:39

I haven't read A5 yet as perl.com has just taken a nice nosedive.

Remove the www from the URL. http://perl.com/ works for me, http://www.perl.com/ doesn't. HTH.

Re:Oh shee-it

jaffray on 2002-06-05T18:34:36

Many people perceive anything new as "more complicated", and have the reaction "oh, gawd, what a pain, I'll have to learn all this new junk, Larry and Damian are horrible people who want to make my life miserable for their own sadistic enjoyment". You've seen this already with the reactions to A1-A4.

Not much to be done, but a page or so of examples of the type "look how little most of your simple cases are going to change, .* isn't going anywhere, take a deep breath, everything's going to be fine, there there" might help appease them. Then plunge into all the stuff that's changing and why it's so much better the new way.

Re:Oh shee-it

felixgallo on 2002-06-05T19:53:40

When you say 'optimized', I don't think you mean what you think you mean. It's syntactically optimized, which is good for some things, but time will tell how much slower the bells and whistles make the engine. Time will also tell how many years we will continue to wait.

This is either the best thing to happen to the regular expression engine since the invention of *, or the most profound and breathtaking example of Second System Effect since the invention of J2EE (or COM+ if you swing that way).

Re:Oh shee-it

marklark on 2002-06-07T15:58:39

But, "syntactically optimized" might be just the thing! I'm not going to get much faster or smarter in the future, but I'll almost certainly be using a faster computer with more memory.

So, if Perl 6 lets me be faster it should work out for the best.

Of course, I might behave like the majority of projects out there and just add more variables, more data, etc (think of the US National Weather Service), and end up still having a slow process.

:^)

Re:Oh shee-it

LunaticLeo on 2002-06-06T01:51:24

You really think so? I can't think of any significant way in which Larry's proposal isn't a vast improvement on what we have now: more readable, more consistent, better optimized for the common cases, more powerful. What do you think they'll object to?

Again, I like it. In my mind, it is a whole New Thing(tm). It is a lex-yacc replacement built into Perl, and intimately tied to Perl. Regexes have always been a language within a language. Clearly, Perl5 regexes were going in the grammar direction. However, I can't think of a lot of examples of using the (?...) constructs. And the examples I can think of were simple non-grammar, non-rule-based stuff. I've used it only to match strings "" but skipping \" tokens, sorta naively.
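The quoted-string case mentioned above is a common use of the `(?:...)` construct. The following is a sketch in Python rather than Perl 5, and it is one conventional way to write such a pattern, not necessarily the one the poster used; `QUOTED` and `first_quoted` are names invented here.

```python
import re

# A double-quoted string: an opening quote, then any run of either
# "a character that is neither quote nor backslash" or "a backslash
# followed by any character", then the closing quote.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')

def first_quoted(text: str):
    """Return the first double-quoted string in `text`, or None."""
    m = QUOTED.search(text)
    return m.group(0) if m else None
```

The `(?:...)` group is there only for grouping, which is the job A5 hands to plain square brackets.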

I think Larry Wall dramatically accepted that the regex intelligentsia were trying to fit grammar-like constructs into Perl5 regexes. So he said this is a Good Thing(tm) but he wanted it Done Right(tm). But he had to sacrifice backward compatibility and compatibility/similarity with other regex engines. Character classes come to mind, so does having to escape ':', and whitespace is different (today's \s really should be \h in Perl6-A5 regexes).

Other projects have been struggling to create Perl5 regexes as the needed extension to old-school Unix extended regexes. Now Perl6-A5 regexes are a whole new thing tied in with the Perl language (a la /@foo := (<rule>)+/ or some such construct), so for Java or Python to have Perl6 regexes seems improbable, maybe even impossible.

What I would like to see possible is to consider both Perl6 and Perl5 regexes first class citizens in Perl6. That means that we don't need to use P5::Regex; ($foo) = $str =~ /([[:alpha:]]+)/; no P5::Regex;. Instead we can have, like Larry indicated, ($foo) = $str =~ s:p5 /([[:alpha:]]+)/. That should shut up a lot of the humbuggers.

BTW. Thanks for your concern with my longevity. Also, I hope you are right that 18 months should be enough for some solid betas.

bare hash

mad-p on 2002-06-05T11:17:38

I don't understand Rules 8 and 9 on bare hash (p.15 on perl.com).
What is the last assertion for?

Slashdotted?

tzz on 2002-06-05T14:33:40

I can't retrieve any of the A5 anymore, I think the page was Slashdotted.

I was in the middle, too (page 12, I should have done page=all).

www.perl.com overloaded?

bart on 2002-06-05T14:37:21

Is it just me, or is <www.perl.com> acting strangely? Even the home page occasionally gives me a 400 error ("Bad Request"). If so, surely, apo 5 would have to be the cause.

Re:www.perl.com overloaded?

miko on 2002-06-06T00:22:24

That was probably the Slashdotting it got this afternoon. It seems fine now.

/ @kids := [(\S+) \s+]* /

wickline on 2002-06-05T15:13:00


Does anyone see this as potentially confusing given...

apo4 wrote:
> An ordinary flattening list assignment:
>
> @a = (@b, @c);
>
> is equivalent to:
>
> *@a := (@b, @c);
>
> That's not the same as
>
> @a := *(@b, @c);

I mean, we've got := meaning something outside of regexen and something else (it means =) inside regexen.

Are there ever going to be cases where one might want to use := (in the apo4 non-regex sense) inside of a regex?

If so, how would it be typed? As ::= perhaps? Is that OK?

-matt

Re:/ @kids := [(\S+) \s+]* /

Damian on 2002-06-06T07:39:56

I mean, we've got := meaning something outside of regexen and something else (it means =) inside regexen.

No, it doesn't. It means "bind", exactly as it does outside a regex. And what it binds in a regex is a hypothetical variable to a captured value.
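For readers more used to other engines, a rough analog of collecting every repeated capture under one name can be sketched in Python. This is an assumption-laden comparison, not Perl 6: Python's `re` has no binding operator, so `re.findall` stands in for it, and the function name `kids` is borrowed from the subject line for illustration only.

```python
import re

def kids(text: str) -> list:
    """Collect every word that is followed by whitespace."""
    # Each match of (\S+)\s+ contributes one captured word to the list.
    return re.findall(r"(\S+)\s+", text)
```

Note that a final word with no trailing whitespace is not collected, since the pattern requires `\s+` after each capture.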

Re:/ @kids := [(\S+) \s+]* /

wickline on 2002-06-06T15:22:34


> It means "bind", exactly as it does outside a regex

Ahh... Ok.

Well are there times when you might want to mean 'assign'
(as in '=', rather than bind as in ':=') within a regex?

I confess that I haven't really wrapped my mind fully
around the differences between binding and assignment,
so my question may be silly.
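
A rough way to picture the difference, sketched in Python and offered as an analogy only (Python has no separate binding operator, so two different statements stand in for `:=` and `=`): binding gives a second name to the same thing, while assignment copies the current value.

```python
original = ["a", "b"]

alias = original        # binding-like: a second name for the same list
copy = list(original)   # assignment-like: a new list with the same values

# A later change to the original is visible through the alias only.
original.append("c")
```

After the `append`, `alias` sees the new element and `copy` does not.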

-matt

inline comments have no syntax

wickline on 2002-06-05T15:44:54

> Old                   New
> ---                   ---
> :
> /patpat(?#text)/      /pat pat /

Would there be something wrong with instead allowing

                          either /pat pat /
                                  or /pat pat /
                                  or /pat pat /

or something that *looks* like more a comment instead of looking like a single-quoted string?

I understand that the single-quoted string asserts as true, but in some cases, single-quoted strings assert to false. I can imagine that programmers who never use string assertions could learn from example that the above is "how to do an inline comment" rather than learning that the above is "how to assert a string". If they learn this, then they're liable to get bit one day by or something like that.

As long as we're trying to dump special cases, why not allow a syntax for inline comments? If no special syntax is to be provided, then I'd shy away from saying "here's a kludgey way to do it" because that will just encourage folks to abuse the syntax until they learn the hard way what the syntax *really* means.

-matt

Re:inline comments have no syntax

autarch on 2002-06-05T15:56:43

As Larry says in the 'poco', /x is the default, meaning you can have inline comments like this:

    / x is the default # so this is a comment
        and this is not a comment /

At least that's what I remember from yesterday. I wish perl.com was up so I could confirm that ;)
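The same whitespace-and-comments behaviour exists today as an opt-in flag in other engines. Here is a sketch using Python's `re.VERBOSE`, which is comparable to Perl 5's `/x`; the date-like pattern is made up for illustration.

```python
import re

# With re.VERBOSE, unescaped whitespace is ignored and "#" starts a comment
# that runs to the end of the line.
pattern = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    """,
    re.VERBOSE,
)

match = pattern.search("posted 2002-06")
```

Because such a comment runs to the end of the line, it forces the pattern onto more than one line.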

Re:inline comments have no syntax

wickline on 2002-06-05T16:39:25

perl.com is back up (at least for me)

apo5 explicitly (in text, and by examples such as the one quoted in my post above) says that to get an inline (meaning, I don't want to break my regex into two lines) comment you should assert a string.

So, inline comments don't currently have a syntax, and instead, there is explicit advice to assert a string. I'm wondering whether we can either come up with an appropriate syntax, or not explicitly advise folks to use a kludge.
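
For reference, this is what a dedicated inline comment looks like in an engine that has one. The sketch uses Python, whose `re` module accepts the same `(?#...)` form as Perl 5; the pattern and comment text are placeholders.

```python
import re

# (?#...) is an inline comment: the engine ignores it, so the pattern
# stays on one line and matches exactly as if the comment were absent.
with_comment = re.compile(r"pat(?#first half)pat")
without_comment = re.compile(r"patpat")
```

The comment text can never affect whether the match succeeds, which is the property a string assertion does not guarantee.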

It just feels like there's something amiss if the design specification is suggesting a kludge. Kludges should be for after the design has been implemented and found to be lacking, not an integral part of the design :)

-matt

PS: no slight intended... I know if I were trying to do even a small part of this, I'd just plain do it Wrong rather than do it so well. This is just a nit pick, really.

Re:inline comments have no syntax

autarch on 2002-06-05T16:57:31

Well, I checked again and you're wrong.

Go to page 6 and look at the table at the bottom.

The very first example shows an inline comment.

Re:inline comments have no syntax

swiftone on 2002-06-05T20:33:03

> Well, I checked again and you're wrong.

Given that he defined "inline comment" to mean "I don't want to break my regex into two lines", he's very much correct. And since that's how Larry refers to inline comments, I'm inclined to agree with him on that definition.

In fact, I'll agree with him across the line: A5 is good, I couldn't do better, this is a nit pick, telling people to use a string assertion as an inline comment is a time-bomb.

Personally I don't much care if we have inline comments (even though I've avoided /x so far), but I agree that telling people to abuse a construction that will fail under certain circumstances is a mistake.

Re:inline comments have no syntax

autarch on 2002-06-05T20:49:21

Oops, I didn't read wickline's message carefully enough. I missed the part about not wanting to break the regex into two lines.

Geez, what a weird nitpick though. I agree that telling people to use inline strings is potentially a problem, but why would you insist on single-line regexes?

Do you try to keep all of your other code on one line too? ;)

Re:inline comments have no syntax

wickline on 2002-06-06T02:33:52


> why would you insist on single-line regexes

Personally, I have zero attachment to in-line comments
in regexen. If your regex is short enough to fit on
one line with an inline comment, then it can darn well
fit on one line with a comment at the end of the line.

So, if there is no support for them, I don't mind.

However, perl5 supports them, and rather than saying
that they go away in perl6, the apo currently documents
(and recommends?) the practice of asserting a string.

*That* is my real nit pick. I think that either the final
spec should say "don't inline comments" or it should say
here's the provided syntax for inlining comments (which
would not be the same as asserting a string).

Having thought about why I've never used in-line regex
comments, I think my vote would be to just not support
them and not advertize any kludge for them either. Tell
folks to put their comment on the end of the line, after
the regex rather than within it.

Does anyone currently use inline comments rather than the
/x modifier? If so, can you speak to whether and when
inline comments are handy?

-matt

Re:inline comments have no syntax

pdcawley on 2002-06-06T09:06:43

Personally I think asserting a string's a great idea. If it weren't for the fact that it throws a warning at the moment I reckon that 'string in a void' context has potential to be used as a comment elsewhere in perl. But that might be seen as stealing a little too much from smalltalk.

Re:inline comments have no syntax

TimToady on 2002-06-06T18:52:07

The only reason it's in there is so that the p52p6 translator will know what it's supposed to translate (?#...) to. Yes, it'd be possible to translate it to a line-ending comment, but it's better if the translator avoids changing line numbers whenever possible.

Re:inline comments have no syntax

wickline on 2002-06-06T20:14:38


> so that the p52p6 translator will know what
> it's supposed to translate (?#...) to.

Then the translator should probably have a special
check for inlined empty or '0' (zero) comments. I
don't use inline comments myself, but I can imagine
that someone who does use them might start to type
an inline comment and then forget to fill it in if
they got a phone call, or sudden inspiration in a
more troubling bit of code somewhere else.

I have more trouble imagining a (?#0) comment, but
it seems like a good idea to protect against that too.
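
Such a check could be sketched as follows. This is a hypothetical helper in Python, not part of any real p52p6 translator: `INLINE_COMMENT` and `risky_inline_comments` are invented names, and the scan is deliberately naive (it does not handle escaped parentheses or character classes).

```python
import re

# Matches a Perl 5 style inline comment and captures its text.
INLINE_COMMENT = re.compile(r"\(\?#([^)]*)\)")

def risky_inline_comments(pattern: str) -> list:
    """Return the text of inline comments that are empty or exactly '0'."""
    # An empty string and the string '0' are both false in Perl, so these
    # are the comments that would misbehave if turned into string assertions.
    return [text for text in INLINE_COMMENT.findall(pattern) if text in ("", "0")]
```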

-matt

Re:inline comments have no syntax

wickline on 2002-06-05T16:46:37

hmmm...

I should have used preview. I need to start posting as Code instead of Plain Old Text

None of the three suggestions above look like comments.

They should read as

either /patpat<(#text#)>/
    or /patpat<#text#>/
    or /patpat<#text>/

-matt


Regexp objects

Matts on 2002-06-05T16:12:25

There seems to be no indication of how regexps will behave as first class objects. I'm hoping for something akin to either Ruby or Python's Regexp objects. I think it's as good a time as any to kill regexps built using a custom quote-like operator.

Re:Regexp objects

cmeyer on 2002-06-05T19:03:26

Larry drops a hint about regex objects on page 8:

> ...
> To reset it, use the .reset method on the regex object. (If you haven't named the regex object, too bad...)

Help? (newbieish floundering follows)

mcc on 2002-06-05T17:39:58

I'm sorry, but.. maybe i'm just stupid, but i can't follow this at *ALL*. I mean, i realize the apocalypses are written this way because they're notes on language design, not a language specification, but as a curious bystander i'm having a lot of trouble following the semirandom ordering of this particular apocalypse. It isn't helping that i can never tell whether something in this document that i don't recognize is meant to be a new, proposed operator, or an "established" perl6 operator from one of the other apocalypses that i've just forgotten :)

Could i request that someone post, somewhere, a somewhat more organized approach to this material? I mean, we *could* just all wait for the Exegesis documents, but those seem to mostly be sample code.

What i'm looking for is something somewhat more along the lines of a reference-- i mean, something that just lists each operator/modifier in the new, proposed regex system, and its function, the way the perl 5 reference does. Do you think anyone would be interested in undertaking such a project at the moment, or should i just wait for perl6 to actually be released? ;)

If not, does anyone think they could just kind of post a summary of the new operators? Earlier posts by mr. Wall seem to be implying that in perl 6 syntax, () means grouping, <> refers to a special token flagging something to the regex system, and {} refers to embedded code. Is this correct? What does this mean in practice to the programmer?

Basically, anything that anyone could post helping to digest all this would be greatly appreciated. Most of the RFCs & responses describe an idea very nicely but don't really make clear how this idea would be used. For example, when you're in embedded code ( Is this {}, (<>), or either? ) What scope you're working in? What variables does the regex system set for you in this scope, and are there variables that if you change them it has an effect on stuff? Is the idea just that unless your embedded code returns true, the regex will fail? Stuff like that.

I know, i really should just wait for the exegesis, i'm just so curious :)

Sorry to bother :)

Eek!

mcc on 2002-06-05T20:38:23

Wow, i feel dumb. I had completely missed pages 2-16, because i didn't realize they existed-- i was looking at the links on the front page covering the RFCs (pages 17 on) and assuming that was a table of contents. i missed the "jump to page" bit at the bottom.

The first sixteen pages actually make sense, and are MUCH better organized. Please feel free to disregard my parent post :)

Use of //

swiftone on 2002-06-05T20:39:37

Okay, so Page 9 says that // is no longer a m//, but instead a short form of rx//.

It also talks about delayed evaluation of the pattern.

So, given that almost all the example patterns use /foo/ and not m/foo/, what is happening, and when is it happening?

Re:Use of //

Damian on 2002-06-06T08:04:44

So, given that almost all the example patterns use /foo/ and not m/foo/, what is
happening, and when is it happening?


/.../ now *always* returns a regex object.
But a regex object in a void, boolean, string, numeric, or =~ context
immediately matches against the current topic.

So you get:
    /pattern/;                 # void context    -> automatch
    if /pattern/ {...}         # boolean context -> automatch
    %hash{/pattern/} = 1;      # string context  -> automatch
    if /pattern/ < 1 {...}     # numeric context -> automatch
    $str =~ /pattern/;         # =~ context      -> automatch
    when /pattern/ {...}       # =~ context      -> automatch
and:
    @pats = (/pat1/, /pat2/);  # list context    -> no automatch
    %hash = ( pat => /pat/ );  # scalar context  -> no automatch
    $pat = /pat/;              # scalar context  -> no automatch
    given /pat/ {...}          # scalar context  -> no automatch
And if you're ever unsure, just be explicit:
    $pat = rx/pat/;            # definitely assigning a regex object
    $pat = m /pat/;            # definitely assigning the match result
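
The split between "a regex object" and "the result of matching" exists in other languages too, though without any context-driven automatching. A sketch in Python, where the two steps are always explicit; this is a loose analog of `rx` versus `m`, not a description of Perl 6 behaviour.

```python
import re

pat = re.compile(r"pat")            # a reusable pattern object; nothing is matched yet
result = pat.search("a pattern")    # the match is performed only when asked for

# Pattern objects can be stored in a list and applied later.
patterns = [re.compile(r"pat1"), re.compile(r"pat2")]
hits = [p.pattern for p in patterns if p.search("xx pat2 yy")]
```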

Minor bug?

Cine on 2002-06-05T20:51:33

Shouldn't
/$1/ my $old1 = $1; /$old1/ # must use temporary here
have been
/$1/ my $old1 = $1; // # must use temporary here
???

Re:Minor bug?

Cine on 2002-06-05T20:53:39

ARG, it ate my < > even in plain text...
/$1/ my $old1 = $1; /<$old1>/ # must use temporary here

Re:Minor bug?

TimToady on 2002-06-06T18:43:04

In general, when matching against backreferences, you want to match it literally, not as if it were regex. So you wouldn't want the angles.
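
The difference between the two treatments can be seen in any engine. A sketch in Python, where `re.escape` plays the role of "match it literally"; the value of `captured` is a made-up stand-in for an earlier capture such as `$1`.

```python
import re

captured = "a.c"   # pretend this came from an earlier capture such as $1

as_regex = re.compile(captured)               # "." matches any character
as_literal = re.compile(re.escape(captured))  # "." matches only a dot
```

Interpolated as a regex, the captured text also matches "abc"; interpolated literally, it matches only "a.c".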

Re:Minor bug?

Cine on 2002-06-05T21:11:57

And in the perl5 compat section:
/pat\s*pat/ /:w pat pat/ # match word sequence
Should have been
/pat\s+pat/ /:w[]pat pat/ # match word sequence
or what?
and thus
/pat\s*pat/ should have been converted to /pat\s*pat/
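
The distinction being raised comes down to whether the two words may touch. A sketch in Python that shows only the `\s*` versus `\s+` difference and makes no claim about which one `:w` actually corresponds to:

```python
import re

optional_space = re.compile(r"pat\s*pat")  # zero or more whitespace characters
required_space = re.compile(r"pat\s+pat")  # one or more whitespace characters
```

Both accept "pat pat", but only the `\s*` form accepts "patpat".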

Can't wait

gbarr on 2002-06-05T21:09:45

I don't know about expectations, it met most of mine. But then I was expecting large changes.

I think Larry has done an excellent job and if he was to blame for the old RE syntax, he has certainly gone a long way to make up for his sins.

While reading this I could not help myself from thinking how to rewrite Convert::ASN1 to use it for the parser. Heck, I could probably use it for the engine too.

Re:Can't wait

pdcawley on 2002-06-06T09:11:37

I have the feeling that a2p6 is going to be a doddle to write. Tweak the grammar so that the awk source generates a humungous regex, wrap a 'while () {...}' around it and away you go.

is it \Q or \q (or will either work?)

wickline on 2002-06-05T21:52:05


the backslash sequences table used both \q and \Q
to quotemeta and I couldn't tell from the rest of the apo whether they're interchangeable or whether it was just a typo.

-matt

suggestion for the perl6 camel and pod

wickline on 2002-06-05T22:03:02


> If you want matching brackets for the delimiters
> I'd suggest that you use square brackets, since
> they now mean grouping without capturing.

Tips like the following would probably be good to
gather up for the perl6 perlstyle pod, and for use
in the rest of the pod and the first perl6 camel.

'twould be a fine thing if all the docs followed
recommended practice. I think there may still be some
examples of $a and $b in the perl5 pod which don't
have any association with sort().

I haven't closely followed the perl6 lists, but I
recall initially that there was discussion of RFC
guidelines, and that WRT Parrot there has been thought
given beforehand to PDD guidelines, and to coding style,
so my suggestion is probably quite redundant.

Just in case though, I thought I'd throw it out: draft
perlstyle first, then have the rest of the pod follow
that :)

-matt

I'd just like to say three things...

koschei on 2002-06-05T22:14:22

I'd just like to say three things:

  1. This is a wonderful Apocalypse. Probably the best so far. Comprehensible, clearly makes things simpler, yet more powerful. And guaranteed to rile some people. Probably quite a few.
  2. Kudos to Damian and Larry --- reading the Apocalypse, it's hard to see where one man's ideas begin and another's end. An elegant synthesis.
  3. I'm just happy I'm not Mr Friedl, particularly since the new edition is coming out quite soon. [I was going to ask when a new edition was coming out but found that when getting the url for #1. I like how its Perl section is 5.8. If anyone knows: does it still use the gorgeous, intricate, custom typography?]

Re:I'd just like to say three things...

gnat on 2002-06-10T05:33:13

Jeff's got nothing to worry about. I don't see an implementation of perl6 regular expressions arriving any time soon. Bits and pieces, perhaps, but I imagine that it'll be done roughly at the time that perl6.0.0 is released. Naturally, I'd be delighted to be proven wrong!

And yes, lots of gorgeous intricate custom typography. So much, in fact, that we can't convert it to DocBook and put it on Safari. Lots of new typography, too, if my eyes don't deceive. It looks even more like the book has been decorated by spiders walking through ink :-)

--Nat

Re:I'd just like to say three things...

koschei on 2002-06-10T19:49:16

Mmm. Can't wait until next month (particularly if I get a job). Even if I did just buy 1st ed 6-7 months ago. Expansions look good.

I'll probably sign up to Safari one of these days. Seems like a very good deal, even if it's not all local or on paper.

typo in :u0 :u1 :u2 :u3 table?

wickline on 2002-06-05T22:15:53


:u1 .. :u3 are all defined to indicate Level 1 support.
Should that instead read as level 1 .. 3

-matt

Re:typo in :u0 :u1 :u2 :u3 table?

TimToady on 2002-06-06T18:45:08

Yes, the final set of typo fixes didn't actually make it in, and that was one of them.

typo, or my misunderstanding of p6 context types?

wickline on 2002-06-05T22:22:22


We have:

> A regex is executed automatically if it's
> in a boolean, numeric, or string context.

and we also have:

> @all = m:any /a.*?a/;

Is that assignment in string context somehow (array of strings?), or should list context be added to the list of conditions under which a regex is executed automatically?

-matt

Re:typo, or my misunderstanding of p6 context type

wickline on 2002-06-05T22:37:29

whoops!

never mind :)

I've been skimming ahead whenever I find a question
before posting it, but in this case I failed to skim
well enough.

My question is explicitly addressed in the apo :)

-matt

do we need an rx for perl6?

wickline on 2002-06-05T22:33:08



I grok dumping qr() because a regex is no longer a
glorified string. I also grok being able to say

    my $r = rule {...};

just as one would say

    my $s = sub {...};

What I don't grok is why qr() was replaced with rx.
Why do we need rx? Isn't rule sufficient? If we do
need rx, then why don't we need sx for sub?

I don't think that 'rx' is appreciably easier to type
than 'rule' (at least on a qwerty keyboard) because
'rule' uses more friendly reaches, and while it does
have more characters, they alternate between hands.

My hunch is that rx was motivated by "well, we had qr
and it doesn't apply because it's not a string in the
land of perl6". If perl6 were being designed before the
qr had been introduced into perl5, would rx have been
invented, or would rule have been seen as sufficient?

Do we need an rx for perl6?

-matt

Re:do we need an rx for perl6?

Damian on 2002-06-06T08:14:43

rx lets you choose your delimiters; rule doesn't.

rule lets you name your regexes; rx doesn't.

Re:do we need an rx for perl6?

jdporter on 2002-06-11T16:32:48

What if I want to do both?
Why isn't there just one operator that lets you do both?

the topic is the subject of this post

wickline on 2002-06-05T23:17:41

> Note that $_ within the closure refers to
> this state object, not the original search
> string. If you search on the state object,
> however, it pretends that you wanted to
> continue the search on the original string.

What if I wanted to do something else to $_ ?

Hmm... More precisely, I guess I'm wondering about two
separate things here.

1) What if I wanted to do something to what
   had been $_ before we were within the closure?

2) What if I wanted to do something to the
   current search string?

Regarding #1, I can see that it just might not be
possible. If you intend to be accessing the old $_
within the regex, then you need to be sure to assign
that $_ to something else before entering the regex.

Alternatively, if it's not too ugly, perhaps the $_
state object could have an _ method which returns
a reference to what $_ had been before the regex?

    m:i/^\#{._.comment()}$/ for @stringifiable_objects;

Regarding #2, the above example just happens to have
$_ as the current search string. However, in some
cases, $_ will be busy doing other things, and the
current search string will be in some other variable.
In the case of a for my $x (@a) {...} loop, we know
that the string is $x, so our regex could be told to
tweak $x directly.

However, we won't always be building our regexen
right near where they get used. In fact, they could
be quite far away. The regex could be used in several
places and need to tweak the current search string in
different bits of code where that search string has
different names in different instances.

Can the regex state object have a method that returns
a reference to the current search string?

    .string
    .current
    .target
    .cur
    .str

or something like that? In many cases, this would be
synonymous with ._ (or whatever it would be called),
but in cases where the current search string is not
$_ (from before entering the regex), then the two
would be different.

-matt

Re:the topic is the subject of this post

TimToady on 2002-06-06T18:38:08

By the time you get into a regex, you're always
dealing with $_.  Even =~ behaves as a topicalizer
for its right side.  So, while the $_ inside a
closure is the search state object, it's always
related intimately to the outer $_, which is always
an alias for whatever you're currently searching.
Any string operations on the inner $_ should be
delegated to the original string.  Providing an
explicit method to get at that string should be a
no-brainer, though if you mess with the string,
there's no guarantee that the regex will continue
to make sense.

Re:the topic is the subject of this post

wickline on 2002-06-07T10:31:40

> though if you mess with the string,
> there's no guarantee that the regex
> will continue to make sense.

I kinda figured that would be the case. I was thinking
of methods on strings or stringifiable objects which
don't change the string. A trivial example might be
getting the length of the target string, when the regex
may not know the name of the variable containing the
string to which it is being applied.

It sounds like it won't need to know, which sounds good!

-matt
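
For readers who want something concrete: the idea of a match state
object that can hand back the string it is searching has a rough
analogue in Python's re module, where every match object carries a
.string attribute. The sketch below is only that analogy, not Perl 6
syntax or the Apocalypse's design; the names tag_with_length and line
are made up for illustration.

```python
import re

def tag_with_length(match):
    # match.string is the full string the pattern was applied to,
    # so the callback never needs to know the caller's variable name.
    return f"{match.group(0)}({len(match.string)})"

# The same callback works no matter what the target variable is called.
line = "ab cd"
result = re.sub(r"[a-z]+", tag_with_length, line)
print(result)
```

The callback only reads the target (its length here), which is the
non-mutating kind of use described above.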

additional access to subrule state

wickline on 2002-06-05T23:30:08

> So I think the return object behaves either like a
> hash or an array as appropriate. (Note that such an
> array might be declared to have an origin at 1
> rather than 0.)

How about if index 0 had the state object of that
subrule? That way you could dig into the subrule's
guts just as deeply as you can into the current rule's
guts?

The list-ification of the array could be special-cased
to return indices from 1 on up.

...alternatively, the subrule's state could just be a
method on the magical array/hash subrule return value.

-matt
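
A loose parallel, offered only as an illustration: Python's match
objects already split things up this way. Index 0 is reserved for the
match as a whole, while the "list-ified" view of the captures starts
at 1. Note that Python's index 0 holds the matched text rather than a
state object, so this is a sketch of the indexing convention, not of
the proposal itself.

```python
import re

m = re.match(r"(\w+)=(\d+)", "width=80")

whole = m.group(0)      # index 0: the entire match
captures = m.groups()   # tuple of captures, i.e. indices 1 and up
first = m.group(1)      # the first capture lives at index 1

print(whole, captures, first)
```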

yummy!

wickline on 2002-06-05T23:40:14

mkay, just finished reading...

My questions/comments above do not reflect my overall
opinion. They're really just things I didn't get and a
nit pick or two.

Overall, although my brain is dripping a bit from one
ear, I quite like perl6-flavoured regexen :)

I think I need to watch some mind-numbing TV though...
my head hurts a wee bit :)

-matt

negating character classes

OddHack on 2002-06-06T19:31:14

I noticed that on A5p12, Larry gives this equivalence:

[^[:alpha:]]   <-alpha>

I expected <!alpha> -- is there a difference between <-alpha> and <!alpha>?

Re:negating character classes

wickline on 2002-06-06T23:17:54

> difference between <-alpha> and <!alpha>?

I think that <!alpha> can match nothing,
while <-alpha> can match only a non-alpha
character.

So the first can match a zero-length substring
while the second must match a one-character
substring.

-matt

Re:negating character classes

wickline on 2002-06-07T10:28:31

... but I'm really not entirely sure :)

-matt
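
If the reading above is right (and it is explicitly a guess), the
distinction is the familiar one between a negated character class,
which consumes one character, and a negative lookahead, which
consumes nothing. The Python sketch below shows that distinction
using Perl 5 style syntax; mapping <-alpha> and <!alpha> onto these
two patterns is an assumption, not something confirmed in this
thread, and ASCII letters stand in for "alpha" to keep it simple.

```python
import re

# Assumed analogue of <-alpha>: must consume one non-letter character.
consuming = re.compile(r"[^A-Za-z]")
# Assumed analogue of <!alpha>: zero-width, succeeds where no letter follows.
zero_width = re.compile(r"(?![A-Za-z])")

results = {}
for text in ["", "7", "x"]:
    c = consuming.match(text)
    z = zero_width.match(text)
    # Record the matched text, or None when the pattern fails.
    results[text] = (c.group(0) if c else None, z.group(0) if z else None)

print(results)
```

On the empty string only the zero-width form succeeds; on "7" both
succeed but only the class consumes the character; on "x" both fail.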