The Apocalypses are Larry's explanation of the design of Perl 6. Apocalypse 5 deals with - or should I say, redesigns - regular expressions. Hang onto your hats, because this one is 24 pages long and will break a lot of your expectations...
KEKEK I HAXOR YR APOCLIPS!
TorgoX on 2002-06-05T03:03:58
24 pages? It's only that long because the "pages" on perl.com are so very much shorter than they've ever been before.
I do wish they had a "printable version" link that'd dump the whole thing as simple HTML all on a single page. But http://www.perl.com/pub/a/2002/06/04/apo5.html?page=all works.
Re:KEKEK I HAXOR YR APOCLIPS!
TorgoX on 2002-06-05T04:05:42
It seems there is a "printable version" link, but it's rendered as a wee little icon that's off the side of my screen. HTML is hard!
Re:On breakage of various types
Simon on 2002-06-05T04:31:46
But the implication that Apocalypse 5 is unprecedentedly huge is just plain wrong: in line count A5 is within a few percent of A4.
% wc -l a4.pod a5.pod
2175 a4.pod
2811 a5.pod
In word count, about 30% more. In byte count, about 40% more. And A4 wasn't small. Still, it depends on your definition of "a few"...
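For the line counts, the comparison is a one-line calculation. A minimal Perl 5 sketch, using only the two line counts from the wc output above (the variable names are illustrative):

```perl
use strict;
use warnings;

# Line counts copied from the wc output above.
my %lines = (a4 => 2175, a5 => 2811);

# Relative growth of A5 over A4, as a percentage.
my $growth = 100 * ($lines{a5} - $lines{a4}) / $lines{a4};

printf "A5 has %.1f%% more lines than A4\n", $growth;   # 29.2%
```

So by marked-up source lines A5 is roughly 29% longer than A4, which is the point of the "a few" remark.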
Re:On breakage of various types
Damian on 2002-06-05T05:51:53
In my view, line counts on marked-up source aren't particularly meaningful. I was reckoning by the counts on the visible text itself:
% lynx -dump 'http://www.perl.com/pub/a/2002/06/04/apo5.html?page=all' > a5.txt
% lynx -dump 'http://www.perl.com/pub/a/2002/01/15/apo4.html?page=all' > a4.txt
% wc a[45].txt
2518 16682 104463 a4.txt
2768 18534 119403 a5.txt
So in terms of readable content, A5 is 10% bigger by lines, 11% bigger by words, and 14% bigger by bytes. No big deal.
Re:On breakage of various types
jdavidb on 2002-06-05T12:56:37
Where can I get a5 in pod?
Apocalypses in pod
pne on 2002-06-10T11:39:47
It probably helps if you're an oreilly.com editor :)
Re:On breakage of various types
m2 on 2002-06-05T21:52:08
24 pages? It's only that long because the "pages" on perl.com are so very much shorter than they've ever been before.
Yes, that's completely out of touch with reality. The one-page, printable version is 33 pages long on A4 paper...
Sure, all the changes will take time to get used to. But the result will be both cleaner and more powerful than what we have now.
-sam
Re:First Reaction
tzz on 2002-06-05T14:41:26
Dead? I doubt it. Parse::RecDescent could benefit from the new Perl 6 regex engine (both in internal implementation and in features passed to the user), but for writing grammars it is very different from what Perl 6 will offer. At least, that's my hope. Damian may have already started work on the :p-rd directive :)
Re:First Reaction
Damian on 2002-06-06T07:33:44
Actually, having seen A5, you've seen the first draft of the new Parse::FastDescent syntax! Of course, P::FD will provide a large number of additional assertions (replicating P::RD's numerous handy directives), but the syntax and semantics will almost certainly be as close to A5 as possible.
That way, P::FD can continue P::RD's role as a test-bed for Perl 6 regex/grammar features, as well as providing a migration path from Perl 5 to Perl 6.
Re:First Reaction
tzz on 2002-06-06T13:37:49
I certainly hope backward compatibility is not completely lost, or at least we're given a way to port P::RD grammars to P::FD semi-automatically.
Re:First Reaction
Damian on 2002-06-06T21:28:06
I certainly hope backward compatibility is not completely lost...
I'm afraid so.
...or at least we're given a way to port P::RD grammars to P::FD semi-automatically.
Yes. One of the big tests of P::FD is whether I can build a P::RD metagrammar with it.
Re:Oh shee-it
Damian on 2002-06-05T06:02:43
I like A5, but it is dramatic. It will piss off a lot of hum-buggers.
You really think so? I can't think of any significant way in which Larry's proposal isn't a vast improvement on what we have now: more readable, more consistent, better optimized for the common cases, more powerful. What do you think they'll object to? (That's not a rhetorical question: I'd really like to know.)
Mostly, it makes me wonder how long it will take to build perl6. Much of my apprehension is alleviated by the thought that perl6 will not show up in my lifetime :)
Wow, I sincerely hope you're wrong about that! I'm expecting to see a usable beta some time next year, and I'd hate to think of you shuffling off this mortal coil so soon! ;-)
Re:Oh shee-it
pudge on 2002-06-05T11:17:00
Well, if I were inclined to object, I would object about the fact that it is too different. I don't, in general, like different.
OK, that's an oversimplification, but I just woke up. In any event, though, I am far more interested in the Perl syntax than the Perl regex syntax. My primary concern with regexes is the learning curve and speed of execution. It's a very dissimilar situation to Perl syntax itself, to my mind: I mean, how many people are really in love with Perl regex syntax, really?
Re:Oh shee-it
Smylers on 2002-06-05T13:52:58
What do you think they'll object to?
Backwards incompatibility with Perl 5 people can get used to, as they use less Perl 5. Incompatibility with egrep, vi, mod_rewrite, et al. could be harder to overcome. They aren't all completely compatible at the moment, but at least things like character classes are similar. I can see people not liking having to learn a completely different regexp syntax just for Perl.
It also depends on how many other languages pick up Perl 6 regexes. It'd be bizarre to find Java and C++ coders disliking Perl because its regexes aren't what they're used to.
(Not me though — I think the new regexes are a great improvement.)
Smylers
Re:Oh shee-it
pne on 2002-06-06T08:35:51
Yeah... character classes were also number one on my list of "things that I'll miss knowing how to do in Perl6".
I know they'll still be there, but the syntax will be different -- and pretty much all other regex languages will still allow [aeiou], only Perl6 will require <[aeiou]> (or perhaps <vowel>?). So I won't be able to carry over my knowledge directly.
Re:Oh shee-it
buckaduck on 2002-06-06T14:14:47
I actually think that <vowel> might be pretty cool. But how would you handle "sometimes Y"? I suppose if any language could do it, Perl could...
Re:Oh shee-it
ct on 2002-06-05T15:36:05
I haven't read A5 yet as perl.com has just taken a nice nosedive.
My reaction to the first four (though I usually need the Exegesis to make sense out of some bits :) has been, well, not negative so much as fearful. It may be perl, but it's clear that it's radically different. While I don't have a huge stack of legacy code I'm concerned about, I've done perl long enough that when I think of a problem, I think of it in perl. Changing the syntax of the language on me now would be like someone announcing "OK, new rules for the English Language, instead of adverbs ending in -ly, they now end in -o-rama. Quick-o-rama!".
So for the first few Apoc's I kept thinking "Oh, hell, don't do that." Because it seemed like the language ingrained in my psyche was being disassembled before my eyes.
But, reading a single article and evaluating it as a standalone entity is a bad thing. I'm slowly realizing that the language is being improved, and that to do that you need to touch a good sized chunk of the syntax and semantics of the language itself. While I'm not giddy at the thought of having to relearn the language, I'm now resigned to the fact that changes are coming. I'll be relearning SOMETHING, so we might as well change everything that needs fixing now.
Just like ripping off a band-aid.
Re:Oh shee-it
m2 on 2002-06-05T23:05:39
I haven't read A5 yet as perl.com has just taken a nice nosedive.
Remove the www from the URL. http://perl.com/ works for me, http://www.perl.com/ doesn't. HTH.
Re:Oh shee-it
jaffray on 2002-06-05T18:34:36
Many people perceive anything new as "more complicated", and have the reaction "oh, gawd, what a pain, I'll have to learn all this new junk, Larry and Damian are horrible people who want to make my life miserable for their own sadistic enjoyment". You've seen this already with the reactions to A1-A4.
Not much to be done, but a page or so of examples of the type "look how little most of your simple cases are going to change, .* isn't going anywhere, take a deep breath, everything's going to be fine, there there" might help appease them. Then plunge into all the stuff that's changing and why it's so much better the new way.
Re:Oh shee-it
felixgallo on 2002-06-05T19:53:40
When you say 'optimized', I don't think you mean what you think you mean. It's syntactically optimized, which is good for some things, but time will tell how much slower the bells and whistles make the engine. Time will also tell how many years we will continue to wait.
This is either the best thing to happen to the regular expression engine since the invention of *, or the most profound and breathtaking example of Second System Effect since the invention of J2EE (or COM+ if you swing that way).
Re:Oh shee-it
marklark on 2002-06-07T15:58:39
But, "syntactically optimized" might be just the thing! I'm not going to get much faster or smarter in the future, but I'll almost certainly be using a faster computer with more memory.
So, if Perl 6 lets me be faster it should work out for the best.
Of course, I might behave like the majority of projects out there and just add more variables, more data, etc (think of the US National Weather Service), and end up still having a slow process.
:^)
Re:Oh shee-it
LunaticLeo on 2002-06-06T01:51:24
You really think so? I can't think of any significant way in which Larry's proposal isn't a vast improvement on what we have now: more readable, more consistent, better optimized for the common cases, more powerful. What do you think they'll object to?
Again, I like it. In my mind, it is a whole New Thing(tm). It is a lex-yacc replacement built into Perl, and intimately tied to Perl. Regexes have always been a language within a language. Clearly, Perl5 regexes were going in the grammar direction. However, I can't think of a lot of examples of using the (?...) constructs. And the examples I can think of were simple non-grammar, non-rule-based stuff. I've used it only to match strings "" but skipping \" tokens, sorta naively.
I think Larry Wall dramatically accepted that the regex intelligentsia were trying to fit grammar-like constructs into Perl5 regexes. So he said this is a Good Thing(tm) but he wanted it Done Right(tm). But he had to sacrifice backward compatibility and compatibility/similarity with other regex engines. Character classes come to mind, so does having to escape ':', and whitespace is different (today's \s really should be \h in Perl6-A5 regexes).
Other projects have been struggling to create Perl5 regexes as the needed extension to old-school unix extended regexes. Now that Perl6-A5 regexes are a whole new thing tied in with the perl language (a la /@foo := (<rule>)+/ or some such construct), for Java or Python to have Perl6 regexes seems improbable, maybe even impossible.
What I would like to see possible is to consider both Perl6 and Perl5 regexes first class citizens in Perl6. That means that we don't need to
use P5::Regex; ($foo) = $str =~ /([[:alpha:]]+)/; no P5::Regex;
Instead we can have, like Larry indicated,
($foo) = $str =~ s:p5/([[:alpha:]]+)/
That should shut up a lot of the humbuggers.
BTW. Thanks for your concern with my longevity. Also, I hope you are right that 18 months should be enough for some solid betas.
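For reference, the Perl 5 match quoted in this comment (capturing a run of POSIX alphabetic characters in list context) behaves as follows in today's Perl 5. This is only a minimal sketch: the sample string is an assumption chosen for illustration, and nothing here says how the proposed :p5 modifier would behave.

```perl
use strict;
use warnings;

my $str = '42 apocalypses';   # hypothetical sample input

# List-context match: the first run of alphabetic characters lands in $foo.
my ($foo) = $str =~ /([[:alpha:]]+)/;

print "$foo\n";   # apocalypses
```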
Re:www.perl.com overloaded?
miko on 2002-06-06T00:22:24
That was probably the Slashdotting it got this afternoon. It seems fine now.
Re:/ @kids := [(\S+) \s+]* /
Damian on 2002-06-06T07:39:56
I mean, we've got := meaning something outside of regexen and something else (it means =) inside regexen.
No, it doesn't. It means "bind", exactly as it does outside a regex. And what it binds in a regex is a hypothetical variable to a captured value.
Re:/ @kids := [(\S+) \s+]* /
wickline on 2002-06-06T15:22:34
> It means "bind", exactly as it does outside a regex
Ahh... Ok.
Well are there times when you might want to mean 'assign'
(as in '=', rather than bind as in ':=') within a regex?
I confess that I haven't really wrapped my mind fully
around the differences between binding and assignment,
so my question may be silly.
-matt
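On the binding-versus-assignment question: Perl 5 has no := operator, but the distinction can be loosely illustrated with the aliasing that foreach already does. The sketch below is only an analogy under that assumption, not a statement about how Perl 6 binding will work; the variable names are made up for the example.

```perl
use strict;
use warnings;

my @kids = ('alice', 'bob');

# Assignment copies the value: changing the copy leaves @kids alone.
my $copy = $kids[0];
$copy = uc $copy;

# foreach aliases its loop variable to each element, so this
# modification is visible through @kids itself.
for my $alias ($kids[1]) {
    $alias = uc $alias;
}

print "@kids\n";   # alice BOB
```

The copy and the original go their separate ways after assignment, while the alias is just another name for the same storage.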
Re:inline comments have no syntax
autarch on 2002-06-05T15:56:43
As Larry says in the 'poco', /x is the default, meaning you can have inline comments like this:
/ x is the default # so this is a comment
and this is not a comment/
At least that's what I remember from yesterday. I wish perl.com was up so I could confirm that ;)
Re:inline comments have no syntax
wickline on 2002-06-05T16:39:25
perl.com is back up (at least for me)
apo5 explicitly (in text, and by examples such as the one quoted in my post above) says that to get an inline (meaning, I don't want to break my regex into two lines) comment you should assert a string.
So, inline comments don't currently have a syntax, and instead, there is explicit advice to assert a string. I'm wondering whether we can either come up with an appropriate syntax, or not explicitly advise folks to use a kludge.
It just feels like there's something amiss if the design specification is suggesting a kludge. Kludges should be for after the design has been implemented and found to be lacking, not an integral part of the design :)
-matt
PS: no slight intended... I know if I were trying to do even a small part of this, I'd just plain do it Wrong rather than do it so well. This is just a nit pick, really.
Re:inline comments have no syntax
autarch on 2002-06-05T16:57:31
Well, I checked again and you're wrong.
Go to page 6 and look at the table at the bottom.
The very first example shows an inline comment.
Re:inline comments have no syntax
swiftone on 2002-06-05T20:33:03
> Well, I checked again and you're wrong.
Given that he defined "inline comment" to mean "I don't want to break my regex into two lines", he's very much correct. And since that's how Larry refers to inline comments, I'm inclined to agree with him on that definition.
In fact, I'll agree with him across the line: A5 is good, I couldn't do better, this is a nit pick, telling people to use a string assertion as an inline comment is a time-bomb.
Personally I don't much care if we have inline comments (even though I've avoided /x so far), but I agree that telling people to abuse a construction that will fail under certain circumstances is a mistake.
Re:inline comments have no syntax
autarch on 2002-06-05T20:49:21
Oops, I didn't read wickline's message carefully enough. I missed the part about not wanting to break the regex into two lines.
Geez, what a weird nitpick though. I agree that telling people to use inline strings is potentially a problem, but why would you insist on single-line regexes?
Do you try to keep all of your other code on one line too? ;)
Re:inline comments have no syntax
wickline on 2002-06-06T02:33:52
> why would you insist on single-line regexes
Personally, I have zero attachment to in-line comments
in regexen. If your regex is short enough to fit on
one line with an inline comment, then it can darn well
fit on one line with a comment at the end of the line.
So, if there is no support for them, I don't mind.
However, perl5 supports them, and rather than saying
that they go away in perl6, the apo currently documents
(and recommends?) the practice of asserting a string.
*That* is my real nit pick. I think that either the final
spec should say "don't inline comments" or it should say
here's the provided syntax for inlining comments (which
would not be the same as asserting a string).
Having thought about why I've never used in-line regex
comments, I think my vote would be to just not support
them and not advertize any kludge for them either. Tell
folks to put their comment on the end of the line, after
the regex rather than within it.
Does anyone currently use inline comments rather than the
/x modifier? If so, can you speak to whether and when
inline comments are handy?
-matt
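For anyone who has not used either form, the two Perl 5 comment styles under discussion look like this. A minimal sketch; the date string and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

my $date = '2002-06-05';

# Inline comment: (?#...) sits inside the pattern and matches nothing.
my $inline = $date =~ /^\d{4}(?#year)-\d{2}(?#month)-\d{2}$/;

# Under /x, whitespace is ignored and # starts a comment to end of line.
my $extended = $date =~ /
    ^ \d{4}    # year
    - \d{2}    # month
    - \d{2} $  # day
/x;

print "inline: $inline, extended: $extended\n";   # inline: 1, extended: 1
```

Both patterns match the same strings; the difference is only where the commentary lives.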
Re:inline comments have no syntax
pdcawley on 2002-06-06T09:06:43
Personally I think asserting a string's a great idea. If it weren't for the fact that it throws a warning at the moment I reckon that 'string in a void' context has potential to be used as a comment elsewhere in perl. But that might be seen as stealing a little too much from smalltalk.
Re:inline comments have no syntax
TimToady on 2002-06-06T18:52:07
The only reason it's in there is so that the p52p6 translator will know what it's supposed to translate (?#...) to. Yes, it'd be possible to translate it to a line-ending comment, but it's better if the translator avoids changing line numbers whenever possible.
Re:inline comments have no syntax
wickline on 2002-06-06T20:14:38
> so that the p52p6 translator will know what
> it's supposed to translate (?#...) to.
Then the translator should probably have a special
check for inlined empty or '0' (zero) comments. I
don't use inline comments myself, but I can imagine
that someone who does use them might start to type
an inline comment and then forget to fill it in if
they got a phone call, or sudden inspiration in a
more troubling bit of code somewhere else.
I have more trouble imagining a (?#0) comment, but
it seems like a good idea to protect against that too.
-matt
Re:inline comments have no syntax
wickline on 2002-06-05T16:46:37
hmmm...
I should have used preview. I need to start posting as Code instead of Plain Old Text
None of the three suggestions above look like comments.
They should read as
either /patpat<(#text#)>/
or /patpat<#text#>/
or /patpat<#text>/
-matt
Re:Regexp objects
cmeyer on 2002-06-05T19:03:26
Larry drops a hint about regex objects on page 8:
>...
> To reset it, use the .reset method on the regex object. (If you haven't named the regex
> object, too bad...)
Eek!
mcc on 2002-06-05T20:38:23
Wow, i feel dumb. I had completely missed pages 2-16, because i didn't realize they existed-- i was looking at the links on the front page covering the RFCs (pages 17 on) and assuming that was a table of contents. i missed the "jump to page" bit at the bottom.
The first sixteen pages actually make sense, and are MUCH better organized. Please feel free to disregard my parent post :)
It also talks about delayed evaluation of the pattern.
So, given that almost all the example patterns use
Re:Use of //
Damian on 2002-06-06T08:04:44
So, given that almost all the example patterns use /foo/ and not m/foo/, what is
happening, and when is it happening?
/.../ now *always* returns a regex object.
But a regex object in a void, boolean, string, numeric, or =~ context
immediately matches against the current topic.
So you get:
/pattern/; # void context -> automatch
if /pattern/ {...} # boolean context -> automatch
%hash{/pattern/} = 1; # string context -> automatch
if /pattern/ < 1 {...} # numeric context -> automatch
$str =~ /pattern/; # =~ context -> automatch
when /pattern/ {...} # =~ context -> automatch
and:
@pats = (/pat1/, /pat2/); # list context -> no automatch
%hash = ( pat => /pat/ ); # scalar context -> no automatch
$pat = /pat/; # scalar context -> no automatch
given /pat/ {...} # scalar context -> no automatch
And if you're ever unsure, just be explicit:
$pat = rx/pat/; # definitely assigning a regex object
$pat = m/pat/; # definitely assigning the match result
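The closest thing in Perl 5 to the rx/m distinction is qr// versus m//. The sketch below shows only that Perl 5 analogue, as an assumption about what readers will find familiar; it does not reproduce the context-dependent automatching described above, which Perl 5 does not have. The sample text and variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'apocalypse 5';

# qr// compiles the pattern and hands back a regex object; nothing is matched yet.
my $pat = qr/(\d+)/;

# m// (or a bare //) runs the match immediately.
my $matched = $text =~ m/(\d+)/ ? 1 : 0;

# The stored object can be applied later, on its own or inside a larger pattern.
my ($num) = $text =~ $pat;
my $embedded = $text =~ /^apocalypse\s+$pat\z/ ? 1 : 0;

print ref($pat), " $matched $num $embedded\n";   # Regexp 1 5 1
```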
Re:Minor bug?
Cine on 2002-06-05T20:53:39
ARG, it ate my < > even in plain text...
/$1/
my $old1 = $1; /<$old1>/ # must use temporary here
Re:Minor bug?
TimToady on 2002-06-06T18:43:04
In general, when matching against backreferences, you want to match it literally, not as if it were regex. So you wouldn't want the angles.
Re:Minor bug?
Cine on 2002-06-05T21:11:57
And in the perl5 compat section:
/pat\s*pat/ /:w pat pat/ # match word sequence
Should have been
/pat\s+pat/ /:w[]pat pat/ # match word sequence
or what?
and thus
/pat\s*pat/ should have been converted to /pat\s*pat/
Re:Can't wait
pdcawley on 2002-06-06T09:11:37
I have the feeling that a2p6 is going to be a doddle to write. Tweak the grammar so that the awk source generates a humungous regex, wrap a 'while (<>) {...}' around it and away you go.
I'd just like to say three things:
Re:I'd just like to say three things...
gnat on 2002-06-10T05:33:13
Jeff's got nothing to worry about. I don't see an implementation of perl6 regular expressions arriving any time soon. Bits and pieces, perhaps, but I imagine that it'll be done roughly at the time that perl6.0.0 is released. Naturally, I'd be delighted to be proven wrong!
And yes, lots of gorgeous intricate custom typography. So much, in fact, that we can't convert it to DocBook and put it on Safari. Lots of new typography, too, if my eyes don't deceive. It looks even more like the book has been decorated by spiders walking through ink :-)
--Nat
Re:I'd just like to say three things...
koschei on 2002-06-10T19:49:16
Mmm. Can't wait until next month (particularly if I get a job). Even if I did just buy 1st ed 6-7 months ago. Expansions look good.
I'll probably sign up to Safari one of these days. Seems like a very good deal, even if it's not all local or on paper.
Re:typo in :u0 :u1 :u2 :u3 table?
TimToady on 2002-06-06T18:45:08
Yes, the final set of typo fixes didn't actually make it in, and that was one of them.
Re:typo, or my misunderstanding of p6 context type
wickline on 2002-06-05T22:37:29
whoops!
never mind:)
I've been skimming ahead whenever I find a question
before posting it, but in this case I failed to skim
well enough.
My question is explicitly addressed in the apo:)
-matt
Re:do we need an rx for perl6?
Damian on 2002-06-06T08:14:43
rx lets you choose your delimiters; rule doesn't.
rule lets you name your regexes; rx doesn't.
Re:do we need an rx for perl6?
jdporter on 2002-06-11T16:32:48
What if I want to do both?
Why isn't there just one operator that lets you do both?
Re:the topic is the subject of this post
TimToady on 2002-06-06T18:38:08
By the time you get into a regex, you're always
dealing with $_. Even =~ behaves as a topicalizer
for its right side. So, while the $_ inside a
closure is the search state object, it's always
related intimately to the outer $_, which is always
an alias for whatever you're currently searching.
Any string operations on the inner $_ should be
delegated to the original string. Providing an
explicit method to get at that string should be a
no-brainer, though if you mess with the string,
there's no guarantee that the regex will continue
to make sense.
Re:the topic is the subject of this post
wickline on 2002-06-07T10:31:40
> though if you mess with the string,
> there's no guarantee that the regex
> will continue to make sense.
I kinda figured that would be the case. I was thinking
of methods on strings or stringifiable objects which
don't change the string. A trivial example might be
getting the length of the target string, when the regex
may not know the name of the variable containing the
string to which it is being applied.
It sounds like it won't need to know, which sounds good!
-matt
[^[:alpha:]] <-alpha>
I expected <!alpha> -- is there a difference between <-alpha> and <!alpha>?
Re:negating character classes
wickline on 2002-06-06T23:17:54
> difference between <-alpha> and <!alpha>?
I think that <!alpha> can match nothing,
while <-alpha> can match only a non-alpha
character.
So the first can match a zero-length substring
while the second must match a one-character
substring.
-matt
Re:negating character classes
wickline on 2002-06-07T10:28:31
... but I'm really not entirely sure:)
-matt
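If that reading is right, the nearest Perl 5 counterparts would be a negated character class (consumes one character) and a negative lookahead (consumes nothing). The sketch below shows only that Perl 5 behaviour; mapping it onto <-alpha> and <!alpha> is an assumption based on the guess above, not something confirmed in this thread.

```perl
use strict;
use warnings;

my $str = 'ab1';

# Negated class: must consume exactly one non-alphabetic character.
my @class = $str =~ /([^[:alpha:]])/;      # captures '1'

# Negative lookahead: succeeds without consuming anything, even at the
# very end of the string, wherever the next character is not alphabetic.
my $at_end_class = '' =~ /[^[:alpha:]]/ ? 1 : 0;      # 0: nothing to consume
my $at_end_look  = '' =~ /(?![[:alpha:]])/ ? 1 : 0;   # 1: zero-width success

print "$class[0] $at_end_class $at_end_look\n";   # 1 0 1
```

The empty string is the telling case: the class fails because there is no character to match, while the lookahead succeeds on a zero-length substring.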