Day 1313: PPI 1.000 released

Alias on 2005-07-10T14:17:25

It is with a huge sigh of relief that I officially announce the release of PPI 1.000, the Perl document parser.

For me personally, it represents quite possibly the hardest and most complex piece of coding I've ever had to do, and many thousands of hours of work. Not to mention the number of hours spent banging my head against a wall trying to deal with the evils of Perl's syntax.

But now it's done, and it meets all of my original milestones. These include a complete and stable Perl Document Object Model, 100% "Round Trip" safety, a DOM tree structure implemented in a simple, high-performance and completely leak-proof way, a primary API designed around convenience and Perl's DWIM principles, and a set of elegant and extendable/pluggable APIs for handling Queries, Transformations and Normalisation.

What now?

PPI is going to stay as it is for a while, so that the community as a whole has time to kick the tires, take it for a drive, and start to create new and interesting things I haven't thought of yet.

Specifically, I'm looking at a three-month shakedown period. During this time we should continue to see point releases, but I won't be adding any major new features to the PPI core.

In terms of performance, I'd like to call for volunteers to help work on PPI::XS, the PPI accelerator.

Its design is well fleshed out now, and we can re-implement some of the more critical parts of PPI safely and incrementally, one function at a time.

Anyone interested in this, or in PPI in general, can join the parseperl-discuss mailing list over at the Parse::Perl project page on SourceForge.

For the time being though I expect most of the activity to be by external authors in the PPIx:: and Perl:: namespaces.

Some of these projects should start to appear over the next few days or weeks now that we have a final release. They include a new release of the Proton CE Perl editor with embedded PPI support, more variation in syntax highlighting modules, Dan Brooks' PPIx::Analyze, Storable-based Document caches, and a number of other packages that will start to take us towards the next major goal of a refactoring Perl editor.

Although it took a new definition of "parse", perl is now no longer the only thing that can parse Perl. So I guess now it's over to you guys. Have fun, and enjoy! :)


A few disclaimers are necessary there...

merlyn on 2005-07-10T14:55:35

> 100% "Round Trip" safety
... provided you don't change the document in any way, like perhaps "strip comments" or anything that is not the NULL transformation.

If you do any transformation, you will not know that you've misparsed. You'll simply get the wrong result, and you will damage code.

> Although it took a new definition of "parse", perl is now no longer the only thing that can parse Perl.
I know you're hinting at it with your definition of "parse" caveat there, but I think it deserves a better underlining.

Your work, as massive an undertaking as it is, cannot be guaranteed to understand Perl in the same way that the perl compiler can. My oft-referenced (even by you) message "On Parsing Perl" shows precise examples that PPI cannot ever get correct.

Yes, you can round trip it to your fake tokens and back to Perl source code, but you cannot determine perl's actual tokenization, nor can you even know that you misparsed it. It'd be like saying that you take the source text and break it into two-character chunks, then take those chunks back to the original text. Whooptidoo.

Yes, your code will be useful to 98% of those who want a static parsing of Perl code, but the problem is that you'll never know if you're in that last 2%, and it's impossible for PPI to tell if it got it wrong.
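
The two-character-chunk analogy above can be sketched in a few lines of plain Perl. This is only an illustration of the argument, not anything PPI does; the sample source and variable names are made up for the example. It shows that splitting text into pieces and joining them back reproduces the input even when the pieces have nothing to do with Perl's real tokens.

```perl
use strict;
use warnings;

# Hypothetical sample source; any text would behave the same way.
my $source = <<'END_SOURCE';
print "Hello, world!\n"; # greet
END_SOURCE

# "Tokenize" into chunks of at most two characters, ignoring Perl syntax.
# The /s flag lets "." match the newline as well.
my @chunks = $source =~ /(.{1,2})/gs;

# Joining the chunks reproduces the input character for character.
my $round_trip = join '', @chunks;

print $round_trip eq $source ? "round trip ok\n" : "round trip FAILED\n";
```

The round trip succeeds, yet the chunks carry no information about how perl itself would tokenize the code, which is the distinction being drawn here.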

Re:A few disclaimers are necessary there...

jplindstrom on 2005-07-10T16:12:32

I think the interesting question is whether a 98% solution is useful or useless. I'd say it's possible to do useful things with it. It's not like no-one knows Perl is dynamic.

So let's not depend on PPI to tell if it got it wrong.

If you're doing manual refactoring, do you have tests to back up your transformation? If you don't, you're doing it wrong. Same with using a PPI-backed refactoring tool.

Re:A few disclaimers are necessary there...

sheriff_p on 2005-07-11T10:31:16

Randal,

Why are you so overwhelmingly negative about this project? It looks a bit petulant.

+Pete

Re:A few disclaimers are necessary there...

merlyn on 2005-07-11T23:08:59

> Why are you so overwhelmingly negative about this project?
I'm only negative to the point that the hype needs to be spin-corrected.

When Adam presents a fair and balanced description of PPI, I stay quiet. But when he presents an offsetting view, I present a counter-offsetting view.

It's about awareness. Not everyone knows why only perl can truly parse Perl, so they see PPI and get big googly eyes, when in fact it's not what it advertises itself to be, until you read all the fine print. When Adam leaves the fine print out, I put it back in.

Re:A few disclaimers are necessary there...

Alias on 2005-07-14T02:01:50

Well, except that perl can't parse Perl either.

There is no Perl.

(But more on that at the OSCON talk) :)

Re:A few disclaimers are necessary there...

Alias on 2005-07-11T13:20:16

Certainly, but then neither can perl. When it really comes down to it, it is equally possible for perl to get it wrong, or act incorrectly.

And since there _is_ no language definition by which to define Perl, we simply say that "What perl parses is what Perl is".

What is the definition of sub'new {}?

It's sub main::new {}. Is that a bug? It can't possibly be, because Perl is what perl parses.

If you look beyond the issue of whether PPI "parses" Perl correctly, you'll see that "there is no spoon".

There is no Perl, and everything (including perl itself) is merely doing something based on its approximation of what it thinks Perl is.

PPI "parses" a fixed-length array of bytes of something that by most people's definitions is "Perl" into a document model based on only syntax.

It can't execute it or handle code as a stream.

perl "parses" a stream of bytes of something that by most people's definition is "Perl" and executes it as it goes, some of the time assembling byte code or symbol tree entries.

perl can't understand it as a document and it can't handle it as a fixed-length array.

So now there are two approximations of Perl. The one perl does, and the one that PPI does. The big question was, can we get the two close enough to make a difference.

And the answer is yes.

Re:A few disclaimers are necessary there...

Alias on 2005-07-11T14:05:41

>> 100% "Round Trip" safety

> ... provided you don't change the document in any way, like perhaps "strip comments" or anything that is not the NULL transformation.

And BTW, so I don't have to explain this again, the definition of "Round Trip" is that if you don't change a piece of code, it comes out the same as it went in. The entire point is that you don't change the document.

PPI can do this, unlike every other attempt at a perl-based "parser" that I've seen.

B:: is more of an attempt to create a bytecode serializer than a Perl parser.
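
The "Round Trip" definition above can be checked with a short script. This is a minimal sketch, assuming the PPI distribution from CPAN is installed; the sample source is invented for the example. It parses a string, serializes it unchanged and compares the result with the input, then applies one non-NULL transformation (stripping comments) to show where the round-trip guarantee stops applying.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical sample source with a comment and irregular spacing.
my $source = <<'END_SOURCE';
# Say hello
my $name = 'world';    # odd spacing is preserved
print "Hello, $name!\n";
END_SOURCE

# Parse from a string reference (a plain string is treated as a filename).
my $doc = PPI::Document->new(\$source)
    or die "PPI could not parse the sample source";

# No changes made: serializing should give back the input exactly.
my $unchanged = $doc->serialize;
print $unchanged eq $source ? "round trip ok\n" : "round trip FAILED\n";

# A non-NULL transformation: remove every comment token from the tree.
$doc->prune('PPI::Token::Comment');
my $stripped = $doc->serialize;
```

After the prune call the output is no longer expected to match the input, so the round-trip comparison says nothing about whether the transformed code is still correct. That is where tests on the code itself have to take over.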

Congratulations

sky on 2005-07-10T15:58:58

A very cool mass of work.

Three Cheers

n1vux on 2005-07-11T20:31:48

I don't know what I'm going to use this for, but I want to use it for something ... maybe the refactoring editor will help me with some stuff I wish I hadn't written ...