Day 1393: Grant approved! PPI 1.101 now out, with caching

Alias on 2005-09-28T18:16:09

So it would appear I get to spend some quality time again with everybody's second-favourite Perl parser.

As the starting point, I've finally spent the hours needed to complete the first major point release of PPI.

To keep things simple (and avoid this whole version.pm PBP debacle) I'm just going to do my major increments (there shouldn't be many) in 100s. 10 major point releases and 99 minor ones should be just fine, since I don't plan on packing too much more shite into the main PPI dist.

But one new thing I have packed in is integrated caching. It works something like this.

    use PPI;
    use PPI::Cache path => '/var/cache/ppi';

    # code as normal

    my $Document = PPI::Document->new($filename);

    # and more code as normal


And that's it. A one-line addition at the top of a script to enable the integrated caching layer, and no change at all to your loading or saving code.

The caching is based on Storable and Digest::MD5, and really should Just Work. If you open a document PPI has seen before, it will pull the preparsed version from the cache. If you parse something it can keep, it will save a copy. And you should never have to know how it works :)
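For the curious, the general technique (hash the source, store the parsed structure under that hash) can be sketched in a few lines. To be clear, this is only an illustration of the idea and not the actual PPI::Cache code: the cached_parse function, the '.sto' file suffix and the flat directory layout are all assumptions made up for this example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Storable    qw(nstore retrieve);
use File::Spec;

# Hypothetical helper, not part of PPI: return the parsed form of $source,
# reusing a stored copy when the same content has been seen before.
# $parser is any code ref that turns source text into a data structure
# (it must return a reference, since that is what Storable stores).
sub cached_parse {
    my ($cache_dir, $source, $parser) = @_;

    # Key on the content itself, so an edited file never matches an old entry
    my $file = File::Spec->catfile( $cache_dir, md5_hex($source) . '.sto' );

    # Cache hit: load the stored structure instead of parsing again
    return retrieve($file) if -f $file;

    # Cache miss: parse, then keep a copy for next time
    my $parsed = $parser->($source);
    nstore( $parsed, $file );
    return $parsed;
}
```

Because the key is a digest of the content rather than the file name, the same source found in two places shares one cache entry, and a changed file simply misses the cache and gets parsed fresh.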

Now this does mean a little bit of dependency bloat (which I hate) but it only adds modules that are already very very common, so it _should_ be no extra effort in 99% of cases.

Please note that at the moment, nothing in the cache expires (options for this to be added in a future minor release). But unless your documents change rapidly, this shouldn't be a problem.

And did I mention I have plans for a standard cron-integrated cache-clearing module? (Although I'm sure something exists already.)
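Until something like that exists, a cache-clearing pass is easy enough to sketch by hand. The following is a minimal example that assumes the cache is just a directory tree of plain files; clear_old_cache_files and its age-in-days argument are hypothetical names for this illustration, not anything PPI provides.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: delete regular files under $cache_dir that were
# last modified more than $max_age_days ago. Returns the number removed.
sub clear_old_cache_files {
    my ($cache_dir, $max_age_days) = @_;
    my $cutoff  = time - $max_age_days * 24 * 60 * 60;
    my $removed = 0;

    find(
        {
            no_chdir => 1,    # keep $_ as the full path of each entry
            wanted   => sub {
                return unless -f $_;        # skip directories and the like
                my $mtime = ( stat _ )[9];  # reuse the stat done by -f
                return unless $mtime < $cutoff;
                $removed++ if unlink $_;
            },
        },
        $cache_dir
    );

    return $removed;
}
```

Something like clear_old_cache_files('/var/cache/ppi', 30) in a small script run from cron would then cap how old entries can get. Note that this goes by modification time, so an entry expires a fixed time after it was written, no matter how often it has been read since.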

Just be aware though that if you want to parse all of CPAN or something like that, it's gunna need a big-ass cache. I haven't tried it myself yet, but I would imagine it will bloat the cache out to a few gig. I'll try to get a number for the cache/code ratio some time soon.

BUT, it does mean you can cache all of CPAN in parsed form if you really need to, and that should bring down MASSIVELY the processing time needed for things like Kwalitee and other mass-processing tools :)

Next up, I'll be trying for a first release of Perl::Metrics, and shortly thereafter Perl::Metrics::Plugin::Critic, to integrate it into Perl::Critic. And that is where the fun REALLY starts.

As usual, please kick the tires and report any bugs or feature ideas to RT. Have a nice day!


Location also cached?

jplindstrom on 2005-09-28T19:31:37

Does it store the location of things in the cache if $doc->index_locations is called?

Re:Location also cached?

Alias on 2005-09-29T15:04:35

No it doesn't.

The document is cached ONLY at the instant that it is loaded. Otherwise what exists in the cache would not be deterministic. There could be all sorts of crap appearing in the document that you weren't expecting when you loaded it.

That said, ->index_locations isn't _too_ expensive to do per-load. It's only expensive if you need to do it per ->location call.
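
As a rough sketch of that per-load pattern: index once straight after creating the document, then ask for as many locations as you like. The inline source string and the @found list here are just stand-ins for this example.

```perl
use strict;
use warnings;
use PPI;

# Stand-in source; in real code this would be a file loaded (and
# possibly cached) as normal.
my $source   = "my \$x = 1;\nprint \$x;\n";
my $Document = PPI::Document->new( \$source )
    or die "Failed to parse document";

# Pay the indexing cost once, right after the load ...
$Document->index_locations;

# ... and each ->location call afterwards is just a lookup
my @found;
for my $Token ( @{ $Document->find('PPI::Token::Symbol') || [] } ) {
    # The first two elements of the location are the line number and
    # the character position within that line
    my ( $line, $rowchar ) = @{ $Token->location };
    push @found, [ $Token->content, $line, $rowchar ];
}

printf "%s at line %d, character %d\n", @$_ for @found;
```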