Of the various things in the PPI Phase 2 grant, the most interesting one for me is Perl::Metrics.
Perl::Metrics is a wonderful magical Class::DBI/SQLite database combined with a plugin-based document processor.
The point of this wonderful beast is to parse files into their PPI::Document objects, generate a wide variety of metrics (values that tell you something about the documents), and store them in a structured and relatively efficient manner.
So what can we do with it? Heh, I really don't have any idea.
Well, that's not entirely true. What you do with it is write yourself a 10-line plugin, or a 100-line plugin, and then let it process your entire company repository, or all of CPAN. And it will cache everything properly, and only generate metrics for duplicate files once, and handle anonymous documents, and handle metrics that are versioned and need to be expired when new versions come along, and a dozen other things that make processing vast numbers of documents trivial.
And you can install as many plugins as you like and it will handle the processing for ALL of them properly.
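To give a feel for the "only generate metrics for duplicate files once" idea, here is a minimal sketch using only core Perl. It is NOT how Perl::Metrics is actually implemented: the %plugins table, the %store hash, and process_document() are all names I've made up for illustration, and the real thing works on PPI::Document objects and a SQLite database rather than raw strings and an in-memory hash. The only point is that keying results on a digest of the content means identical documents never get measured twice, no matter how many plugins are installed.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical stand-ins for plugins: each metric name maps to a
# function that takes the document source and returns a value.
my %plugins = (
    # Number of newline characters in the source
    lines => sub { my ($source) = @_; return ( $source =~ tr/\n// ) },
    # Number of named sub declarations that start a line
    subs  => sub {
        my ($source) = @_;
        my $count = () = $source =~ /^[ \t]*sub\s+\w+/mg;
        return $count;
    },
);

my %store;      # content digest => { metric name => value }
my $runs = 0;   # how many metric calculations were actually done

# Run every plugin over a document, skipping any metric that has
# already been stored for identical content.
sub process_document {
    my ($source) = @_;
    my $digest = md5_hex($source);
    for my $name ( sort keys %plugins ) {
        next if exists $store{$digest}{$name};
        $store{$digest}{$name} = $plugins{$name}->($source);
        $runs++;
    }
    return $store{$digest};
}

my $code = "sub foo { 1 }\nsub bar { 2 }\n";
process_document($code);
process_document($code);           # same content, nothing is recalculated
process_document("print 1;\n");    # new content, both metrics run
print "metric calculations: $runs\n";    # prints "metric calculations: 4"
```

Three documents and two plugins would naively mean six calculations, but the duplicate costs nothing, so only four are done. Adding a third entry to %plugins later would fill in just the missing metric for documents already seen.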
The basic idea is to write a series of plugins to gather all sorts of interesting data. And then some people we haven't met yet with skills in statistics, reporting and artificial intelligence will take this vast amount of metric data and tell us AAAAAAALL sorts of interesting things about documents.
To demonstrate how easy it is to write one, I've done a standalone plugin for you to look at.
Perl::Metrics::Plugin::MinimumVersion provides a fairly straightforward integration of Perl::MinimumVersion into the Perl::Metrics architecture.
So once it is installed, whenever you run the main ->process_index method, you get updated minimum Perl version metrics gathered along with whatever else is being processed.
So my question to you today, my dear friends: what sort of things can you think of doing with this? Got any statistics experience? :)