One thing I've been wanting to do for ages is to have a muck around with the CPANTS dataset that is conveniently provided by the cpants.perl.org website.
The main thing stopping me has been the lack of an easy programmatic way to work with the data.
With the basic first implementation of ORLite done, the obvious next step to take is to enhance it to suck the SQLite data in from various places.
ORLite::Mirror is an ORLite subclass that mixes in LWP and a few other utility modules to allow the loading of the database from any URL that LWP supports.
It supports both regular database files and gzip-compressed databases whose URL ends with .gz.
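For the curious, the .gz handling boils down to something like the sketch below. This is not the actual ORLite::Mirror code, just a minimal illustration using core modules. The name local_database is made up for the example, and the download step (which the real module does with LWP) is left out, so the function starts from a file that is already on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper, NOT part of the ORLite::Mirror API.
# Given the local path of an already-downloaded file, return the path
# of a usable database file, unpacking it first if the name ends in .gz.
sub local_database {
    my ($path) = @_;

    # Regular database files are used exactly as they are.
    return $path unless $path =~ /\.gz\z/;

    # Strip the suffix to get the name of the unpacked file.
    ( my $unpacked = $path ) =~ s/\.gz\z//;

    # gunzip returns false on failure and sets $GunzipError.
    gunzip( $path => $unpacked, BinModeOut => 1 )
        or die "gunzip of $path failed: $GunzipError\n";

    return $unpacked;
}
```

Calling local_database('cpants_all.db.gz') would leave an unpacked cpants_all.db next to the original and return that name, while a path without the .gz suffix comes back unchanged.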
What this means, of course, is that now I can load the CPANTS database without having to actually build the object model myself.
So now you can do stuff like the following...
#!/usr/bin/perl

# Create an ORM model for the CPANTS database.
use ORLite::Mirror {
    url     => 'http://cpants.perl.org/static/cpants_all.db.gz',
    package => 'CPANTS',
};

my $count = CPANTS::Author->count;
print "CPANTS currently tracks $count authors\n";

my $authors = CPANTS::Author->select('where pauseid = ?', 'ADAMK');
print "ADAMK is " . $authors->[0]->name . "\n";
So if anyone else just happens to control any large chunks of CPAN-related data, it would be just awesome if you could periodically (by which I mean cron) publish that dataset to some stable URL where interested people could get hold of it :)
With enough of the data exposed in this way, it should be relatively trivial to build a module that pulls down all of the different data sets, and lets you write analysis algorithms that span across different data sets.
Re:CPAN Testers
Alias on 2008-05-29T08:05:26
But alas, bz2... I guess that's another feature to add...
Re:cpan.uwinnipeg.ca database
Alias on 2008-05-31T07:48:40
The more databases, the merrier.
To be honest, I'm not entirely sure which databases will be useful, but once we have a large number of DBs available, it expands the opportunities for writing code.
The bits that will continue to be useful will likely emerge over time.