At the recent QA Hackathon in Birmingham (which was great - many thanks to Birmingham.pm for organising it, and to the hotel for having a rubbish network so I actually worked instead of dicking around on IRC) I worked on my idea for the Comprehensive Perl $version Archive Network, or CPXXXAN for short, mostly so I can make jokes about hot module-on-module action.
It is intended to be a set of CPAN "mirrors" which only contain distributions which pass their tests on a particular version of perl, with their 02packages index files set up correctly. So if, e.g., you point your CPAN client at the mirror for perl 5.6.2, you'll get DBI 1.604, which works, instead of DBI 1.607, which doesn't.
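To make the selection rule concrete, here is a minimal sketch in Python (the project itself is Perl). For each distribution it picks the highest version with a PASS report on a given perl. The report tuples and the `latest_passing` name are hypothetical. Versions are compared as plain decimals, which handles DBI-style numbers such as 1.604 but not dotted versions like v1.2.3.

```python
from decimal import Decimal

# Hypothetical test-report tuples: (distribution, version, perl_version, grade).
REPORTS = [
    ("DBI", "1.604", "5.6.2", "PASS"),
    ("DBI", "1.607", "5.6.2", "FAIL"),
    ("DBI", "1.607", "5.10.0", "PASS"),
]


def latest_passing(reports, perl):
    """Return {dist: highest version that PASSed on `perl`}.

    Assumes plain decimal version strings ("1.604"), compared numerically
    the way Perl compares decimal versions; dotted versions are not handled.
    """
    best = {}
    for dist, version, perl_version, grade in reports:
        # Only PASS reports for the target perl count.
        if perl_version != perl or grade != "PASS":
            continue
        if dist not in best or Decimal(version) > Decimal(best[dist]):
            best[dist] = version
    return best


if __name__ == "__main__":
    print(latest_passing(REPORTS, "5.6.2"))   # {'DBI': '1.604'}
    print(latest_passing(REPORTS, "5.10.0"))  # {'DBI': '1.607'}
```

Decimal comparison is used instead of splitting on dots because Perl treats 1.61 as 1.610, which is greater than 1.604. Naive integer-per-component comparison would get that ordering wrong.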
While you might think that this is pointless - the argument is that "people who don't upgrade perl won't want to upgrade modules" - consider that recent releases of some distributions now require perl 5.10, even though that's only just over a year old. 5.6.2 just makes a convenient test case.
In Birmingham, I tested it and got it all working on my laptop, using a subset of the data (specifically, distributions beginning with DBI or DBD). Now I'm building the 5.6.2 index for the whole of the CPAN. It's taking a very long time, but I'm not that concerned, as I'm unlikely to update the indices more than once a month. Even so, any Clues about how to make my SQL more efficient would be most welcome. See the schema and the query that figures out what the most recent passing distribution is and what modules are in it.
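The actual schema and query aren't reproduced here, so what follows is only a sketch of one common "greatest version per group" approach in SQLite, driven from Python's built-in sqlite3 module. The table and column names are made up. The composite primary key on (perl, dist, version_num) lets the inner GROUP BY find each distribution's best passing version directly from the index. The outer joins then fetch that version's modules. The sketch assumes decimal versions stored in a numeric sort column, and the sample rows are illustrative only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE passes (
    dist        TEXT NOT NULL,
    version     TEXT NOT NULL,   -- version as written in the tarball name
    version_num REAL NOT NULL,   -- numeric sort key (assumes decimal versions)
    perl        TEXT NOT NULL,
    PRIMARY KEY (perl, dist, version_num)  -- doubles as the lookup index
);
CREATE TABLE modules (
    dist    TEXT NOT NULL,
    version TEXT NOT NULL,
    module  TEXT NOT NULL,
    PRIMARY KEY (dist, version, module)
);
"""

# Newest passing version per dist on one perl, then the modules it contains.
QUERY = """
SELECT p.dist, p.version, m.module
FROM passes AS p
JOIN (SELECT dist, MAX(version_num) AS best
      FROM passes WHERE perl = :perl GROUP BY dist) AS b
  ON p.dist = b.dist AND p.version_num = b.best
JOIN modules AS m ON m.dist = p.dist AND m.version = p.version
WHERE p.perl = :perl
ORDER BY m.module
"""


def build(perl):
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # Duplicate PASS reports for the same dist/version/perl collapse to one row.
    db.executemany("INSERT OR IGNORE INTO passes VALUES (?, ?, ?, ?)", [
        ("DBI", "1.604", 1.604, "5.6.2"),
        ("DBI", "1.604", 1.604, "5.6.2"),
        ("DBI", "1.607", 1.607, "5.10.0"),
    ])
    db.executemany("INSERT INTO modules VALUES (?, ?, ?)", [
        ("DBI", "1.604", "DBI"), ("DBI", "1.604", "DBI::Profile"),
        ("DBI", "1.607", "DBI"), ("DBI", "1.607", "DBI::Profile"),
    ])
    rows = db.execute(QUERY, {"perl": perl}).fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    print(build("5.6.2"))
```

On real data, prefixing the query with EXPLAIN QUERY PLAN shows whether SQLite is searching an index or scanning the whole table. That is usually the first thing to check when a query like this is slow.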
All the code is in my git repo; email me if you want access.
From importtestresults.pl I see that only dists with a 'PASS' are imported. This means that dists with external prerequisites (e.g. XML::Parser, which requires expat and its header files) and dists with prerequisite problems (e.g. the current version of a prerequisite does not work on this version of perl) are not included.
Re:dists without 'FAIL' and 'PASS'
drhyde on 2009-03-31T23:56:52
The whole point is that we know all the modules listed work. It would be silly to import all the failure reports as well, only to ignore them!
As it happens, various incarnations of the XML-Parser distribution have passed their tests on 23 different versions of perl. As one of them is perl 5.6.2, XML::Parser will be in the list. Sure, you need an external C library. Not only can we not detect that, but it would be silly to exclude everything that has a non-perl dependency. If we were to do that we'd have to exclude Win32::* because they depend on Windows, which lots of people don't have!
The issue with pre-reqs is a problem, but one that a perl version-specific mirror can, over time, help with. Let's assume that DBIx::Garbleflux is released today and requires DBI, but isn't fussy about what version. Normally, it would never even get tested on 5.6.2, because the testers' CPAN.pm can't automatically retrieve a working version of DBI. But if the tester's perl is pointing at my 5.6.2-specific mirror, then it can! The shiny new untested DBIx::Garbleflux is available to test, as A/AU/AUTHORID/DBIx-Garbleflux-1.0.tar.gz, from my mirror (which is simply a BackPAN with an index layered on top), so testers' machines will attempt to install it. CPAN.pm will see that there's a dependency on DBI (the module called DBI, not any particular distribution of DBI) and look it up in the index. The index will tell it to download and install T/TI/TIMB/DBI-1.604.tar.gz (the last version of DBI to pass its tests on 5.6.2), and so the shiny new DBIx::Garbleflux will magically get tested with the right rusty old versions of its pre-requisites. Provided that the author got everything right, a test PASS will be reported, which will eventually get back to the CPXXXAN, and thus the module will be available to normal users of obsolete perl.
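The module-to-distribution lookup described above can be sketched as a minimal Python parser for an 02packages.details.txt-style file. It maps each module name to the distribution path a client would fetch. The sample index is invented for illustration, with AUTHORID kept as the placeholder from the text. Real clients such as CPAN.pm do considerably more, for example dealing with version requirements and choosing a mirror.

```python
# A trimmed, hypothetical 02packages.details.txt for a 5.6.2-specific mirror.
SAMPLE = """\
File:         02packages.details.txt
Line-Count:   2

DBI                 1.604  T/TI/TIMB/DBI-1.604.tar.gz
DBIx::Garbleflux    1.0    A/AU/AUTHORID/DBIx-Garbleflux-1.0.tar.gz
"""


def parse_packages(text):
    """Map module name -> distribution path from 02packages-style text.

    The header block ends at the first blank line; each later line holds
    three whitespace-separated fields: module name, version, path.
    """
    _header, _, body = text.partition("\n\n")
    index = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) == 3:
            module, _version, path = fields
            index[module] = path
    return index


if __name__ == "__main__":
    index = parse_packages(SAMPLE)
    # A dependency on the *module* DBI resolves to whichever DBI tarball
    # this mirror's index lists, here the last one that passed on 5.6.2.
    print(index["DBI"])  # T/TI/TIMB/DBI-1.604.tar.gz
```

Because the lookup is keyed on module names, the same client code gets DBI 1.604 from the 5.6.2 mirror and a newer DBI from an index built for a newer perl. Only the index file differs.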
Hi
Perhaps it would be faster to import the data into Postgres, and then use a trivial script to copy the data to SQLite.
Cheers
Ron
Re:Speedup
drhyde on 2009-04-01T10:36:49
Importing the data isn't the problem, it's the query in build02packages.pl.