CPANDB 0.02 - Now we're starting to get somewhere

Alias on 2009-07-01T17:17:06

On the back of my improved, high-coverage CPAN::Mini::Visit and Archive::* fixes, I've finally managed to build a complete-coverage version of CPANDB.

CPANDB is a merged and cleaned-up schema that combines the CPAN index, the "CPAN Uploads" database (for PAUSE upload dates), and both class-level and distribution-level dependency information held in META.yml files (replacing the CPANTS dependency graph).

To take a look at it, you can grab a copy of the SQLite database directly from the following URL:

http://svn.ali.as/db/cpandb.gz
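
If you'd rather poke at it from a script, something like the following rough sketch should do. It assumes the download is a plain gzip of a single SQLite file and that you've already saved it locally; the function names and the output filename are just placeholders I've made up for illustration.

```python
import gzip
import shutil
import sqlite3


def open_cpandb(gz_path, db_path="cpandb.sqlite"):
    """Decompress a downloaded cpandb.gz and open the result with SQLite.

    db_path is a hypothetical output filename, not something CPANDB defines.
    """
    # Stream the decompression so the whole file is never held in memory.
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return sqlite3.connect(db_path)


def list_tables(conn):
    """Return the names of the tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Calling list_tables(open_cpandb("cpandb.gz")) should then give you the table names to compare against the schema at the bottom of this post.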

The data sources used to generate it are not perfectly time-synced yet, so I expect to see a few minor flaws for another release or two. But compared to everything else available (from pretty much everybody) this should be a significant improvement.

As well as clearing up the last tiny data quality issues, I'm also yet to merge in the rt.cpan.org database (which is almost ready) and the CPAN Ratings database (which is a text file I really don't want to have to parse).

But don't let this stop you trying it out now (I've appended the schema to the bottom of this post so you can get a clearer idea of what's in there).

As usual, feedback is welcome.

CREATE TABLE author (
	author TEXT NOT NULL PRIMARY KEY,
	name TEXT NOT NULL
);

CREATE TABLE distribution (
	distribution TEXT NOT NULL PRIMARY KEY,
	version TEXT NULL,
	author TEXT NOT NULL,
	release TEXT NOT NULL,
	uploaded TEXT NOT NULL,
	FOREIGN KEY ( author ) REFERENCES author ( author )
);

CREATE TABLE module (
	module TEXT NOT NULL PRIMARY KEY,
	version TEXT NULL,
	distribution TEXT NOT NULL,
	FOREIGN KEY ( distribution ) REFERENCES distribution ( distribution )
);

CREATE TABLE requires (
	distribution TEXT NOT NULL,
	module TEXT NOT NULL,
	version TEXT NULL,
	phase TEXT NOT NULL,
	PRIMARY KEY ( distribution, module, phase ),
	FOREIGN KEY ( distribution ) REFERENCES distribution ( distribution ),
	FOREIGN KEY ( module ) REFERENCES module ( module )
);

CREATE TABLE dependency (
	distribution TEXT NOT NULL,
	dependency TEXT NOT NULL,
	phase TEXT NOT NULL,
	PRIMARY KEY ( distribution, dependency, phase ),
	FOREIGN KEY ( distribution ) REFERENCES distribution ( distribution ),
	FOREIGN KEY ( dependency ) REFERENCES distribution ( distribution )
);
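
To give a feel for the sort of question the dependency table can answer, here is a minimal sketch run against an in-memory database. It copies only the dependency table (without its foreign keys), and every row in it is invented: the distribution names and the "runtime"/"build" phase values are my assumptions for illustration, not data from CPANDB. The helper names are hypothetical too.

```python
import sqlite3

# Only the dependency table from the schema above; foreign keys are left out
# so the sketch does not need the other tables.
DDL = """
CREATE TABLE dependency (
    distribution TEXT NOT NULL,
    dependency TEXT NOT NULL,
    phase TEXT NOT NULL,
    PRIMARY KEY ( distribution, dependency, phase )
);
"""

# Made-up rows for illustration; names and phase values are not real data.
SAMPLE = [
    ("App-One", "Lib-A", "runtime"),
    ("App-One", "Lib-B", "build"),
    ("App-Two", "Lib-A", "runtime"),
    ("Lib-A", "Lib-C", "runtime"),
]


def build_demo():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dependency VALUES (?, ?, ?)", SAMPLE)
    return conn


def dependents(conn, dist, phase=None):
    """Distributions that directly depend on dist, optionally for one phase."""
    sql = "SELECT DISTINCT distribution FROM dependency WHERE dependency = ?"
    args = [dist]
    if phase is not None:
        sql += " AND phase = ?"
        args.append(phase)
    sql += " ORDER BY distribution"
    return [row[0] for row in conn.execute(sql, args)]


def all_dependencies(conn, dist):
    """Direct and indirect dependencies of dist, via a recursive CTE."""
    sql = """
    WITH RECURSIVE deps(name) AS (
        SELECT dependency FROM dependency WHERE distribution = ?
        UNION
        SELECT d.dependency FROM dependency AS d
        JOIN deps ON d.distribution = deps.name
    )
    SELECT name FROM deps ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, (dist,))]
```

The recursive query uses UNION rather than UNION ALL so that duplicate rows are dropped, which also stops it looping forever if the graph contains a cycle. Recursive CTEs need a reasonably recent SQLite (3.8.3 or later).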