CPANDB should now be synchronised and up to date

Alias on 2009-07-06T06:32:16

Update:

CPANDB now also contains the CPAN Ratings data.

In my previous CPANDB announcements, I mentioned that although the database was being generated the correct way, the data would be slightly out of sync because it came from different places.

This should now be resolved.

The server syncs a minicpan to http://cpan.cpantesters.org/. A META.yml database is then updated from that minicpan, and a CPAN::SQLite index is built from the same minicpan. Both are then joined with the ORDB::CPANUploads database to produce the completed tables.

The final step is the application of the graph algorithms to produce the weight and volatility scores for each distribution, which are now being added to the database as well.
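The post does not define the two scores, so the following Python sketch is only one plausible reading: it assumes weight is the number of distributions a release transitively depends on, and volatility is the number of distributions that transitively depend on it. The graph, the function names and the distribution names are all made up for illustration; this is not CPANDB::Generator's actual code.

```python
from collections import defaultdict


def reachable(graph, start):
    """Return every node reachable from start, excluding start itself."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen or node == start:
            continue  # already counted, or a cycle back to the start
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


def invert(graph):
    """Flip every edge, turning 'depends on' into 'is depended on by'."""
    inverted = defaultdict(set)
    for node, deps in graph.items():
        for dep in deps:
            inverted[dep].add(node)
    return inverted


def score(graph):
    """Compute assumed weight and volatility for every node in the graph."""
    nodes = set(graph) | {dep for deps in graph.values() for dep in deps}
    dependents = invert(graph)
    return {
        node: {
            "weight": len(reachable(graph, node)),           # downstream dependencies
            "volatility": len(reachable(dependents, node)),  # upstream dependents
        }
        for node in sorted(nodes)
    }


# Hypothetical dependency graph: distribution -> direct dependencies.
DEMO_GRAPH = {
    "Example-App": {"Example-Util", "Example-Core"},
    "Example-Util": {"Example-Core"},
    "Example-Core": set(),
}
```

Calling `score(DEMO_GRAPH)` gives Example-App a weight of 2 and a volatility of 0, and Example-Core the reverse, which matches the intuition that heavily depended-upon distributions are the risky ones to change.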

Now that CPANDB is ready for the big time, I'll start trying to merge in the three main outstanding data sets (CPAN Ratings, rt.cpan.org and CPAN Testers) and then rewrite the CPAN Top 100 to turn it into a fully operational death star... I mean website.

You can use the CPANDB module to fetch and ORM-inflate the database. To download the database directly, use the following URL.

http://svn.ali.as/db/cpandb.gz
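If you go the direct-download route and are not working in Perl, a small Python sketch like the one below could unpack the file once you have fetched it. It assumes the archive is a gzip-compressed SQLite file, which the post implies but does not state outright; the function names and file paths are hypothetical.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def unpack_database(archive, target):
    """Decompress a gzip archive into a plain file and return its path."""
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(target)


def list_tables(db_path):
    """Return the table names in a SQLite file, sorted alphabetically."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

After saving the download as, say, `cpandb.gz`, `list_tables(unpack_database("cpandb.gz", "cpandb.sqlite"))` should show which tables the dump actually contains.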

The current schema for the database now has the slightly expanded distribution table.

CREATE TABLE distribution (
	distribution TEXT NOT NULL PRIMARY KEY,
	version TEXT NULL,
	author TEXT NOT NULL,
	release TEXT NOT NULL,
	uploaded TEXT NOT NULL,
	weight INTEGER NOT NULL,
	volatility INTEGER NOT NULL,
	FOREIGN KEY ( author ) REFERENCES author ( author )
)
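To get a feel for the table, here is a rough Python sketch that loads the schema above into an in-memory SQLite database and ranks distributions by volatility. The one-column author table is only a stand-in so the foreign key has a target (the real author table is not shown here), and every row is invented sample data rather than anything taken from CPAN.

```python
import sqlite3

SCHEMA = """
CREATE TABLE author (
    author TEXT NOT NULL PRIMARY KEY  -- stand-in; real columns unknown
);
CREATE TABLE distribution (
    distribution TEXT NOT NULL PRIMARY KEY,
    version TEXT NULL,
    author TEXT NOT NULL,
    release TEXT NOT NULL,
    uploaded TEXT NOT NULL,
    weight INTEGER NOT NULL,
    volatility INTEGER NOT NULL,
    FOREIGN KEY ( author ) REFERENCES author ( author )
);
"""

# Invented rows: (distribution, version, author, release, uploaded, weight, volatility)
SAMPLE_ROWS = [
    ("Example-App", "0.01", "EXAMPLE", "Example-App-0.01.tar.gz", "2009-07-01", 25, 0),
    ("Example-Util", "1.10", "EXAMPLE", "Example-Util-1.10.tar.gz", "2009-06-15", 3, 12),
    ("Example-Core", "2.00", "EXAMPLE", "Example-Core-2.00.tar.gz", "2009-05-20", 0, 40),
]


def build_demo_db():
    """Create an in-memory database holding the schema and the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO author (author) VALUES (?)", ("EXAMPLE",))
    conn.executemany(
        "INSERT INTO distribution VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS
    )
    conn.commit()
    return conn


def most_volatile(conn, limit=3):
    """Return (distribution, volatility) pairs, highest volatility first."""
    return conn.execute(
        "SELECT distribution, volatility FROM distribution "
        "ORDER BY volatility DESC, distribution ASC LIMIT ?",
        (limit,),
    ).fetchall()
```

The same `most_volatile` query should work unchanged against a connection to the downloaded database, provided the dump matches the schema shown here.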


are you still using cpants.db?

domm on 2009-07-06T11:00:53

I sort of missed whether you're doing the indexing yourself by now, or if you're still pulling in the data generated by cpants...

Re:are you still using cpants.db?

Alias on 2009-07-06T12:25:15

I'm now doing it all from scratch myself.

I started from the CPAN Index, so while it's superficially like the CPANTS graph, it follows permissions properly, and correctly handles changes of authors.

Re:are you still using cpants.db?

Alias on 2009-07-06T12:30:27

See CPANDB::Generator for the methodology used to generate the database.