gitPAN is complete!

schwern on 2009-12-14T22:27:48

The gitPAN import is complete.

From BackPAN
------------
118,752 files
10,440,348,937 bytes (measured by adding individual file size)
21,987 distributions (I skipped perl, parrot and parrot-cfg)

To git
------
21,766 repositories
4,495,204 bytes (measured by total disk usage after git gc with no checkout)
150 gigs on github (they have to index it)
12 days (lots of starts and stops)
1 laptop (1st gen Macbook)
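The BackPAN byte count is described as the sum of individual file sizes. A minimal sketch of that kind of measurement follows; it is not the script actually used for the import, and sum_file_bytes is a hypothetical helper name.

```shell
# Sum the sizes, in bytes, of every regular file under a directory.
# sum_file_bytes is a hypothetical helper, not part of gitPAN.
sum_file_bytes() {
    # wc -c prints "<bytes> <path>" once per file; awk adds up the first column.
    # printf "%.0f" keeps totals above 2^31 from being printed in scientific notation.
    find "$1" -type f -exec wc -c {} \; |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Called as `sum_file_bytes /path/to/backpan` (the path is a placeholder), it prints a single number. File names containing newlines are not handled.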

I had to do it on a disk image because of OS X's case-insensitive filesystem.

I've written up a small FAQ. gitPAN is reasonably stable, but you may have to rebase in the future.

Next, I take a break.

Then begins the second pass, mostly improving and adding tags. Here's the list of planned features. The second pass will be a rolling reimport of each distribution to bring everything up to the same standard, since there were a lot of incremental improvements during the first pass. I expect this to mean changes to commit logs and tags, with very little content change.

The issue of PAUSE ownership I'm going to punt on. It's ugly and can be done entirely in parallel. If someone else makes available a historical distribution ownership database, gitPAN will use it.


Ohloh and friends?

Alias on 2009-12-14T23:59:03

So now we just need to submit those 20,000 projects to Ohloh and other source metrics'y places? :)

Thank you

jarich on 2009-12-15T13:43:35

Thank you for doing this. This was a huge project and I think it's awesome.

Fostering Laziness :-)

rjray on 2009-12-17T06:27:03

I've been meaning to put Perl-RPM onto git for a while now, and have been mostly held back by the fact that I never converted it from CVS to Subversion. Now that you have as much history as I could realistically need, is there an easy way to just outright seed a new repo from the gitPAN repo? Not just filling in holes like you did with David, but essentially cloning it (without it being treated as a repo clone)...

Re:Fostering Laziness :-)

schwern on 2009-12-17T11:20:06

Yes, you can clone it. There's nothing special about a clone. Just change where the "origin" remote points to (presumably to your own repository) and push.

$ git clone git://github.com/gitpan/Perl-RPM
$ cd Perl-RPM/
$ git remote rm origin
$ git remote add origin git@github.com:rjray/Perl-RPM.git
$ git push origin master

That should do it. This presumes you've created the Perl-RPM project on github.
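The same steps can be wrapped in a small function. This is only a sketch: reseed_repo is a hypothetical helper, it assumes git 1.7.0 or later for "git remote set-url", and it also pushes tags, which "git push origin master" alone does not do.

```shell
# reseed_repo SRC DEST WORKDIR
# Clone SRC into WORKDIR, repoint "origin" at DEST, then push branches and tags.
# Hypothetical helper; DEST must already exist and be writable.
reseed_repo() {
    src=$1
    dest=$2
    workdir=$3
    git clone "$src" "$workdir" || return 1
    (
        cd "$workdir" || exit 1
        # Repoint the existing remote instead of removing and re-adding it.
        git remote set-url origin "$dest" &&
            # Push every local branch, then every tag.
            git push origin --all &&
            git push origin --tags
    )
}
```

For the case above that would be: reseed_repo git://github.com/gitpan/Perl-RPM git@github.com:rjray/Perl-RPM.git Perl-RPM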