CPAN Clean Sweep 2006

brian_d_foy on 2006-01-24T04:28:59

It's a new year, so it's time to reduce the size of CPAN.

A long time ago, I was really interested in measuring the growth of CPAN, and comparing that to the size of MINICPAN, which I called the Schwartz Factor.

I'm interested again because CPAN just filled up an old laptop disk and my FreeBSD machine got really concerned about where it would put files. CPAN is over 3 gigs. Yowsers! We can get under 3 if people can delete about 150 MB of old versions.

Some of that size, however, is cruft and old versions of modules. Delete those older files! You'll still find them on BackPAN, so you don't need to keep them in CPAN. :)


Effect of previous call to increase your Schwartz

mr_bean on 2006-01-24T05:06:32

I'm wondering how much effect the previous call to clean up CPAN had, because although that message, 'Increase your Schwartz' http://use.perl.org/user/brian_d_foy/journal/8314, is interesting to read, there are no comments on it.

My guess is that the CPAN authors who leave lots of old versions around are also those who don't read use.perl journals.

Re:Effect of previous call to increase your Schwar

brian_d_foy on 2006-01-24T05:11:40

I forget the numbers, but as I recall we shaved a couple of percent off the size. No matter how we did then, maybe we can do better this time. :)

Is there any way to detect the worst offenders?

Alias on 2006-01-24T06:19:34

Can we tell who the worst offenders are? Surely there are a dozen people or so who have never deleted files and have released dozens or hundreds of versions of things.

Re:Is there any way to detect the worst offenders?

brian_d_foy on 2006-01-24T06:25:08

Sure, we could just go through the author directories and count up the byte size of all of the old distros. Maybe I'll hack up that script; I need something for next month's The Perl Journal. :)

I'm not about shaming people though, and some people might have good reason to keep old versions around.
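For anyone who wants to try the counting idea before a real script turns up, here is a minimal sketch in Python. It is not the script mentioned above, just one way the tally could work. It assumes you already have a list of (author, filename, size, mtime) records from a local mirror, that distribution files look like Name-1.23.tar.gz, and that the newest file by modification time is the current release. The function name, the filename pattern, and the mtime rule are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed filename shape: "Name-Parts-1.23.tar.gz" -> distribution "Name-Parts".
# Real CPAN filenames are messier; this pattern is only illustrative.
DIST_RE = re.compile(r"^(?P<dist>.+)-v?\d[\w.]*\.(?:tar\.gz|tgz|zip)$")


def old_bytes_by_author(records):
    """Sum the bytes each author spends on old releases.

    records: iterable of (author, filename, size_bytes, mtime) tuples.
    Returns {author: bytes used by all but the newest file of each dist}.
    """
    groups = defaultdict(list)
    for author, filename, size, mtime in records:
        match = DIST_RE.match(filename)
        if match is None:
            continue  # skip files that do not look like distributions
        groups[(author, match.group("dist"))].append((mtime, size))

    totals = defaultdict(int)
    for (author, _dist), files in groups.items():
        files.sort()  # oldest first; the last entry is treated as current
        totals[author] += sum(size for _mtime, size in files[:-1])
    return dict(totals)


# Made-up sample data, not real CPAN numbers.
records = [
    ("ALICE", "Foo-Bar-0.01.tar.gz", 100, 1),
    ("ALICE", "Foo-Bar-0.02.tar.gz", 150, 2),
    ("ALICE", "Foo-Bar-0.03.tar.gz", 200, 3),
    ("BOB", "Baz-1.0.tar.gz", 500, 1),
    ("ALICE", "CHECKSUMS", 10, 5),
]
print(old_bytes_by_author(records))
```

To run it against a real mirror you would build the records by walking the author directories (for example with os.walk and os.stat) and would probably want proper version comparison instead of the mtime shortcut.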

Re:Is there any way to detect the worst offenders?

Alias on 2006-01-24T07:32:24

Well, it's never _about_ shaming people.

It's about providing more information for people. When a new batch of information becomes available (like the CPANTS stuff) there will be a number of people who had no idea they were doing the wrong thing, and who will quite quickly move to fix it.

And those that don't care will never see.

Just don't make a competition out of it. :)

If you were to list people based on "inverted Schwartz factor" then you could just list it for the good people as "3 or less", to avoid people religiously culling modules so hard that we don't even have one previous version available.

Archiving?

sigzero on 2006-01-24T12:58:07

Couldn't some kind of auto-archiving take place, where things get moved to BackPAN?

Done.

jk2addict on 2006-01-24T13:41:56

About 80 files gone. Does CPAN/CPANPLUS search BackPAN when installing modules? If so, maybe it's time to just make this an automatic thing?

Keep at least one old version, please!

ChrisDolan on 2006-01-24T14:46:10

It would be helpful if authors would keep at least one older version on CPAN:

1) The diff tool at search.cpan.org doesn't work against BackPAN.
2) Sometimes unexpected bugs or incompatibilities do crop up, and in those cases, it's nice to be able to use CPAN.pm to revert rather than having to go and manually download the tarball.

Re:Keep at least one old version, please!

jand on 2006-01-24T22:27:50

I totally agree with this! I would even argue that only releases over a year old should be deleted. Having the last couple of versions available for forensic investigations using the search.cpan.org diff tool is very valuable.

Disk space is cheap; 3GB is less than a single DVD. :)

CPAN mirroring

drhyde on 2006-01-25T12:36:43

Different subject entirely, but how are you maintaining your mirror? I used to use rsync to keep mine up to date, but it seems that pretty much all the mirrors I've tried rsyncing from recently are broken, so I never get back in sync.