Finding the big files in MiniCPAN

brian_d_foy on 2007-02-28T21:29:32

The mini-CPAN, a smaller version of the Comprehensive Perl Archive Network that includes just the latest versions and excludes a few big things, is now about 700 MB on my machine. That means that it can't quite fit onto a single CD, at least without removing parts of it. That's no good.

I've been playing with GrandPerspective, a Mac OS X utility that draws a tree map of a directory so you can easily see where the big files are. Here's the map for my /MINICPAN:

[Image: mini cpan tree map]

The big files represented by the tan section in the lower left are BioPerl. Most of the other big boxes are parrot in various releases, but from different authors (so maybe my minicpan script needs to recognize multi-author situations to remove old versions). I'm not sure why these are in my minicpan:

$ find . -name "*parrot*" 2>/dev/null | xargs du -h
3.6M    ./authors/id/C/CH/CHIPS/parrot-0.4.7.tar.gz
528K    ./authors/id/J/JG/JGOFF/parrot-0.0.5.tar.gz
6.5M    ./authors/id/J/JG/JGOFF/parrot-0.0.8.1.tgz
756K    ./authors/id/J/JG/JGOFF/parrot-0_0_7.tgz
8.6M    ./authors/id/L/LT/LTOETSCH/parrot-0.1.2.tar.gz
2.6M    ./authors/id/L/LT/LTOETSCH/parrot-0.2.1.tar.gz
2.8M    ./authors/id/L/LT/LTOETSCH/parrot-0.3.1.tar.gz
2.8M    ./authors/id/L/LT/LTOETSCH/parrot-0.4.1.tar.gz
3.1M    ./authors/id/L/LT/LTOETSCH/parrot-0.4.5.tar.gz
3.7M    ./authors/id/P/PA/PARTICLE/parrot-0.4.8b.tar.gz
3.8M    ./authors/id/P/PM/PMIC/parrot-0.4.9.tar.gz
6.8M    ./authors/id/S/SF/SFINK/parrot-0.0.10.tar.gz
7.0M    ./authors/id/S/SF/SFINK/parrot-0.0.11.2.tar.gz
6.7M    ./authors/id/S/SF/SFINK/parrot-0.0.9.tar.gz
192K    ./authors/id/S/SI/SIMON/parrot-0.0.3.tar.gz
436K    ./authors/id/S/SI/SIMON/parrot-0.0.4.tar.gz
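
The same question can be asked without knowing a name to search for: which files are the biggest, whatever they are called? Here is a minimal sketch, assuming a reasonably POSIX-like find, du, sort, and head. The function name biggest_files is made up for illustration and is not part of minicpan.

```shell
# biggest_files DIR [COUNT]
# Print the COUNT largest regular files under DIR (default: 10 files
# under the current directory), largest first, one per line as
# "<size in kilobytes><TAB><path>".
biggest_files() {
    dir=${1:-.}
    count=${2:-10}

    # du -k reports each file's disk usage in 1 KB units, so a plain
    # numeric sort works (du -h output would not sort numerically).
    find "$dir" -type f -exec du -k {} + 2>/dev/null |
        sort -rn |
        head -n "$count"
}
```

Run from the top of the mirror, something like `biggest_files . 20` should surface the same tarballs that show up as the biggest boxes in the tree map, without needing the GUI.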


I've also been thinking about the idea of a user-defined filter for minicpan so it can exclude things I know I don't want.


filters in minicpan

rjbs on 2007-02-28T22:29:41

minicpan (and CPAN::Mini) have filters -- they're just not well documented in the minicpan man page. I apologize, and will try to rectify this.

If you put "skip_perl: 1" in your .minicpanrc, it will not mirror perl, ponie, or parrot.

If you include an entry for path_filters or module_filters, you can skip paths or modules. They're interpreted as whitespace-delimited regular expressions -- although I admit I don't use any myself, anymore, and the specifics have escaped me. I think this is about right:

  path_filters: RJBS INGY \d_\d+
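
To see what a whitespace-delimited list of patterns like that would catch before putting it in a config file, a dry run against a file listing can help. This is only a sketch under stated assumptions: would_skip is a hypothetical helper, it uses grep's POSIX extended regular expressions rather than Perl's (so [0-9] stands in for \d), and it does not claim to reproduce how CPAN::Mini applies its filters internally.

```shell
# would_skip "REGEX REGEX ..."
# Read paths on standard input and print the ones that match at least
# one of the whitespace-delimited extended regular expressions.
would_skip() {
    # Collapse "A  B C" into "A|B|C". Assigning $1 = $1 makes awk
    # rebuild the line with single spaces and no leading or trailing
    # whitespace; gsub then turns each space into an alternation bar.
    pattern=$(printf '%s\n' "$1" | awk '{ $1 = $1; gsub(/ /, "|"); print }')

    # An empty pattern would match every line, so treat it as "no filters".
    [ -n "$pattern" ] || return 0

    grep -E -- "$pattern"
}
```

For example, `find . -type f | would_skip 'RJBS INGY [0-9]_[0-9]+'` lists the files whose paths mention either author directory or contain an underscore-style version number such as parrot-0_0_7.tgz.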