Fetching mailman archives

ziggy on 2005-05-23T14:44:24

I'm working on a problem at the moment, and I want to dive into some archives of a couple of mailman lists. However, I don't want to go spelunking through threads on a month-by-month basis, and I don't want to bang on a search box either. Better to just download the archives and load them into mutt.

There's a way to do this with wget or curl, but I can never remember which tool it is, nor the command-line options needed to fetch a page and download specific links from it.

In any case, Andy's WWW::Mechanize::Examples was much easier to find and peruse.

Here's a script to start the process. It takes two parameters: the URL of the pipermail archive index page, and the directory in which to stuff the files it fetches. Concatenating the monthly mailboxes and loading them up in your favorite mail reader is an exercise left to the reader. ;-)

#!/usr/bin/env perl

use strict;
use warnings;

use WWW::Mechanize;
use File::Basename;

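# Arguments: the pipermail archive index URL and a target directory
my $url = shift or die "usage: $0 archive-index-url directory\n";
my $dir = shift or die "usage: $0 archive-index-url directory\n";

# Create the target directory if needed, then work inside it
mkdir $dir or die "Can't create $dir: $!\n" unless -d $dir;
chdir $dir or die "Can't cd into $dir: $!\n";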

my $mech = WWW::Mechanize->new(autocheck => 1);   # die on failed requests
$mech->get($url);

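# Collect absolute URLs of the monthly text archives
# (e.g. 2005-May.txt or 2005-May.txt.gz) before fetching anything
my @urls = map { $_->url_abs() }
           $mech->find_all_links(url_regex => qr/\.txt(?:\.gz)?$/);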

# Save each archive in $dir under the file name from its URL
foreach my $archive (@urls) {
    print STDERR "$archive\n";
    $mech->get($archive,
               ":content_file" => basename($archive->path));
}
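As for the exercise of concatenating the monthly mailboxes, here is a minimal sketch. It assumes the files are named the way pipermail usually names them (`2005-May.txt`, `2005-May.txt.gz`); the subroutine names and the command-line arguments are hypothetical. The one wrinkle worth handling is order: month names don't sort alphabetically ("August" lands before "February"), so the sketch parses year and month into a numeric key. Gzipped archives are read with IO::Uncompress::Gunzip, which ships with modern Perls, and if a month was fetched in both forms only one copy is kept.

```perl
#!/usr/bin/env perl
# Sketch: concatenate pipermail monthly archives into a single mbox.
use strict;
use warnings;

use File::Spec;
use IO::Uncompress::Gunzip qw($GunzipError);

# Month names as (we assume) pipermail uses them in archive file names
my %MONTH;
@MONTH{qw(January February March April May June July
          August September October November December)} = (1 .. 12);

# Turn "2005-May.txt.gz" into a sortable key like 200505
sub archive_sort_key {
    my ($name) = @_;
    return 0 unless $name =~ /^(\d{4})-([A-Za-z]+)\.txt(?:\.gz)?$/;
    return $1 * 100 + ($MONTH{$2} || 0);
}

# Return the archive file names in $dir, oldest month first
sub monthly_archives {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't read $dir: $!\n";
    my @files = grep { /^\d{4}-[A-Za-z]+\.txt(?:\.gz)?$/ } readdir $dh;
    closedir $dh;

    # If both 2005-May.txt and 2005-May.txt.gz exist, keep only the
    # plain .txt copy (it sorts first) so no month is included twice
    my %seen;
    @files = grep { (my $base = $_) =~ s/\.gz$//; !$seen{$base}++ }
             sort @files;

    # Sort chronologically on the parsed key, not alphabetically
    return sort { archive_sort_key($a) <=> archive_sort_key($b) } @files;
}

# Write every archive in $dir, oldest first, to the mbox file $out
sub concat_archives {
    my ($dir, $out) = @_;
    open my $ofh, '>', $out or die "Can't write $out: $!\n";
    for my $file (monthly_archives($dir)) {
        my $path = File::Spec->catfile($dir, $file);
        my $ifh;
        if ($file =~ /\.gz$/) {
            $ifh = IO::Uncompress::Gunzip->new($path)
                or die "Can't gunzip $path: $GunzipError\n";
        } else {
            open $ifh, '<', $path or die "Can't read $path: $!\n";
        }
        # Copy line by line; make sure the next month's "From " line
        # starts on a fresh line even if this file lacks a final newline
        my $last = "\n";
        while (my $line = <$ifh>) { print {$ofh} $line; $last = $line }
        print {$ofh} "\n" unless $last =~ /\n\z/;
        close $ifh;
    }
    close $ofh or die "Can't close $out: $!\n";
}

# Run as a script with two arguments: ARCHIVE_DIR OUT_MBOX
if (@ARGV) {
    die "usage: $0 archive-dir out.mbox\n" unless @ARGV == 2;
    concat_archives(@ARGV);
}
```

Point it at the directory the fetch script filled, e.g. `perl concat.pl lists/foo foo.mbox`, then open the result with `mutt -f foo.mbox`.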