all recent modules

jdavidb on 2002-11-22T15:47:44

This prints out a list of recent modules, with no duplicates.

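#!/usr/local/bin/perl5.8.0 -- # -*- perl -*-

use warnings;
use strict;

use LWP::Simple;
use HTML::TreeBuilder;

# Optional argument: a date in yyyymmdd form.
my $date = shift;
my $url = "http://search.cpan.org/recent/";
$url .= "?d=$date" if $date;

my $recent_html = get($url);

my $recent = HTML::TreeBuilder->new_from_content($recent_html);

# extract_links returns a reference to an array of [url, element, ...] entries.
my $links = $recent->extract_links;

my %modules;
foreach my $link (@$links)
{
    my $linkval = $link->[0];
    # Only distribution links, which live under /author/, are of interest.
    next unless $linkval =~ m{^/author/.};
    # The last path component is the distribution name, e.g. Foo-Bar-1.02.
    next unless $linkval =~ m|/([^/]*)/$|;
    my $dist = $1;
    my @dist = split "-", $dist;
    # Drop trailing version components (digits, dots, underscores).
    pop @dist while @dist and $dist[-1] =~ m/^[.\d_]*$/;
    unless (@dist)
    {
        warn "No module name found in $linkval\n";
        next;
    }
    my $module = join "::", @dist;
    # Print each module name only once.
    next if $modules{$module};
    print "$module\n";
    $modules{$module} = 1;
}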

Update 10:20 A.M., CST: Underscores are allowed in version numbers.


the underscores are only

hfb on 2002-11-22T16:43:30

for alpha module releases.

Re:the underscores are only

jdavidb on 2002-11-22T18:14:03

True, and good to point out. The code I have just uniq's module names without regard to version; others might prefer to have alpha releases removed from the list.

My purpose was to answer "What's happening on CPAN?", for which alpha modules are a valid answer. If someone uses this to automatically update modules for a NYSE system or something, they might get bad results. ;)
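
For anyone who does want alpha releases left out, here is a minimal sketch of one way to spot them. It assumes the distribution name ends in a version made of digits, dots, and underscores (the same pattern the script strips), and that an underscore in that version marks an alpha release. The helper name is_alpha_release is made up for illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the trailing version component of a
# distribution name (e.g. "Foo-Bar-1.02_03") contains an underscore.
sub is_alpha_release {
    my ($dist) = @_;
    # Capture the final "-<version>" part; the version must start with a digit.
    my ($version) = $dist =~ m/-(\d[.\d_]*)$/;
    return ( defined($version) && $version =~ /_/ ) ? 1 : 0;
}

for my $dist (qw(Foo-Bar-1.02 Foo-Bar-1.02_03)) {
    printf "%s: %s\n", $dist, is_alpha_release($dist) ? "alpha" : "stable";
}
# Prints:
#   Foo-Bar-1.02: stable
#   Foo-Bar-1.02_03: alpha
```

In the loop above, a line such as "next if is_alpha_release($dist);" placed right after $dist is set would skip those releases.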

get() checking

TorgoX on 2002-11-22T20:36:07

One change I suggest:

my $recent_html = get($url) || die "Can't get $url\n";

I'm glad to see folks are using new_from_content tho! Proof that it's as handy as I thought it'd be.
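
LWP::Simple's get() returns undef on failure without saying why. If the reason matters, a rough sketch using LWP::UserAgent instead could look like the following. The function name fetch_page is hypothetical, and this is an alternative to the one-line check above, not what the original script does.

```perl
use strict;
use warnings;

use LWP::UserAgent;

# Hypothetical helper: fetch a URL and die with the status line on failure.
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new;
    my $response = $ua->get($url);
    die "Can't get $url: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Only fetch when a URL is given on the command line.
if (my $url = shift @ARGV) {
    print length(fetch_page($url)), " characters fetched\n";
}
```

The returned string can be handed to HTML::TreeBuilder->new_from_content just as before.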

another way of doing it

merlyn on 2002-11-23T00:43:32

If you are [http://www.stonehenge.com/merlyn/LinuxMag/col42.html|mirroring a local mini-CPAN with my program], you can [http://www.stonehenge.com/merlyn/LinuxMag/col43.html|browse that mirror locally] as well.

Re:another way of doing it

merlyn on 2002-11-23T00:44:46

/me bangs head on desk
If you are mirroring a local mini-CPAN with my program, you can also browse that mirror locally.