This prints out a list of recent modules, with no duplicates.
#!/usr/local/bin/perl5.8.0 -- # -*- perl -*-
use warnings;
use strict;
use LWP::Simple;
use HTML::TreeBuilder;
my $date = shift; # yyyymmdd
my $url = "http://search.cpan.org/recent/";
$url .= "?d=$date" if $date;
my $recent_html = get($url);
my $recent = HTML::TreeBuilder->new_from_content($recent_html);
my $links = $recent->extract_links;
my %modules;
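foreach my $link (@$links)
{
    my $linkval = $link->[0];
    # Only distribution links, which live under /author/.
    next unless $linkval =~ m{^/author/.};
    # The last path component is the distribution name plus version.
    next unless $linkval =~ m|/([^/]*)/$|;
    my $dist = $1;
    my @dist = split "-", $dist;
    # Strip trailing version components (digits, dots, underscores).
    pop @dist while @dist and $dist[-1] =~ m/^[.\d_]*$/;
    unless (@dist)
    {
        warn "No module name left in '$dist'\n";
        next;
    }
    my $module = join "::", @dist;
    next if $modules{$module};
    print "$module\n";
    $modules{$module} = 1;
}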
Update 10:20 A.M., CST: Underscores are allowed in version numbers.
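To see what the underscore change does in isolation, here is a minimal sketch of the version-stripping step pulled out into a function. The name dist_to_module and the sample distribution names are made up for illustration; the split and pop lines are the same as in the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script above): turn a distribution
# directory name such as "Foo-Bar-1.02_03" into "Foo::Bar".
sub dist_to_module {
    my ($dist) = @_;
    my @parts = split "-", $dist;
    # Drop trailing pieces made only of digits, dots, and underscores,
    # so developer versions like 1.02_03 are stripped as well.
    pop @parts while @parts and $parts[-1] =~ m/^[.\d_]*$/;
    return join "::", @parts;
}

# Made-up sample names, for illustration only.
print dist_to_module($_), "\n" for qw(Foo-Bar-1.02 Foo-Bar-1.02_03 Foo-0.1);
```

Without the underscore in the character class, the second sample would come out as Foo::Bar::1.02_03 instead of Foo::Bar.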
Re:the underscores are only
jdavidb on 2002-11-22T18:14:03
True, and good to point out. The code I have just uniq's module names without regard to version; others might prefer to have the underscore releases removed from the list.
My purpose was to answer "What's happening on CPAN?", for which alpha modules are a valid answer. If someone uses this to automatically update modules for a NYSE system or something, they might get bad results.
;)
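For anyone who does want the underscore releases dropped, a rough sketch of one way to filter them follows. It assumes the version is whatever follows the last hyphen in the distribution name; is_dev_release and the sample names are hypothetical, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the trailing version part of a
# distribution name contains an underscore.
sub is_dev_release {
    my ($dist) = @_;
    # Assumption: the version is the digits/dots/underscores after the last "-".
    my ($version) = $dist =~ m/-([.\d_]+)$/;
    return (defined $version && $version =~ /_/) ? 1 : 0;
}

# Made-up sample names, for illustration only.
my @dists  = qw(Foo-Bar-1.02 Foo-Bar-1.03_01 Baz-0.5);
my @stable = grep { !is_dev_release($_) } @dists;
print "$_\n" for @stable;
```

In the loop above, this check would have to run on $dist before the version is popped off, since the module name alone no longer says which release it came from.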
my $recent_html = get($url) || die "Can't get $url\n";
I'm glad to see folks are using new_from_content tho! Proof that it's as handy as I thought it'd be.
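A slightly stricter variant of that check, sketched as a small wrapper. LWP::Simple's get returns undef on failure, so testing with defined avoids treating an empty page as an error. The name fetch_or_die and the optional $fetch argument are invented here so the check can be tried without a network connection.

```perl
use strict;
use warnings;

# Hypothetical wrapper: $fetch defaults to LWP::Simple::get, but any
# code ref that returns the page body (or undef on failure) will do.
sub fetch_or_die {
    my ($url, $fetch) = @_;
    $fetch ||= do { require LWP::Simple; \&LWP::Simple::get };
    my $html = $fetch->($url);
    # get() signals failure with undef; an empty body is not an error.
    die "Can't get $url\n" unless defined $html;
    return $html;
}
```

In the script this would be used as my $recent_html = fetch_or_die($url); with no second argument.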
Re:another way of doing it
merlyn on 2002-11-23T00:44:46
/me bangs head on desk
If you are mirroring a local mini-CPAN with my program, you can also browse that mirror locally.