Simple extraction of links from web page

scot on 2007-01-02T21:35:02

It took me far longer than I thought it would to come up with this code that grabs a web page and stuffs all the page's hyperlinks into a text file.

Updated...

use strict;
use WWW::Mechanize;

# usage: perl linkextractor.pl http://www.example.com/ > output.txt
my $url = shift;

my $mech = WWW::Mechanize->new();
$mech->get($url);
my $status = $mech->status();
print $status . " OK - URL request succeeded.\n";

my @links = $mech->links;
print STDOUT ($_->url, $/) foreach @links;



Try linktractor

brian_d_foy on 2007-01-03T04:49:10

I've now made this much easier in HTML::SimpleLinkExtor 1.14:

linktractor -f=http://www.example.com > output.txt
No need to work too hard, after all. :)
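If you'd rather stay inside a script than shell out, the module interface is just as small. A minimal sketch (assuming LWP::Simple for the fetch; any user agent would do):

use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::SimpleLinkExtor;

# usage: perl extract.pl http://www.example.com/ > output.txt
my $url  = shift or die "usage: $0 URL\n";
my $html = get($url) or die "Could not fetch $url\n";

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($html);              # parse the raw HTML string
print "$_\n" for $extor->links;    # every link-type attribute value found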

Re:Try linktractor

scot on 2007-01-03T17:03:06

Thank you. Can you see any obvious snafus in the following code?

use strict;
use warnings;
use HTML::SimpleLinkExtor;
use WWW::Mechanize qw( );

#usage linkextractor -f http://www.example.com/ > output.txt

my ($url) = @ARGV;

my $mech = WWW::Mechanize->new();
my $response = $mech->get($url);
$response->is_success()
      or die($response->status_line() . "\n");

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($response);
my @all_links = $extor->links;
foreach my $elem (@all_links) {
        print STDOUT;
}

Re:Try linktractor

brian_d_foy on 2007-01-03T17:21:46

I'm not really sure where to start with that or if you're serious, considering the code doesn't work.
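Roughly, something along these lines should be closer (an untested sketch; the two changes that matter are feeding parse() the HTML itself rather than the response object, and printing the links you actually collected):

use strict;
use warnings;
use HTML::SimpleLinkExtor;
use WWW::Mechanize;

# usage: perl linkextractor.pl http://www.example.com/ > output.txt
my ($url) = @ARGV;

my $mech     = WWW::Mechanize->new();
my $response = $mech->get($url);
$response->is_success()
    or die $response->status_line() . "\n";

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($mech->content);    # parse() expects HTML, not an HTTP::Response

print "$_\n" for $extor->links;   # print each extracted link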

Re:Try linktractor

scot on 2007-01-03T22:46:54

Please disregard that last post of mine. My apologies.