It took me far longer than I thought it would to come up with this code that grabs a web page and stuffs all the page's hyperlinks into a text file.
Updated...
use strict;
use warnings;
use WWW::Mechanize;

# usage: perl linkextractor.pl http://www.example.com/ > output.txt
my $url = shift or die "Usage: perl linkextractor.pl <url> > output.txt\n";

my $mech = WWW::Mechanize->new();
$mech->get($url);
die $mech->status(), " - URL request failed\n" unless $mech->success();
print $mech->status(), " OK - URL request succeeded\n";

# print the URL of every link on the page, one per line
my @links = $mech->links;
print STDOUT ($_->url, $/) foreach @links;
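One refinement worth noting: url returns each link exactly as it appears in the page, so relative links stay relative in output.txt. WWW::Mechanize::Link also provides url_abs, which resolves each link against the page it came from. A minimal sketch of that variant:

use strict;
use warnings;
use WWW::Mechanize;

# usage: perl linkextractor.pl http://www.example.com/ > output.txt
my $url = shift;
my $mech = WWW::Mechanize->new();
$mech->get($url);
die $mech->status(), " - URL request failed\n" unless $mech->success();

# url_abs() resolves each href against the page's base URL,
# so relative links come out as absolute URLs
print $_->url_abs, "\n" foreach $mech->links;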
No need to work too hard, after all:
linktractor -f=http://www.example.com > output.txt
Re:Try linktractor
scot on 2007-01-03T17:03:06
Thank you. Can you see any obvious snafus in the following code?
use strict;
use warnings;
use HTML::SimpleLinkExtor;
use WWW::Mechanize qw( );

# usage: linkextractor -f http://www.example.com/ > output.txt
my ($url) = @ARGV;

my $mech = WWW::Mechanize->new();
my $response = $mech->get($url);
$response->is_success()
    or die($response->status_line() . "\n");

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($response);

my @all_links = $extor->links;
foreach my $elem (@all_links) {
    print STDOUT;
}

Re:Try linktractor
brian_d_foy on 2007-01-03T17:21:46
I'm not really sure where to start with that or if you're serious, considering the code doesn't work.

Re:Try linktractor
scot on 2007-01-03T22:46:54
Please disregard that last post of mine. My apologies.
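The HTML::SimpleLinkExtor attempt above fails for two reasons: parse is handed the HTTP::Response object returned by get rather than the page's HTML, and the print inside the loop prints $_, which is never set because the loop iterates with $elem. A minimal corrected sketch along the same lines (it assumes decoded_content is available on the response; the variable names are just illustrative):

use strict;
use warnings;
use WWW::Mechanize;
use HTML::SimpleLinkExtor;

# usage: perl linkextractor.pl http://www.example.com/ > output.txt
my ($url) = @ARGV;

my $mech = WWW::Mechanize->new();
my $response = $mech->get($url);
$response->is_success()
    or die($response->status_line() . "\n");

# parse() wants the HTML text itself, not the HTTP::Response object
my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($response->decoded_content);

# print each extracted link, one per line
foreach my $link ($extor->links) {
    print "$link\n";
}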