I am writing a little spider application, using LWP and all of that good stuff. For my particular application I need to set the Referer header, and along the way I collect the right URLs to put in it.
Since I am using LWP, URLs tend to show up as URI objects, but when I try to put them back into an HTTP request, things blow up:
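use strict;
use warnings;
use HTTP::Request;
use URI;

my $url     = URI->new( 'http://www.example.com' );
my $request = HTTP::Request->new( GET => 'http://www2.example.com' );

# With HTTP::Headers 1.43 this croaks with "Unexpected field value ..."
$request->referer( $url );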
The referer() method comes from HTTP::Headers, and all it does is pass its arguments to the _header() method. Inside _header(), that $url ends up in $val, and then it has to run the gauntlet:
[HTTP::Headers, 1.43, _header()]
if (defined($val)) {
    my @new = ($op eq 'PUSH') ? @old : ();
    if (!ref($val)) {
        push(@new, $val);
    } elsif (ref($val) eq 'ARRAY') {
        push(@new, @$val);
    } else {
        Carp::croak("Unexpected field value $val");
    }
    $self->{$lc_field} = @new > 1 ? \@new : $new[0];
}
The value in $val is defined, so it makes it into the block. It is a reference, but not an ARRAY reference, so it falls through to the else{} and croaks. That is reasonable for a generic method like _header(), but referer() could be a bit smarter about the URI objects it is likely to receive.
[HTTP::Headers, 1.43, referer()]
sub referer { (shift->_header('Referer', @_))[0] }
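The gauntlet can be reproduced without LWP installed. This is a minimal sketch, not LWP code: check_header_value() is a hypothetical function that copies only the value check from the 1.43 excerpt, and Fake::URI is a hypothetical stand-in that, like URI, is a blessed scalar reference with overloaded stringification.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical stand-in for URI: a blessed scalar ref whose
# stringification is overloaded, as URI's is.
package Fake::URI;
use overload '""' => sub { ${ $_[0] } }, fallback => 1;
sub new       { my ($class, $str) = @_; return bless \$str, $class }
sub as_string { ${ $_[0] } }

package main;

# Hypothetical copy of the value check in HTTP::Headers 1.43 _header():
# plain scalars and ARRAY refs pass, any other reference croaks.
sub check_header_value {
    my ($val) = @_;
    my @new;
    if (!ref($val)) {
        push(@new, $val);
    } elsif (ref($val) eq 'ARRAY') {
        push(@new, @$val);
    } else {
        Carp::croak("Unexpected field value $val");
    }
    return @new;
}

my $url = Fake::URI->new('http://www.example.com');

print join(', ', check_header_value('http://www.example.com')), "\n";
print join(', ', check_header_value([ 'a', 'b' ])), "\n";

# The object is a reference, but not an ARRAY reference, so it croaks.
# Thanks to the overloading, the message shows the URL, not the object.
eval { check_header_value($url) };
my ($first_line) = split /\n/, $@;
print "croaked: $first_line\n";
```

Note that the error message itself interpolates $val, so even the croak hides the fact that an object was passed in.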
Debugging this was a pain. URI objects overload stringification, so printing one shows only the string form rather than something like "URI::http=SCALAR(0xfb748)". My usual debugger, print(), cannot pick this up.
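A few core tools do see through the overloading. This sketch uses a hypothetical Fake::URI stand-in for URI: ref() and Scalar::Util's blessed() report the class, and overload::StrVal() gives the unoverloaded form that print() would otherwise have shown. With the real URI module the class would be a subclass such as URI::http.

```perl
use strict;
use warnings;
use overload ();
use Scalar::Util qw(blessed);

# Hypothetical stand-in for URI with overloaded stringification.
package Fake::URI;
use overload '""' => sub { ${ $_[0] } }, fallback => 1;
sub new { my ($class, $str) = @_; return bless \$str, $class }

package main;

my $url = Fake::URI->new('http://www.example.com');

# print() goes through the overloaded "" operator, so this looks
# exactly like a plain string.
print "print:   $url\n";

# These look past the overloading and show what $url really is.
print "ref:     ", ref($url), "\n";
print "blessed: ", blessed($url) // 'not an object', "\n";
print "StrVal:  ", overload::StrVal($url), "\n";   # Fake::URI=SCALAR(0x...)
```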
There are a couple of ways around this, none of them satisfying:
- Interpolate the object into a new string at each use, e.g. "$url".
- Check whether $url is a reference, and call its as_string method if it is.
- Always convert URLs to strings, losing the ability to call methods on them.
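The three workarounds might look like this in practice. It is a sketch under assumptions: Fake::URI stands in for URI, and to_header_string() is a hypothetical helper. The helper checks blessed() and can('as_string') rather than a bare ref(), so that an ARRAY reference, which the gauntlet accepts, passes through untouched.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for URI: overloaded "" plus an as_string method.
package Fake::URI;
use overload '""' => sub { ${ $_[0] } }, fallback => 1;
sub new       { my ($class, $str) = @_; return bless \$str, $class }
sub as_string { ${ $_[0] } }

package main;

my $url = Fake::URI->new('http://www.example.com');

# 1. Interpolate into a new string at each use.
my $as_string_1 = "$url";

# 2. Call as_string() only on objects that support it; plain strings
#    and ARRAY refs are returned unchanged.
sub to_header_string {
    my ($val) = @_;
    return (blessed($val) && $val->can('as_string')) ? $val->as_string : $val;
}
my $as_string_2 = to_header_string($url);

# 3. Convert once, up front. $url is now a plain string, so calling
#    $url->as_string would die: it is no longer an object.
$url = "$url";

print "$as_string_1\n$as_string_2\n$url\n";
```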
Oh well, now you know. Do not pull your hair out over this one, because I already did.
print and Data::Dumper
I find print much more useful when used with Data::Dumper. Whenever I program in something other than Perl (my job is mostly C++ coding), I find myself missing Data::Dumper, and recreating it in limited ways.
I much prefer visually scanning through Dumper($foo) to clicking through some elaborate tree view in a GUI debugger.
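For the problem in this post, the difference between print and Dumper() is exactly what matters: print goes through the overloaded stringification, while Dumper() shows the bless() wrapper around the value. A minimal sketch, again with a hypothetical Fake::URI standing in for URI:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for URI with overloaded stringification.
package Fake::URI;
use overload '""' => sub { ${ $_[0] } }, fallback => 1;
sub new { my ($class, $str) = @_; return bless \$str, $class }

package main;

my $url = Fake::URI->new('http://www.example.com');

print "$url\n";       # just the string form
print Dumper($url);   # shows the bless( ..., 'Fake::URI' ) wrapper
```

Even a scalar that looks like a plain string can be worth a Dumper() call when it might be holding an object.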
Re:print and Data::Dumper
brian_d_foy on 2003-12-01T18:12:29
For some reason ptkdb was failing with weird errors when it got to the point of the problem, so I was not using that.
I was using Data::Dumper in a lot of places, but by the time I thought to look at what was in the scalar variable (usually not a candidate for a Dumper() call), I knew what the problem was.
Indeed, there were all sorts of signs of what was happening, but everything got clouded because my starting assumption was wrong: I assumed URI objects would always do the right thing with LWP, and that was not the case. :)
Threads doesn't like URI's object stuff
petdance on 2003-12-01T16:42:47
It seems that the autostringification gets hung up on threaded Perls. I had to go through every place Mech uses URI and make sure I was explicitly calling ->as_string().
Re:Threads doesn't like URI's object stuff
brian_d_foy on 2003-12-01T18:15:40
Was it the overloading that was the problem, or the things trying to use the objects? I do not use a threaded perl, so I have not paid much attention to its gotchas.
This is fixed in HTTP::Headers 1.47
brian_d_foy on 2003-12-01T19:59:52
Gisle tells me that this was fixed in LWP-5.66, and so it was, at least for my problem.
I thought I had updated LWP when I got home, but that is what I get for thinking.
:)