Web::Scraper hacks #3: Read your browser's cookies

miyagawa on 2007-10-25T19:18:43

Some websites require you to log in with your credentials before you can view their content. That's easily scriptable with WWW::Mechanize, but if you visit the site frequently with your browser, why not reuse the browser's cookies so that you don't need to script the login process?

Web::Scraper lets you call methods on its UserAgent object, or swap the object out entirely, before it scrapes the website. Here's how to do so:

use Web::Scraper;
use HTTP::Cookies::Guess;

my $cookie_jar = HTTP::Cookies::Guess->create(file => "/home/miyagawa/.mozilla/cookies.txt");
my $s = scraper { };
$s->user_agent->cookie_jar($cookie_jar);
$s->scrape($uri); # $uri is a URI object, so the page is fetched over HTTP


This snippet uses HTTP::Cookies::Guess, which provides a common API for reading browsers' cookie files (the module supports IE, Firefox, Safari and w3m), and sets the resulting cookie jar on the UserAgent object.
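
The snippet above only calls a method on the existing UserAgent. For the other option, swapping the UserAgent object entirely, here is a minimal sketch. It assumes that user_agent also works as a setter when given an argument; the helper name scraper_with_browser_cookies, the agent string, the timeout and the example URL are all made up for illustration.

```perl
use strict;
use warnings;
use URI;
use LWP::UserAgent;
use Web::Scraper;
use HTTP::Cookies::Guess;

# Hypothetical helper: build a scraper whose UserAgent is replaced wholesale
# instead of being tweaked in place.
sub scraper_with_browser_cookies {
    my ($cookie_file) = @_;

    # Same call as in the snippet above: read the browser's cookie file.
    my $cookie_jar = HTTP::Cookies::Guess->create(file => $cookie_file);

    # A fresh LWP::UserAgent carrying the browser's cookies. The agent string
    # and timeout are illustrative values, not anything Web::Scraper requires.
    my $ua = LWP::UserAgent->new(
        cookie_jar => $cookie_jar,
        agent      => "my-scraper/0.1",
        timeout    => 10,
    );

    my $s = scraper { process "title", title => 'TEXT' };
    $s->user_agent($ua);    # assumption: the accessor accepts a replacement object
    return $s;
}

# Usage (hypothetical URL; scrape() fetches over HTTP when given a URI object):
# my $res = scraper_with_browser_cookies("/home/miyagawa/.mozilla/cookies.txt")
#     ->scrape(URI->new("http://example.com/members/"));
# print $res->{title}, "\n";
```

Swapping the whole object is mostly useful when you want other settings (timeouts, proxies, a custom agent string) to travel together with the cookies.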

If you'd like to change the behavior globally, you can also do:

$Web::Scraper::UserAgent->cookie_jar($cookie_jar);


Either way, you avoid hard-coding your username and password in the scraping script, which is a huge win.