Automatic thumbnails of web sites

LTjake on 2008-02-22T15:23:14

At $work, we decided that we kind of liked the idea of showing thumbnails of the sites we've linked to in a particular section of our site. Naturally, there are existing services that can do this for you, but we weren't too hip on relying on them.

We decided to try making our own thumbnail service. We took an old machine, put Ubuntu (7.10) on it, and wrote a fairly simple script that drives Firefox and generates the screenshots. It uses X11::GUITest to do the automation and Imager::Screenshot to capture the screen:

use strict;
use warnings;

use Imager::Screenshot ();
use X11::GUITest       ();
use Digest::MD5        ();
use CGI                ();

my $start   = '~/blank.html';
my @urls    = load_config( shift );
my $destdir = shift;
my $id      = close_and_reopen_firefox();

my $count = 0;
for my $url ( map { CGI->unescapeHTML( $_ ) } @urls ) {
    # skip "comments"
    next if $url =~ m{^#};
    # skip existing screenshots
    next if -e gen_filename( $url );

    load_page( $url );
    sleep( 20 );    # give the page time to finish rendering
    take_screenshot( $id, $url );

    # reload firefox after 100 urls
    if ( ++$count == 100 ) {
        $id    = close_and_reopen_firefox( $id );
        $count = 0;
    }
}

# close firefox
X11::GUITest::SendKeys( "%({F4})" );

sub gen_filename {
    my $url = shift;
    return "${destdir}/" . Digest::MD5::md5_hex( $url ) . '.jpg';
}

sub take_screenshot {
    my $id  = shift;
    my $url = shift;
    my $i   = Imager::Screenshot::screenshot( id => $id );

    # remove the scrollbar + scale
    $i = $i->crop( right => $i->getwidth - 15 );
    $i = $i->scale( xpixels => 150 );
    $i->write( file => gen_filename( $url ), jpegquality => 75 );
}

sub load_page {
    my $url = shift;
    X11::GUITest::SendKeys( '%({LEF})' );    # go "back"
    X11::GUITest::SendKeys( "^(l)" );
    X11::GUITest::WaitWindowViewable( 'Open Web Location' );
    X11::GUITest::SendKeys( "${url}{ENT}" );
}

sub close_and_reopen_firefox {
    my $id = shift;
    if ( $id ) {
        X11::GUITest::SetInputFocus( $id );
        X11::GUITest::SendKeys( "%({F4})" );
    }

    X11::GUITest::StartApp( "firefox $start" );
    ( $id ) = X11::GUITest::WaitWindowViewable( 'Mozilla Firefox' );
    sleep( 2 );    # let the window settle before we send keys
    X11::GUITest::SetInputFocus( $id );
    X11::GUITest::SendKeys( "{F11}" );    # fullscreen

    return $id;
}

sub load_config {
    my $file = shift;

    open( my $data, '<', $file ) or die "unable to open $file: $!";
    my @urls = split(
        "\n",
        do { local $/; <$data>; }
    );
    close( $data );

    return grep { length } @urls;
}
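For reference, the config file is nothing fancy -- just a newline-separated list of URLs, with "#" lines treated as comments (and, per gen_filename(), any URL that already has a screenshot on disk is skipped on the next run). A hypothetical urls.txt might look like:

```
# partner sites -- lines starting with "#" are skipped
http://use.perl.org/
http://www.perl.org/

# screenshots end up as <destdir>/<md5-of-url>.jpg
http://search.cpan.org/
```

Assuming the script above is saved as thumbs.pl (the filename is mine, not from the post), you'd kick it off with something like: perl thumbs.pl urls.txt /var/www/thumbs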

Now, you might have noticed that we close firefox after 100 urls -- in reality, it never gets that far. Things seem to segfault around 25 urls in, and I don't particularly understand why it's so unstable. We've disabled the "session recovery" feature so firefox won't get stuck asking questions on startup, and we've also disabled the fast back/forward history rendering in case that was leaking memory.
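One way to live with the crashes rather than cure them: since the script already skips urls whose screenshot file exists, a restart is cheap and resumes right where the last run died. Here's a minimal watchdog sketch along those lines -- run_with_restarts() and the thumbs.pl name are my own invention, not part of the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical watchdog: keep re-running a command until it exits
# cleanly, or give up after $max_tries attempts. Restarting the
# screenshot script is cheap because it skips urls that already
# have a screenshot on disk.
sub run_with_restarts {
    my ( $max_tries, @cmd ) = @_;
    for my $try ( 1 .. $max_tries ) {
        my $status = system( @cmd );
        return 1 if $status == 0;    # clean exit -- all urls done
        warn "attempt $try failed (status $status), restarting...\n";
        sleep 1;
    }
    return 0;    # gave up
}

# e.g.:
# run_with_restarts( 10, 'perl', 'thumbs.pl', 'urls.txt', '/var/www/thumbs' );
```

This trades elegance for uptime, but it sidesteps the segfault problem entirely: the browser only has to survive ~25 urls per run, not the whole list.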

Hopefully someone will find this bit of code useful, and perhaps someone has some ideas as to how we can make this setup a little more stable.