Net::Amazon::S3 + ??? = High-Value Data Backup Tool

Alias on 2006-03-22T18:06:44

With the release of Amazon's new S3 storage service, I can only say "it's about time".

Personally, I don't understand why people have had such a negative reaction to it. Most of the bitching has centred around "But you have to pay for transfers!". And to be honest, for people whose first reaction is to wonder if they can back up their entire computer, it would indeed be silly.

But let's look at what we have here.

A massively scalable, enterprise-grade, (presumably) disaster-resistant, and most importantly outsourced and you-don't-have-to-think-about-it backup solution. And I have to ask: do I honestly want to put 30 gigabytes of ripped CDs into a backup solution of that quality? For me the answer is no, it would just be overkill.

But looking at what I might ACTUALLY want to put into a backup system of that grade, a quick check of my main development servers reveals the following.

red cvs    : 716M
red svn    : 498M
ali.as cvs : 149M
other svn  : 100M (or thereabouts)

So using S3 as my weekly off-site backup for all my code, all my clients' code, my entire business documents repository (I scan my bills and such) and various other bits and pieces is going to cost me about $3 to put it in, maybe another $3 a year for the changed data uploaded each week, and roughly $2 a year in storage.
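For the curious, at the current pricing of roughly US$0.15 per gigabyte-month of storage and US$0.20 per gigabyte transferred, the back-of-envelope maths for the repositories above works out to something like:

716M + 498M + 149M + 100M ≈ 1.4G of repositories
1.4G x $0.15/month x 12 months ≈ $2.50 a year in storage
1.4G x $0.20 ≈ $0.30 for the initial upload

The business documents and the other odds and ends sitting on top of the repositories (plus a healthy margin for error) make up the difference.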

And throwing in my gig or so of digital photos might cost me another couple of bucks. (Of course, I tend to take photos rarely, just for family events and the odd amazing experience, and not in insanely high resolution. Certain other people who have tens of gigs of photos might not get off as cheaply.)

So while S3 might not be what the kiddies use to back up their 250 gig drive of illegally torrented DVDs, it looks like amazing value for backing up your high-value data. For most people this is going to mean you can disaster-proof all your code for less than $1 a year. You'd have to be crazy NOT to use it.

Thanks to Léon Brocard, who seems to me to always be CPAN's responsive-to-current-events master, after only a week we now have a Net::Amazon::S3 module to handle the actual transfers to and from Amazon.

So all we really need now is something that wraps around it.

I'd write it myself, but my brain is currently overworked implementing a dozen asynchronous protocols for something else entirely, so here instead is my specification for a simple S3-based backup system. Implementation is left as an exercise for the lazyweb :) (feel free to co-ordinate any efforts in the comments of this post)

1. A module with a constructor that takes S3 account information.

Amazon::S3::Backup->new( $account, $password );
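To make the intent concrete, here's a rough sketch of what that might look like inside, assuming it simply wraps Léon's Net::Amazon::S3 (the Amazon::S3::Backup internals here are pure guesswork, of course):

package Amazon::S3::Backup;

use strict;
use Net::Amazon::S3;

# Hypothetical constructor: stash an S3 connection and an empty
# name-to-directory map for later use.
sub new {
    my ($class, $account, $password) = @_;
    my $s3 = Net::Amazon::S3->new( {
        aws_access_key_id     => $account,
        aws_secret_access_key => $password,
    } );
    return bless { s3 => $s3, dirs => {} }, $class;
}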

2. Listing and addition of new directories, keyed by arbitrary names.

For example:

$backup->add_directory( "my_crappy_old_cvs" => '/var/lib/cvs' );

Metadata would be stored in a small YAML or SQLite file in a defined location within each bucket (the top-level unit of storage for S3, roughly analogous to a "directory" here).
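Again purely as a sketch, assuming one S3 bucket per backup set and YAML.pm for the metadata (and remembering that bucket names are global across all of S3, so real code would want to prefix them somehow):

# Hypothetical: register a local directory under a friendly name,
# creating a bucket for it and recording the mapping in the metadata.
sub add_directory {
    my ($self, $name, $path) = @_;
    die "No such directory: $path" unless -d $path;
    $self->{s3}->add_bucket( { bucket => $name } )
        or die $self->{s3}->err . ": " . $self->{s3}->errstr;
    $self->{dirs}->{$name} = $path;
    # Write the name-to-path map out (e.g. with YAML::DumpFile) and
    # push it into the bucket as a well-known metadata key (elided).
    return 1;
}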

3. An update routine that MD5s (or whatever) the files in the directory and then synchronises the remote S3 data to match the local copy.
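The local half of that is mostly core Perl. Something like this, using File::Find and Digest::MD5 (comparing the digests against the bucket's key listing and uploading the differences is left out):

use File::Find;
use Digest::MD5;

# Walk a directory and build a map of file path => MD5 digest, ready
# to compare against what the bucket already holds.
sub _local_digests {
    my ($self, $path) = @_;
    my %digest;
    find( sub {
        return unless -f $_;
        open my $fh, '<', $_ or die "open $File::Find::name: $!";
        binmode $fh;
        $digest{$File::Find::name} = Digest::MD5->new->addfile($fh)->hexdigest;
    }, $path );
    return \%digest;
}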

4. Optional email or SMS notification (perhaps via SMS::Send?) that the backup executed, or failed, or what have you.
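SMS::Send makes that last part almost trivial. A sketch using its bundled Test driver (the driver choice and phone number here are stand-ins; pick whichever real driver suits you):

use SMS::Send;

# Hypothetical notification hook; the Test driver just records the
# message rather than sending it anywhere.
my $sender = SMS::Send->new( 'Test' );
$sender->send_sms(
    text => "Backup 'my_crappy_old_cvs' completed OK",
    to   => '+61400000000',
);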

And that's about all you'd need to get started.
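Put together, a weekly cron job using the hypothetical module above might be as short as this (update() and notify() being the routines from points 3 and 4):

#!/usr/bin/perl
use strict;
use Amazon::S3::Backup;   # the hypothetical module from this spec

my $backup = Amazon::S3::Backup->new( 'ACCESSKEY', 'SECRETKEY' );
$backup->add_directory( "my_crappy_old_cvs" => '/var/lib/cvs' );
$backup->update;   # MD5 the files and sync the changes up to S3
$backup->notify;   # email/SMS the result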

So get to it Lazyweb! I want to back up my stuff using S3 :)

And a note to Perl's PR people: consider bribing people to make this happen. A backup tool released only 2 weeks after S3 was announced would be a great advert for Perl.

Also probably a pretty guaranteed way for some aspiring programmer to get themselves slashdotted, especially if they beat all the other languages. :)


Brackup

Aristotle on 2006-03-22T18:41:51

Brad has already written that.

Re:Brackup

Alias on 2006-03-22T19:56:36

The lazyweb comes through again!