Finally wrote everything about our hashing system

dwhite21787 on 2002-08-01T12:46:12

Woo hoo! Got one monkey off my back, finally.

I wrote a 12 page paper about how our distributed hash harvester actually works. It's not on the web yet, because it needs to go through a review process, but I hope it will be soon.

It's not the most spectacular use of Perl. Having the same piece of code running on 6 PCs, doing loosely parallel hashing and monitored by a master cron job and a couple of database tables, isn't rocket science - rockets are someone else's job.

Using sexeger was the best improvement we've made so far. It halved the time we were spending in our slowest section of code, and cut our overall run time by a third.
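For anyone who hasn't met the term: sexeger ("regexes" backwards) means reversing the string, matching a reversed pattern against it, and reversing the result. It helps when the interesting part sits at the end of a long string, because the pattern can anchor at the front instead of running a greedy match to the end and backtracking. The sketch below is only an illustration of the idea, not the harvester's actual code; the sub names and the sample paths are made up, and pulling the last component off a path is an assumed example task.

```perl
use strict;
use warnings;

# Hypothetical helper: text after the last '/' using a forward regex.
# The greedy .* runs to the end of the string, then backtracks to the last '/'.
sub last_component_forward {
    my ($path) = @_;
    return $path =~ m{^.*/(.*)$}s ? $1 : $path;
}

# Hypothetical helper: same result via sexeger.
# In the reversed string the last component comes first, so the
# pattern anchors at the start and stops at the first '/'.
sub last_component_sexeger {
    my ($path) = @_;
    my $rev = reverse $path;    # scalar assignment: reverses the characters
    return $rev =~ m{^([^/]*)/}s ? scalar reverse($1) : $path;
}

# Made-up sample paths, just to show both subs agree.
for my $p ('a/b/c.txt', 'noslash', 'dir/') {
    printf "%-10s forward=[%s] sexeger=[%s]\n",
        $p, last_component_forward($p), last_component_sexeger($p);
}
```

Whether the reversed version is actually faster depends on the data and the pattern, so it is worth timing both with the Benchmark module before committing to it.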

I plan on going to YAPC::Europe::2002 and I submitted an abstract; hopefully I can give a presentation, but if not I'll certainly want to pick up as much input as possible. There's so much more I should be doing with benchmarking, hashing and ODBC modules.

More later, and hopefully a URL for the paper on the NSRL website.


Interesting...

Matts on 2002-08-01T14:11:56

Sounds very interesting - I wrote something here at MessageLabs that sounds similar (though I'll know for sure when I read your paper). It does distributed hashing of emails so we can monitor how many copies of email X we've seen, and also keep a global db of known spams - sort of like Razor, but not crap ;-)

I'd love to get a look at it.