Replication

Matts on 2001-12-17T11:02:48

For the work I'm doing on spam protection here at work I've written a sort of replicated database using POE, CDB files, and HTTP. You basically get to store spams in the database, and it gets automatically replicated to all peers. It works using POE's delay/alarm system - so we basically say that replication happens every N seconds. You can add entries to the database on any of the servers and it gets sent to the "master" who then replicates to everyone. The code is also identical on the master as on the clients, which is kinda cool.

All the requests are via HTTP, which supports my notion of SOAP sucking and how you should use HTTP instead of SOAP... So basically you request http://server/spamlist/, and it tells you if the email with that hash is spam (and if so, what heuristic hits it got). To say "Yes it's spam", I simply return a 200 OK with the body being the details, and a 404 Not Found if the hash wasn't in the database. You can also do a PUT on that URL, to store new spam hashes in the database. When you do the PUT it stores the new spam in a temporary hash (because CDB files aren't dynamically updatable), and also in a text file on disk. Every N seconds the replication kicks in, which works via POST requests to /incoming/spamlist on the master. It sends multiple spams at once, and recieves any spams it has to add to the local db in the response, or a 204 if there were no spams to add. I'm trying to make it as safe as possible - each client makes sure it doesn't do anything if the request fails (by checking for sig_pipe events), and will simply re-send on failure.

Ironically I had a thought - none of this *really* needs a replicated database - it could all just work using a single http server and http proxy servers on the peers. However I think it would be a bit higher latency in some cases, and we deal with a lot of email here, so latency is bad.

Oh, and all of this is to replace Razor, because Razor is really flaky (which is unfortunate). It's a shame I won't be able to open source this, but it's going to be a major business component I think, and so they'll want to hold onto that as IP here.

Oh, and it also supports white/black lists. However I'm not entirely sure how that will work, since the white/black listing will work on a per-company basis, so we'll need to use different databases for each company...