Bzzzz

Matts on 2003-06-05T16:27:38

I feel like I have dropped off the public face of this planet for my latest set of deadlines. I've all but stopped my contributions to open source, including all my modules and AxKit too. This does not feel good, but something had to give and those weren't giving anything back to me so they were the first thing to go.

This week I am supposed to finish this quarantine system. Unicode bugs have forced me to admit that it won't get done. Why can't email systems generate RFC-compliant emails yet??? Ugh.

Beyond that I have to face up to the fact that I will probably have to lump the database into MS SQL Server for rollout. This will of course cause delays, because I've made PostgreSQL-specific assumptions in the DB layer. Luckily there is only ONE layer there to fix. We have to go with MS SQL Server not just because it's the company standard, but because this is a Very Large database, and so we need 24/7 support, which we only have with SQL Server.

I am still concerned about MS SQL Server handling the job. I am very sold on Pg's locking model (readers don't wait for writers, and writers don't wait for readers) compared with MS SQL's (readers and writers wait for each other). But databases aren't my day job, and our DB guys tell me that it can cope. I guess I'll find out for sure soon enough.

The general scaling of this project is also looking "interesting" for various values of "interesting". We'll have about 3 million records a day going into the database (and a similar number coming out) at launch. We'll be storing 30 days' worth, so that's 90m rows. Not unreasonable.

But spam is doubling every 2 months at the moment. So in 13 months that will be 9 billion rows. Pretty much un-queryable at that point. You have to change the architecture. So this will be a fairly short-lived design unless the spam increase slows down a lot.
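A quick back-of-the-envelope sketch of that projection, assuming smooth exponential growth from the launch figures above (the function and variable names are just for illustration). Under that assumption the 13-month figure comes out at a little over 8 billion rows, the same ballpark as the estimate above.

```python
def projected_rows(rows_now, months, doubling_months=2.0):
    """Rows held after `months`, assuming volume doubles every `doubling_months`."""
    return rows_now * 2 ** (months / doubling_months)


# Launch figures from the post: ~3 million records a day, 30 days retained.
daily_records = 3_000_000
retention_days = 30
rows_at_launch = daily_records * retention_days  # 90 million rows

for months in (0, 6, 12, 13, 14):
    rows = projected_rows(rows_at_launch, months)
    print(f"{months:>2} months: {rows:,.0f} rows")
```

The doubling period is the sensitive parameter here: passing `doubling_months=3.0` instead of `2.0` pushes the multi-billion-row point out by several months.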

Anyway, my brane hurts thinking about it. But at least it's an interesting problem to have.


where does it all end?

nicholas on 2003-06-11T10:08:49

But spam is doubling every 2 months at the moment. So in 13 months that will be 9 billion rows. Pretty much un-queryable at that point. You have to change the architecture. So this will be a fairly short-lived design unless the spam increase slows down a lot.

If spam is currently 50% of e-mail (by number of messages), and it doubles every 2 months, then in 1 year's time 98% of e-mail will be spam. I can't see that this is sustainable - if spammers keep on doubling their productivity, e-mail infrastructure is going to give up, and no-one is going to have any e-mail arriving, even if they have filtering that can keep up with the load.
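The arithmetic behind that 98%, as a rough sketch. It assumes legitimate mail volume stays flat while spam doubles every 2 months, which is a simplification of mine, and the function name is made up for illustration.

```python
def spam_fraction(months, start_fraction=0.5, doubling_months=2.0):
    """Share of all mail that is spam after `months`.

    Assumes spam volume doubles every `doubling_months` while the volume
    of legitimate mail stays constant (a simplifying assumption).
    """
    spam = start_fraction * 2 ** (months / doubling_months)
    ham = 1.0 - start_fraction  # legitimate mail, held constant
    return spam / (spam + ham)


for months in (0, 2, 6, 12):
    print(f"{months:>2} months: {spam_fraction(months):.1%} spam")
```

After 12 months that is six doublings, so spam outnumbers legitimate mail 64 to 1, which is 64/65 of all messages.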

Re:where does it all end?

Matts on 2003-06-11T12:08:18

Well I predict that it simply can't. And I predict that the thing that will stop it getting to that point is strong legislation. Slap a few spammers into prison and you'll see spam suddenly slowing down to processable rates again.

Re:where does it all end?

nicholas on 2003-06-11T13:32:15

Slap a few spammers into prison

In practical terms how do you define an offence that spammers are committing? As far as I can tell, it's only recently that they've started committing (existing) criminal offences to spam, by using zombies on compromised machines as mail relays. Prior to that, were they only breaching contracts by sending spam in violation of terms of use? Are the spammers physically located in countries that will pass effective anti-spam legislation? And can they be identified given that often spam is relayed via offshore machines?

Re:where does it all end?

Matts on 2003-06-11T15:05:58

Spam is a form of denial of service attack on a network. I don't see the fact that it's advertising changing that fact.

To find a spammer just follow the money. You need a subpoena to do that.