Spamassassin and resources

ethan on 2003-05-22T06:51:06

Whoa! This morning I was virtually unable to boot into my system. Evidently more emails arrived overnight than Spamassassin was able to deal with. When 'top' finally came up, I saw about 30 'spamd' processes, and it took about 15 minutes until all mails were processed (in between I had to use SysRq to reboot because of a lock-up, and the second attempt ended with X and mysqld being killed because no memory was left).

Then the bright idea occurred to me to rearrange my .procmailrc. Initially it read:

:0fw
| spamc

:0:
* ^X-Spam-Flag: YES
IN.spam

:0
* ^Delivered-to: mailing list perl5-porters@perl.org
p5p

:0
* ^Delivered-to: mailing list beginners@perl.org
beginners

# all other mail-sorting rules followed



By moving some of the high-traffic mailing lists (perl5-porters, beginners@perl.org etc.) in front of the 'spamc' recipe, I now hope that those mails are no longer processed by Spamassassin. At least the porters list is guaranteed to be spam-free. Not sure about beginners@perl.org though.
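A rearranged .procmailrc along these lines might look like the sketch below. It reuses only the recipes quoted from the original file, with their flags unchanged, and simply moves the two list recipes ahead of the spamc filter. The exact order of the remaining rules is an assumption.

```procmail
# High-traffic lists are filed first, so they never reach spamc.
:0
* ^Delivered-to: mailing list perl5-porters@perl.org
p5p

:0
* ^Delivered-to: mailing list beginners@perl.org
beginners

# Everything that is left is still filtered through SpamAssassin.
:0fw
| spamc

:0:
* ^X-Spam-Flag: YES
IN.spam

# all other mail-sorting rules follow
```

Since procmail stops at the first delivering recipe that matches, a message filed into p5p or beginners is never piped through spamc.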

I think that Spamassassin needs a built-in resource limiter, something like: never spawn more than 30 processes, never use more than 50% of available memory and CPU altogether. I really don't care if I have to wait some time for all my mails to be checked and sorted into the respective mailboxes in the morning. What I do care about is my system standing still for 15 minutes and starting to kill processes.
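Short of a limiter inside Spamassassin itself, procmail can throttle the filter from the outside. The sketch below puts a local lockfile on the filtering recipe so that only one message at a time is handed to spamc, and skips very large messages. The lockfile name spamassassin.lock and the 256000-byte cut-off are assumptions chosen for illustration, not values from the original setup.

```procmail
# Assumption: "spamassassin.lock" is an arbitrary lockfile name.
# The second colon plus the lockfile serializes this recipe, so other
# procmail instances wait here instead of all calling spamc at once.
# Assumption: messages of 256000 bytes or more skip the filter.
:0fw: spamassassin.lock
* < 256000
| spamc

:0:
* ^X-Spam-Flag: YES
IN.spam
```

The waiting procmail processes are small compared with a Perl interpreter running the full rule set, so delivery becomes slower but memory use should stay bounded. Depending on the version, spamd itself may also offer an option to cap its number of child processes; `spamd --help` shows what is available.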

Anyway, Nicholas' suggestion to use Spamassassin as a real-world replacement for perlbench is a very good idea!


spambench

nicholas on 2003-05-22T09:06:34

Nicholas' suggestion to use Spamassassin

Link! Link! My ego demands all the publicity it can get.

The suggestion is in this message to perl5-porters. It's a reply to a question from Alan Burlison asking about good perl performance benchmarks.

Re:spambench

ethan on 2003-05-22T13:11:30

Sorry! :-) Article dutifully corrected.

running spamassassin on a slow box

merijnb on 2003-05-22T11:04:19

Yeah, spamassassin should be the last thing in your procmailrc, after all the lists are filtered out.

The other thing that helps is if you mark your local mail delivery as expensive. At least with sendmail you can do this, don't know about others. The effect is that only one mail at a time will be delivered. Doesn't choke your box so much. Means I still don't have to upgrade my 16Mb 486 linux box. :-)

Shouldn't happen

Matts on 2003-05-22T11:40:03

If it's happening it's a bug. Find out what mails caused it to slow down and submit a bug report and attach the mails in question.

And turn off network checks. They can be very slow.

And make sure you're using the latest release. There are often a number of timeout (network check related) fixes in each release. And we test the other rules extensively for performance issues.

SpamAssassin is slow, but it really shouldn't kill your box.

Re:Shouldn't happen

ethan on 2003-05-22T13:17:23

If it's happening it's a bug. Find out what mails caused it to slow down and submit a bug report and attach the mails in question.

Oh, that didn't even occur to me. I'll try to watch spamassassin more carefully and see whether particularly slow runs can be pinned down to certain messages. This morning I had the impression that it was simply because about 100 messages had to be processed (in 'top' one spamd process was quickly replaced by a new one, so each single run appeared to be pretty fast).

And turn off network checks. They can be very slow.

I'd really love to keep those turned on because they catch a lot of spam. With the tweak to my procmailrc I might have cleared the most obvious bottleneck.

Btw: I am using 2.54. 2.55 is more recent, but it allegedly has only a few cosmetic changes aimed mainly at older perls.

Re:Shouldn't happen

Matts on 2003-05-22T15:18:53

Another thing to try is to leave on the network checks that work well, and turn the rest off (set their score to zero).

I'd list the ones I think you should leave on, but I'm lazy :)

Re:Shouldn't happen

Elian on 2003-05-22T14:43:05

What kills SpamAssassin dead, or at least cripples my box, is when Vipul's Razor is off-line for some reason. That's the thing that makes the spamd processes stack up really badly. Not SpamAssassin's fault, definitely, though it does (or did) have issues not really timing out the way it should. (Which may well have been an issue with my box, for all I know)

Procmail filtering the known-good (or known-bad--support@microsoft.com and big@boss.com) stuff so SpamAssassin has less to deal with is always a good thing too, as others have already pointed out.

Re:Shouldn't happen

Matts on 2003-05-22T15:20:17

Ugh. De-install Razor. It bites.

Re:Shouldn't happen

Elian on 2003-05-22T15:35:50

I dunno--it seems to catch some of the stuff that'd otherwise make it through. I do admit I worry some about off-site and potentially un-checked content grading systems, but... I'm definitely up for alternatives, but I'm not sure pyzor's any better.

Re:Shouldn't happen

Matts on 2003-05-22T21:36:32

Try DCC instead. It's a better implementation and overall a better idea.

ORBS Down

Herkemer on 2003-05-22T15:46:11

Might have been the 30sec timeout waiting for a response from the ORBS list.

Set:
score RCVD_IN_ORBS 0
in your local.cf file, word is it's dead and probably dead for good.

Processing went from 30+ secs to 1-2secs after I did this.

Re:ORBS Down

ethan on 2003-05-22T17:12:21

That is worth an attempt. I'll see how spamassassin behaves now that a lot of my mail no longer passes through it. If it is still sluggish, I'll start weeding out some of the tests. RCVD_IN_ORBS looks like a good candidate.

Btw: my university also has spamassassin installed, so right now my mails effectively get checked twice. Today I learnt that they are adding resources. An official said that currently about 50% of all mail passing through their servers is spam, and with the current setup it's hard to handle all of it. But I guess if a university with 30,000 students can run spamassassin globally, I should be able to tweak mine so that it can cope with perhaps 200 messages a day.