Clever spammer

godoy on 2003-02-04T14:21:35

I got an interesting spam: it was PGP signed.

Yep, spammers are finding out that software such as SpamAssassin gives negative points (i.e., toward "not spam") to such messages.

Of course, checking the PGP signature gave me an invalid result, but SpamAssassin doesn't know that.

The spam was caught by bogofilter, a bayesian filter.

And here we go! Several levels of anti-spam. There have been very few false positives with bogofilter, and I miss only one or two spams a day out of almost 200.

This is a better ratio than when I used SpamAssassin alone. (SpamAssassin is deactivated now, so that I can see how effective bogofilter is by itself...)

BTW, I haven't tried SpamAssassin's Bayesian feature... It should give results similar to bogofilter's.

I'm seriously thinking about how to deploy such tools for my clients in a way that they won't poison their own filters and start missing messages.

bogofilter

inkdroid on 2003-02-04T15:41:56

Yeah, I think this highlights why Bayesian analysis is better than rule-based approaches: it's harder to tweak a message so it slips by. Also, my understanding is that bogofilter scales much better than SpamAssassin on a high-volume MTA... but I don't know this for a fact, so perhaps I should keep quiet.

Re:bogofilter

Matts on 2003-02-04T17:33:19
No, it's true. SpamAssassin's rules are slow.

On the flip side, bogofilter is a personal filter, so it's not going to perform that well on larger installations, which rather defeats the point of being so much faster, doesn't it?

Re:bogofilter

inkdroid on 2003-02-04T17:49:24
bogofilter can be run from procmail or invoked by your MTA, just like SpamAssassin. It just needs a good corpus of spam and non-spam. What do you mean by "personal filter"?

Re:bogofilter

Matts on 2003-02-04T18:03:45
Bayesian filters are personalised: they learn your ham/spam types. If you add in John-From-Marketing's ham, it massively decreases the filter's effectiveness for you (because of the lists he subscribes to). They don't stop working, but they're not as effective as SpamAssassin at large installations.

See my talk on this at the spam conference.
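To make the "personalised" point concrete, here is a minimal sketch of a per-user Bayesian filter. It uses a simplified naive-Bayes combination of per-token spam probabilities; it is an illustration, not bogofilter's or SpamAssassin's actual algorithm, and the tokenizer, the 0.01/0.99 clamping, and the class name `PersonalBayesFilter` are all assumptions for this example. The usage at the end trains one filter on a single user's mail and a "shared" filter that also learned John-From-Marketing's list mail as ham.

```python
import math
import re
from collections import Counter

# Very simple tokenizer (an assumption; real filters also look at headers).
TOKEN_RE = re.compile(r"[A-Za-z0-9'$-]+")


def tokenize(text):
    """Return the set of lowercase tokens in a message."""
    return {t.lower() for t in TOKEN_RE.findall(text)}


class PersonalBayesFilter:
    """Hypothetical filter: one instance per user, trained only on that user's mail."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.nspam += 1
        else:
            self.ham_counts.update(tokens)
            self.nham += 1

    def token_prob(self, token):
        """Estimate P(spam | token) from per-class frequencies, clamped to [0.01, 0.99]."""
        s = self.spam_counts[token] / max(self.nspam, 1)
        h = self.ham_counts[token] / max(self.nham, 1)
        if s + h == 0:
            return 0.5  # unseen token: neutral
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, text):
        """Naive-Bayes combination: P = prod(p) / (prod(p) + prod(1 - p))."""
        probs = [self.token_prob(t) for t in tokenize(text)]
        # Work in log space so long messages don't underflow to 0.
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1.0 - p) for p in probs)
        diff = log_ham - log_spam
        # Numerically stable form of 1 / (1 + exp(diff)).
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))


alice = PersonalBayesFilter()
alice.train("cheap pills buy now", is_spam=True)
alice.train("perl meeting agenda tomorrow", is_spam=False)

# A shared filter that also learned John-From-Marketing's list mail as ham.
shared = PersonalBayesFilter()
shared.train("cheap pills buy now", is_spam=True)
shared.train("perl meeting agenda tomorrow", is_spam=False)
shared.train("buy cheap software deals now", is_spam=False)

msg = "buy cheap pills"
print(alice.score(msg), shared.score(msg))  # the shared filter is less sure
```

In this toy run both filters still call the message spam, but John's ham pulls the shared filter's score down, which matches the "they don't stop working, but they're less effective" observation.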

Re:bogofilter

inkdroid on 2003-02-04T18:26:21
URL?

Re:bogofilter

Matts on 2003-02-04T22:14:51
http://axkit.org/docs/presentations/spam/

Re:bogofilter

chromatic on 2003-02-04T18:04:00

Being a Bayesian filter, it assumes there's a clear line between spam and non-spam. The more people you put in the same filter, the fuzzier that line may get. It's attempting to emulate your personal in-your-head filter.

Re:bogofilter

inkdroid on 2003-02-04T18:25:59
Yeah, but everyone gets their own.

Re:bogofilter

chromatic on 2003-02-05T01:31:57

Hence the phrase "personal filter". :)

Re:bogofilter

godoy on 2003-02-05T00:06:04
Speed is important when you're dealing with lots of messages.

BTW, it also matters for keeping server load down if you try to automate spam handling.

I've been thinking about writing something that runs from aliases such as $USER-spam and $USER-ham (it will have to check the message's origin: only the user himself may send to these addresses) and adds each message to the user's database as spam or ham accordingly. Then procmail or something similar can check incoming messages against that database and add a header, and the user creates the filtering rules in his MUA.

It will help a lot, and the user will be able to interact with the program using nothing but email.
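The alias idea can be sketched as a small handler that the MTA would pipe alias mail into. This is a minimal sketch using only Python's standard `email` package; the function names, the `<user>-spam`/`<user>-ham` naming, and the `train` callback (standing in for whatever updates the user's database) are hypothetical. Note that it trusts the `From:` header, which is easy to forge, so a real origin check would need to be stronger.

```python
from email import message_from_string
from email.utils import parseaddr


def message_text(msg):
    """Concatenate the text/plain parts of a message (including forwarded ones)."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                chunks.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name in the header
                chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def handle_training_mail(raw_message, recipient_alias, owner_address, train):
    """Handle mail delivered to a hypothetical $USER-spam or $USER-ham alias.

    train is a hypothetical callable(text, is_spam) that updates the
    user's own spam/ham database.
    """
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    # Only the user may train their own database. A From: header is easy
    # to forge, so a real setup would need a stronger origin check.
    if sender.lower() != owner_address.lower():
        return "rejected: sender not allowed"
    if recipient_alias.endswith("-spam"):
        is_spam = True
    elif recipient_alias.endswith("-ham"):
        is_spam = False
    else:
        return "rejected: unknown alias"
    train(message_text(msg), is_spam)
    return "trained as spam" if is_spam else "trained as ham"


trained = []
raw = ("From: Godoy <godoy@example.com>\n"
       "To: godoy-spam@example.com\n"
       "Subject: Fwd: buy now\n"
       "\n"
       "buy cheap pills\n")
print(handle_training_mail(raw, "godoy-spam", "godoy@example.com",
                           lambda text, is_spam: trained.append((text, is_spam))))
```

Keeping the database update behind a callback keeps the mail handling separate from whichever filter actually stores the per-user data.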

Re:bogofilter

Matts on 2003-02-05T07:57:57
With SpamAssassin 2.50 (nearly ready) you can have per-user bayesian databases. But that doesn't scale to a company with (say) 20,000 users. You can't expect everyone to train their systems.

invalid signatures

koschei on 2003-02-04T22:06:57

So, obviously time to have signatures automatically checked, and rejected if invalid. =)

[which has its own problems]
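A minimal sketch of "check the signature before trusting it", assuming GnuPG is installed and the message is clearsigned. It relies on `gpg --verify` reading the signed text from stdin and exiting with status 0 only for a good signature; the function names and the three result labels are hypothetical, and the verifier is passed in so it can be swapped or faked.

```python
import subprocess

PGP_CLEARSIGN_MARKER = "-----BEGIN PGP SIGNED MESSAGE-----"


def gpg_verify(clearsigned_text):
    """Return True only if `gpg --verify` exits with status 0 (good signature).

    Assumes GnuPG is installed; a missing public key, a bad signature,
    a missing gpg binary or a timeout all count as "not verified".
    """
    try:
        result = subprocess.run(
            ["gpg", "--batch", "--verify"],
            input=clearsigned_text.encode("utf-8"),
            capture_output=True,
            timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_pgp_signature(body, verify=gpg_verify):
    """Classify a message body as 'unsigned', 'good-signature' or 'reject'."""
    if PGP_CLEARSIGN_MARKER not in body:
        return "unsigned"        # no signature claimed: leave scoring alone
    if verify(body):
        return "good-signature"  # only now would a "not spam" bonus be justified
    return "reject"              # claims to be signed but does not verify


signed = "-----BEGIN PGP SIGNED MESSAGE-----\nHash: SHA1\n\nbuy now\n"
print(check_pgp_signature(signed, verify=lambda body: False))
```

One of the problems is visible right away: in this sketch a genuine signature from someone whose key is not in your keyring also ends up as "reject".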

Black List/White List

Ovid on 2003-02-04T22:31:22

Eventually, I suspect that most email software will implement blacklist/whitelist technologies. If a domain/user is on your blacklist, their mail gets discarded. Period. If they are on your whitelist, it gets accepted. If they are on neither, any email from them triggers an "auto-reply" saying "please respond with an email message requesting that you be added to my whitelist". Spammers will thus be forced to respond to millions of "whitelist" requests. (The auto-response could also simply inform the sender that they must wait while you decide whether to put them on the whitelist.)

You, on the other hand, as the manager of your whitelist, get to approve every sender before they go on it. If someone doesn't respond, or if you choose not to add them, they automatically get moved to the blacklist. Tuning exactly how you want your system to behave will maximize its effectiveness and shut down spam. Yes, it's a lot more individual work up front, but the long-term payoff is that sending spam will no longer be a profitable industry.

Now, who volunteers to write a Perl/Tk-based blacklist/whitelist email client? :)
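The decision logic described above fits in a few lines. This is a minimal sketch, not any existing client's behavior: the `SenderLists` class and its method names are hypothetical, entries are assumed to be stored lowercase, and actually sending the challenge auto-reply is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class SenderLists:
    """Hypothetical challenge/response state; entries are lowercase addresses or domains."""
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)
    pending: set = field(default_factory=set)  # senders who were sent a challenge

    def classify(self, sender):
        """Return 'discard', 'accept' or 'challenge' for an incoming sender address."""
        addr = sender.lower()
        domain = addr.rsplit("@", 1)[-1]
        if addr in self.blacklist or domain in self.blacklist:
            return "discard"
        if addr in self.whitelist or domain in self.whitelist:
            return "accept"
        self.pending.add(addr)
        return "challenge"  # caller sends the "please ask to be whitelisted" auto-reply

    def approve(self, sender):
        """You approved the sender: move them from pending to the whitelist."""
        addr = sender.lower()
        self.pending.discard(addr)
        self.whitelist.add(addr)

    def deny(self, sender):
        """No answer to the challenge, or you chose not to add them: blacklist."""
        addr = sender.lower()
        self.pending.discard(addr)
        self.blacklist.add(addr)


lists = SenderLists(whitelist={"friend@example.org"}, blacklist={"spam.example"})
for sender in ["friend@example.org", "bulk@spam.example", "stranger@example.net"]:
    print(sender, lists.classify(sender))
lists.deny("stranger@example.net")
```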

Re:Black List/White List

godoy on 2003-02-05T00:00:55
There are already programs like that.

They require the reply you describe. That will certainly eliminate some spammers who use invalid addresses.

On the other hand, they aren't that convenient. Consider: I'm subscribed to several mailing lists; I see a post from John Doe asking something silly about XYZ's software; I'm in a good mood (very important for answering a silly question); I send John Doe an answer and his mail server demands another message from me. Sorry, John, I'm not in a good mood anymore... I got the laundry bill right after sending the first message. :-)

Got it? It will reduce the number of legitimate messages you receive.

Besides, if the program demands that you review messages to decide how they should be classified, then you have the same problem I do, with the added inconvenience of checking the same message 50 times just because it came from 50 different addresses.

I still think bayesian filters are doing "the right thing(tm)".

Re:Black List/White List

Ovid on 2003-02-05T01:13:26

I agree that Bayesian filters are a great way to go, but I still think that blacklist/whitelist systems can work.

There are a couple of potential ways around the issue you mentioned. If you have a mailing list on your whitelist, then any email sent to or CC'd to that list could get past the whitelist. Of course, that also requires whoever manages the mailing list to take the time to manage the spam. I'm only on "members only" lists, and that takes care of the problem quite nicely.
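The mailing-list exception could look like this: a minimal sketch, using Python's standard `email` package, that lets a message through when a whitelisted list address appears in `To:` or `Cc:`. The function name and the list addresses are hypothetical, and since anyone can put a list address in `Cc:`, this trusts the headers as-is.

```python
from email import message_from_string
from email.utils import getaddresses


def sent_to_whitelisted_list(raw_message, whitelisted_lists):
    """True if any whitelisted mailing-list address appears in To: or Cc:."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    listed = {addr.lower() for addr in whitelisted_lists}
    return any(addr.lower() in listed for _name, addr in getaddresses(fields))


raw = ("From: john@example.org\n"
       "To: you@example.com\n"
       "Cc: Perl List <perl-list@lists.example.org>\n"
       "\n"
       "silly question about XYZ\n")
print(sent_to_whitelisted_list(raw, {"perl-list@lists.example.org"}))
```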

Also, if you get a reply to an email you've sent, perhaps you could configure your whitelist filter to auto-accept any reply received within X amount of time. That could be too soon for spammers to harvest the address, while still allowing relatively easy management of email.

So, John Doe sends an email to a mailing list and you respond. His whitelist filter detects your response as being a reply to his email that made it in within the time limit and John automatically gets the email without you getting the "challenge" email. Alternatively, you send your response to the mailing list and John still gets your response.
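The "reply within X amount of time" rule can be sketched by remembering the `Message-ID` of each message you send and accepting an incoming message whose `In-Reply-To:` or `References:` header points at one of them within the window. This is a minimal sketch under that assumption; `ReplyTracker` and its methods are hypothetical, and timestamps are plain seconds.

```python
import time
from email import message_from_string


class ReplyTracker:
    """Hypothetical store of Message-IDs we sent, so timely replies skip the challenge."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.sent = {}  # Message-ID (with angle brackets) -> time sent

    def record_sent(self, message_id, sent_at=None):
        self.sent[message_id.strip()] = time.time() if sent_at is None else sent_at

    def is_timely_reply(self, raw_message, now=None):
        """True if the message references one of our Message-IDs within the window."""
        now = time.time() if now is None else now
        msg = message_from_string(raw_message)
        refs = (msg.get("In-Reply-To", "") + " " + msg.get("References", "")).split()
        for ref in refs:
            sent_at = self.sent.get(ref)
            if sent_at is not None and now - sent_at <= self.window:
                return True
        return False


# John's side: he posted to the list with this Message-ID and allows one hour.
tracker = ReplyTracker(window_seconds=3600)
tracker.record_sent("<abc@example.com>", sent_at=1000.0)
reply = ("From: you@example.com\n"
         "In-Reply-To: <abc@example.com>\n"
         "\n"
         "here is the answer\n")
print(tracker.is_timely_reply(reply, now=2000.0))
```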

The only problem I see with that scenario is spammers copying a mailing list with every email and having that mailing list slip past your filters. However, at that point, with the spammer sending millions of emails, how can he or she possibly know which mailing lists go past your filters? The amount of work necessary to figure out how to get past the filters would be ridiculous.

One problem with Bayesian filters is the "conversational style" spam which is becoming more frequent. As that is often difficult for a Bayesian filter to detect, the whitelist becomes perfect for it. Not that I'm knocking Bayesian filters, mind you. It's an awesome tool :)

Re:Black List/White List

godoy on 2003-02-05T01:30:51

> The only problem I see with that scenario is spammers copying a mailing list with every email and having that mailing list slip past your filters. However, at that point, with the spammer sending millions of emails, how can he or she possibly know which mailing lists go past your filters? The amount of work necessary to figure out how to get past the filters would be ridiculous.

I receive a lot of computer-related spam. I suppose they could use messages from the same place where they harvested my address, or something like that.

> One problem with Bayesian filters is the "conversational style" spam which is becoming more frequent. As that is often difficult for a Bayesian filter to detect, the whitelist becomes perfect for it. Not that I'm knocking Bayesian filters, mind you. It's an awesome tool :)

Please don't take me for a Bayesian zealot! I'm not. :-) I'm just impressed with how effective it is. In fact, I think a combination of several methods is the most effective solution.