When it comes to the annoyance factor of unwanted mail, bounce messages (caused by worms that send mail with arbitrary forged From addresses) seem to have overtaken ordinary spam. The problem is that they pass my spam filters, so now I have to take steps.
I figure that it should be possible to detect those bounces almost flawlessly with a specially tailored Bayesian filter. Note that I don't want to use an existing Bayes filter (the one in SpamAssassin, for example). I would first have to train it, and I also suspect that real spam and bounces don't have much in common in terms of the words they use.
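To make the idea concrete, here is a minimal sketch of what such a dedicated filter could look like. It is written in Python purely for illustration and is not the filter described in this post: the class name `BounceFilter`, the labels "bounce" and "other", and the add-one smoothing are all assumptions.

```python
import math
import re
from collections import Counter

# Underscore is a token character so a placeholder like T_MAILADDR stays one token.
TOKEN_RE = re.compile(r"[A-Za-z0-9_']+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class BounceFilter:
    """Hypothetical two-class naive Bayes filter: 'bounce' versus 'other'."""

    def __init__(self):
        self.counts = {"bounce": Counter(), "other": Counter()}
        self.docs = {"bounce": 0, "other": 0}

    def train(self, text, label):
        """Record which tokens occur in one message of the given class."""
        self.docs[label] += 1
        self.counts[label].update(set(tokenize(text)))

    def score(self, text):
        """Return the log-odds that text is a bounce; above 0 leans 'bounce'."""
        log_odds = math.log((self.docs["bounce"] + 1) / (self.docs["other"] + 1))
        for tok in set(tokenize(text)):
            if tok not in self.counts["bounce"] and tok not in self.counts["other"]:
                continue  # tokens never seen in training carry no evidence
            # Add-one smoothing keeps a token seen in only one class finite.
            p_bounce = (self.counts["bounce"][tok] + 1) / (self.docs["bounce"] + 2)
            p_other = (self.counts["other"][tok] + 1) / (self.docs["other"] + 2)
            log_odds += math.log(p_bounce / p_other)
        return log_odds
```

Training it only on bounces and ordinary mail, rather than on spam, is what would keep the word statistics specific to bounces.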
So what I have started doing now is writing a Bayesian filter for bounces. The first thing I wrote was a flex scanner that detects valid RFC822 mail addresses. The scanner gets fed one message. It opens a pipe to another process (the one that does the actual filtering) and writes the mail to this process. The only thing the scanner does is replace every email address it can find in the body with T_MAILADDR or somesuch. If I read RFC822 correctly, the rules below should describe a valid email address:
atom            [!#$%&'-/0-9A-Za-z_`{}|~^]*
dtext           [\x00-\x0C\x0E-\x5A\x5E-\x7F]*
qtext           [\x00-\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]*
quoted_pair     "\\"[\x00-\x7F]
quoted_string   "\""({qtext}|{quoted_pair})*"\""
word            {atom}|{quoted_string}
domain_literal  "["({dtext}|{quoted_pair})*"]"
domain_ref      {atom}
sub_domain      {domain_ref}|{domain_literal}
domain          {sub_domain}("."{sub_domain})*
local_part      {word}("."{word})*
addr_spec       {local_part}"@"{domain}
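As a rough illustration of the replacement step, the following Python sketch masks addresses with a regular expression instead of a flex scanner. It is a simplification and an assumption on my part: it handles only dot-separated atoms on both sides of the "@" (no quoted strings, no domain literals), and the atom character set is taken from the RFC 822 definition of an atom, not from the rules above. The function name `mask_addresses` is hypothetical.

```python
import re

# RFC 822 atom: any printable ASCII character except specials, space and controls.
ATOM = r"[!#$%&'*+\-/0-9=?A-Za-z^_`{|}~]+"

# Simplified addr_spec: atom("."atom)* "@" atom("."atom)*
ADDR_SPEC = re.compile(rf"{ATOM}(?:\.{ATOM})*@{ATOM}(?:\.{ATOM})*")


def mask_addresses(body, placeholder="T_MAILADDR"):
    """Replace every address-like substring in body with the placeholder."""
    return ADDR_SPEC.sub(placeholder, body)


if __name__ == "__main__":
    print(mask_addresses("Could not deliver to <joe.user@example.com>: user unknown"))
```

Collapsing every address into one token means the filter sees "an address appeared here" as a single feature instead of thousands of distinct strings.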
.procmailrc before spamassassin is even triggered. I check the Return-Path header for <>. That catches most of the bounces.
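The check described here is a procmail rule, which is not shown. As a hedged stand-in, the same test could be sketched in Python with the standard `email` module. The function name `has_null_return_path` is hypothetical, and a message with no Return-Path header at all is treated as "not a bounce" by this test.

```python
from email import message_from_string


def has_null_return_path(raw_message):
    """Return True if the top-level Return-Path header is the null path <>."""
    msg = message_from_string(raw_message)
    value = msg.get("Return-Path")  # None when the header is missing
    return value is not None and value.strip() == "<>"
```

Only the headers of the outer message are inspected, so a Return-Path inside an attached original message does not count.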
Re:Seems Like a Lot of Work
ethan on 2004-05-30T04:16:57
Actually most of my bounces don't have a Return-Path header at all. The attached original message usually has one, but it always contains one of my addresses.
Re:from: ne from_
vsergu on 2004-05-29T22:33:58
If you send out all your mail with juerd@c4.example.com as the MAIL FROM, then that address will end up in some Received lines in some messages. Some of those messages will end up on computers that get infected with worms. Some of them will end up somewhere on the web, where the address will be harvested by spammers. So soon you'll be getting bounces to that address in response to spam and worms. I guess that means you'll have to change the address periodically.
Of course there are stupid mail systems out there that send bounces to the "From:" address, but they probably don't send them as <> anyway.
Re:from: ne from_
Juerd on 2004-05-31T20:32:22
That's what I thought, but I've been using this for over a year now and I haven't received a single unwanted message at juerd@c4.example.com yet.