spamassassin has been pretty darn good at stopping the megabytes of spam that come into my mailbox each week. But Paul Graham's A Plan for Spam is also a blueprint for evading tools like spamassassin: overload your message with ham instead of spam!
For instance, I've been getting lots of spam that includes a small text part with a longish list of random ham words, followed by an HTML part with a spam message and some more random ham. Case in point:
protactinium midwives gore allegro zoo bargain clarify kahn i's threshold cotoneaster ambiance lighten bentley homeric underling inflammable abase kohlrabi idiomatic trounce leitmotiv
Yup. Most spam that spamassassin catches has nothing to do with a homeric underling, the infamous kohlrabi idiomatic trounce, or the ever-popular incident when midwives gore allegro zoo bargain.
This stuff would be funny if there weren't so damn much of it....
When they run out of short shorts to plagiarize, generators will be written that stitch together semirandom sentences in order to create what appears to be genuine new textual content.
Actually...
educated_foo on 2004-01-09T22:06:17
I've already received at least one spam with a quote from "Through the Looking Glass" appended. As always, they're a step ahead.
Re:Actually...
chaoticset on 2004-01-12T15:44:39
Well, yes -- but think of the possibilities for unintentional literature propagation! :D There's always a silver lining...
Re:Just Think...
jmm on 2004-01-09T22:17:02
When they run out of short shorts to plagiarize, generators will be written that stitch together semirandom sentences in order to create what appears to be genuine new textual content.
Oh-oh. Quick, Joe, hide the source to MarkovBlogger or we'll be in even more trouble.
Re:Just Think...
barbie on 2004-01-20T12:40:12
Anything more than a quote or short extract and they could be sued for breach of copyright. If publishers were involved, it could be for much larger sums of money than the current fines for spamming certain state residents.
I suspect they'll just stick to a MarkovBlogger type word/phrase generator.
Re:Just Think...
chaoticset on 2004-01-20T13:55:29
Even with only a quote, it's not for "review" purposes, so I think it would still be breach of copyright, generally. Wouldn't it?
Although that might be interesting -- they might start including canned reviews, or even generated reviews, along with several quotes. It'd be like getting a machine-generated magazine article.
Maybe we need more dimensions. Junk like this should be filtered out based on rules orthogonal to the spam rules: how likely is it that a message contains nothing of interest? You'll get a decision surface, instead of just a line. The score must fall within a small area in order to be considered OK.
Most importantly: ruling out messages for lack of content should not affect the self-learning rules for spam.
Of course, once you get this scheme working, there's no reason to stop at two dimensions, and it's easy to add more.
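A rough sketch of the two-dimensional idea, in Python. Everything here is an assumption made for illustration: the stopword-based "emptiness" heuristic, the 0.25 function-word ratio, the 0.5 thresholds, and the function names are all hypothetical, and none of it is spamassassin's API. It only shows a message being judged on two independent axes, with the spam learner left alone when a message is rejected for lack of content.

```python
# Hypothetical, tiny list of English function words. Real prose is full of
# these; a list of random dictionary words usually contains almost none.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "for", "with", "on", "i", "you"}


def emptiness_score(text: str) -> float:
    """Return 0.0 (reads like prose) .. 1.0 (reads like a bare word list).

    Assumption: ordinary prose is at least ~25% function words, so the
    score reaches 0.0 at that ratio and 1.0 when there are none at all.
    """
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 1.0
    function_ratio = sum(w in STOPWORDS for w in words) / len(words)
    return max(0.0, 1.0 - function_ratio / 0.25)


def classify(spam: float, emptiness: float,
             max_spam: float = 0.5, max_emptiness: float = 0.5) -> str:
    """A message is OK only if its (spam, emptiness) point lies inside the box."""
    if emptiness > max_emptiness:
        return "empty"   # rejected on the content axis, whatever the spam score
    if spam > max_spam:
        return "spam"    # rejected on the spam axis
    return "ok"


def feeds_spam_learner(verdict: str) -> bool:
    """Messages dropped for lack of content never reach the spam learner."""
    return verdict != "empty"


if __name__ == "__main__":
    word_salad = "protactinium midwives gore allegro zoo bargain clarify kahn"
    letter = "I will meet you at the station and bring the report for the meeting"
    for body in (word_salad, letter):
        # 0.1 stands in for a low spam score from whatever spam filter is in use.
        verdict = classify(spam=0.1, emptiness=emptiness_score(body))
        print(verdict, feeds_spam_learner(verdict))
```

Adding a third dimension would mean one more score and one more bound in `classify`; the learner gate stays the same.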