I recently retrained my spam filters (sa-learn) by the simple expedient of pulling down about three weeks' worth of mail and hand-sorting it. It came out something like this:
Ham: 960 messages ~6 megs
Spam: 4100 messages ~120 megs
Out of the ham there were about 50 I actually kept to read. The rest was mailing list threads I wasn't particularly interested in.
Realize this is *after* my mail is filtered through pobox.com's RBL-based filtering. That knocks out between 5,000 and 10,000 messages a month (6,700 in the last 30 days).
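For a rough sense of scale, the figures above can be turned into daily rates and spam-to-ham ratios. The sketch below is only back-of-the-envelope arithmetic: it assumes "three weeks" means exactly 21 days and that the 6,700 messages blocked upstream in the last 30 days arrived at an even rate, neither of which is exact.

```python
# Back-of-the-envelope rates from the counts quoted above.
# Assumptions (not taken from any mail log): the sample covers
# 21 days, and upstream RBL blocking is spread evenly over its
# 30-day window.

DAYS = 21
HAM = 960
SPAM = 4100
RBL_BLOCKED_30_DAYS = 6700


def summarize(ham, spam, rbl_blocked_30, days):
    """Return per-day rates and spam:ham ratios for the sample."""
    # Scale the 30-day RBL count down to the sample window.
    rbl_in_window = rbl_blocked_30 * days / 30
    return {
        "ham_per_day": ham / days,
        "spam_per_day": spam / days,
        "ratio_after_rbl": spam / ham,
        "ratio_before_rbl": (spam + rbl_in_window) / ham,
    }


stats = summarize(HAM, SPAM, RBL_BLOCKED_30_DAYS, DAYS)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Under those assumptions the spam that reaches the mailbox outnumbers ham by a bit more than 4 to 1, and by roughly 9 to 1 once the mail blocked upstream is counted as well.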
I always assume I get an uncommonly large amount of spam. My address is over six years old and is posted all over the Internet via mailing lists and Perl documentation. This is why I usually scoff when someone suggests "just use foo.com's built-in filtering" whenever my email toolchain has a hiccup.
Do other folks see this much crap?
PS If you ever need to hand filter a large amount of email, sorting by subject helps a lot.
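Sorting by subject helps because reply threads and repeated spam subjects end up next to each other. A minimal sketch of the idea in Python, assuming the messages are already loaded as `email.message.EmailMessage` objects (for a real mailbox, the standard-library `mailbox` module could supply them); the helper names `subject_key` and `sort_by_subject` are made up for illustration:

```python
import re
from email.message import EmailMessage

# Leading "Re:" / "Fw:" / "Fwd:" prefixes, possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def subject_key(subject):
    """Lowercased subject with reply/forward prefixes stripped."""
    return _PREFIX.sub("", subject or "").strip().lower()


def sort_by_subject(messages):
    """Return messages ordered so that threads sit together."""
    return sorted(messages, key=lambda m: subject_key(m["Subject"]))


def make(subject):
    """Build a throwaway message for the demo below."""
    msg = EmailMessage()
    msg["Subject"] = subject
    return msg


demo = [
    make("Re: perl5-porters digest"),
    make("CHEAP MEDS"),
    make("Fwd: cheap meds"),
    make("perl5-porters digest"),
    make("A question"),
]

ordered = [str(m["Subject"]) for m in sort_by_subject(demo)]
for subject in ordered:
    print(subject)
```

Stripping the prefixes and ignoring case is a design choice for this sketch; a mail client's own sort-by-subject may group messages differently.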
Clever
Meanwhile, when do you plan on reading (and answering) those 50 messages? (I sent you an email, but I don't know if it got through...)
Re:When will you answer your email?
Dom2 on 2005-01-26T15:33:52
mutt is exceptionally good at this sort of thing, thanks to its mini-language for retrieving mail information. In combination with the limit command and the delete command, it rocks for killing unwanted messages. And it's damned fast to boot. -Dom
Re:When will you answer your email?
schwern on 2005-01-26T19:45:26
It's possible it got dropped on the floor while I was fixing things up. Resend to be sure.
Re:When will you answer your email?
cog on 2005-01-26T20:29:51
Sent :-)
Do other folks see this much crap?
Yep. Since January 1st I've got about 12,000 emails in my caughtspam folder. I really need to implement some kind of SMTP-time blocking mechanism.
I don't think 960 ham messages in three weeks is a common figure. I probably get a fifth of that.
Going by your numbers, though, you have a spam:ham ratio of 10:1, and I'd say that yes, that's pretty common. It's certainly close to what I'm getting.
Re:
vsergu on 2005-01-26T21:46:21
You must not subscribe to many lists. Schwern's number is only 45 ham messages a day, so you're only getting 9? I probably get 2000 ham messages in 3 weeks, maybe more, most of it mailing list traffic and some automated messages. Of course the vast majority of that gets filtered into folders and deleted with only a glance at the subject line. I doubt that 960 is out of the ordinary for geeks.
Re:
Aristotle on 2005-01-26T23:03:04
I lost my perspective here; put as 45 mails/day, it's not that much. I don't have much email traffic, but I do have a huge blogroll. Newsfeeds have the advantage of being a spam-free medium so far.
I fear that, when I go on holiday for a few weeks, my normal mailbox will just overflow. The sheer bulk of it is just too much for even the multi-megabyte mailbox I have at my disposal.