Bayes Training for SpamAssassin

gav on 2004-04-12T02:22:07

I've been too lazy to train SpamAssassin's Bayesian classifier with spams that didn't get marked as spam. After messing with things for a bit instead of doing some real work (like laundry or something; have you ever noticed how much more productive you are when you can put off doing other things?), this is the way I set things up. This may be helpful to somebody, and it's probably going to be helpful to me when I forget how I did it.

First, I created a folder called SPAM in Apple Mail. This is where I drag any spams I want to train SpamAssassin with.

Then I spent a while being annoyed at SA for not doing what I wanted it to do. See, my mail doesn't go to me; it goes to another user, one that can't log in. There isn't an obvious way to tell sa-learn that you want it to work for another user. To get around this, I set bayes_path to an absolute path in that user's ~/.spamassassin/user_prefs.
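For anyone repeating that step, here is a minimal sketch of one way to put the setting in place. The function name set_bayes_path and both example paths are made up for illustration; the only SpamAssassin piece is the bayes_path option itself, whose value is a filename prefix (SpamAssassin appends _toks, _seen and so on), not a directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): make sure a user_prefs
# file contains exactly one bayes_path line pointing at an absolute path.
set_bayes_path() {
    prefs=$1    # e.g. /home/$sa_user/.spamassassin/user_prefs (assumed location)
    bayes=$2    # e.g. /home/$sa_user/.spamassassin/bayes (assumed prefix)
    if [ -f "$prefs" ]; then
        # Drop any existing bayes_path line; grep exits non-zero when
        # nothing is left to print, which is fine here.
        grep -v '^[[:space:]]*bayes_path[[:space:]]' "$prefs" > "$prefs.tmp" || true
        mv "$prefs.tmp" "$prefs"
    fi
    # Append the new setting.
    printf 'bayes_path %s\n' "$bayes" >> "$prefs"
}
```

Running it as the mail user, with that user's prefs file and an absolute prefix such as /home/$sa_user/.spamassassin/bayes, leaves every other line in user_prefs untouched.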

Then I wrote this little shell script:

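#!/bin/sh
# Local mbox file behind the SPAM folder in Apple Mail
mbox=/Users/gavin/Library/Mail/Mailboxes/SPAM.mbox/mbox
# Fill these in: login user, mail host, and the user the mail goes to
user=
host=
sa_user=
# Copy the mbox over to the mail host
rsync -zv -e ssh "$mbox" "$user@$host:/home/$sa_user/tmp"
# Learn every message in the copied mbox as spam, using $sa_user's prefs
ssh "$user@$host" sa-learn \
   -p "/home/$sa_user/.spamassassin/user_prefs" \
   --showdots --mbox --spam "/home/$sa_user/tmp/mbox"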
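
One way to check that the training landed in the right database is to look at the spam counter that sa-learn --dump magic reports. The sketch below is an assumption-laden helper, not part of the original setup: nspam_from_dump is a made-up name, and it assumes the usual dump layout where the count sits in the third column of the "non-token data: nspam" line.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# print the number of spam messages the Bayes database has learned.
# Assumes the count is the third whitespace-separated field on the
# "non-token data: nspam" line.
nspam_from_dump() {
    awk '/non-token data: nspam/ { print $3 }'
}

# Example use, with the same variables as the script above:
#   ssh "$user@$host" sa-learn \
#      -p "/home/$sa_user/.spamassassin/user_prefs" --dump magic | nspam_from_dump
```

If the number goes up after a run, the messages were learned into the database that user_prefs points at.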

This seems neater than setting up a mailbox to receive spams because I don't have to worry that any other headers have sneaked in there.