SpamAssassin Kills MovableType Comment Spam

cwest on 2004-03-27T05:13:45

In the last month my journal has become more popular. I can say that with certainty because the clearest indicator is a drastic increase in comment spam. Dealing with more than 10 spam comments per day through the web interface has become far too time-consuming and cumbersome.

MovableType plugins abound for requiring users to jump through hoops to comment, group-based IP ban lists, FOAF whitelists, and moderated commenting. All of these smack of the techniques used for dealing with email spam. So, I thought, why not just use SpamAssassin? Comment spam can't be much different from email spam, right?

I put together a proof of concept and, with a little score tweaking, I now have a fine comment spam solution that doesn't get in the way. SpamAssassin has a couple thousand tests it uses to scan email messages for spam and they work just fine for comments that are converted into email format.

First, I ran the following program in test mode to take a look at how my comments lined up.

#!/usr/bin/perl
use strict;
use warnings;
no warnings 'uninitialized'; # Class::DBI::Loader issue

use Mail::SpamAssassin;
use Class::DBI::Loader;

$| = 1; # unbuffered output

# Pass 'test' as the first argument to report spam without deleting it.
my $TEST_MODE = $ARGV[0] eq 'test' ? 1 : 0;
my %DB_CONFIG = (
                 dsn      => 'dbi:mysql:MT',
                 user     => 'XXXX',
                 password => 'XXXX',
                );
my %SA_CONFIG = (
                 userprefs_filename => "$ENV{HOME}/.blog_spam_prefs",
                 debug              => 0,
                );

# Each comment is rendered as a minimal email message.
# Placeholders, in order: author, email, author, url, text.
my $MESSAGE = <<'__MESSAGE__';
From: %s <%s>
Subject: Journal Comment
To: Casey West <casey@geeknest.com>
Date: Thu, 25 Mar 2004 13:44:26 -0500 (EST)

%s

%s

%s
__MESSAGE__

my $spam   = Mail::SpamAssassin->new( \%SA_CONFIG );
my $loader = Class::DBI::Loader->new(
                                     %DB_CONFIG,
                                     namespace => 'MT',
                                    );
my $comments = $loader->find_class('mt_comment')->retrieve_all;

while (my $comment = $comments->next) {
    # Call the comment_author, comment_email, ... accessors in order.
    my @args = map $comment->$_, map "comment_$_",
                   qw[author email author url text];
    my $email  = sprintf $MESSAGE, @args;
    my $status = $spam->check_message_text($email);

    # Report every comment that hit at least one test.
    if ( $status->get_hits > 0 ) {
        my %attrs = (
                     id     => $comment->comment_id,
                     hits   => $status->get_hits,
                     is     => ( $status->is_spam ? 'YES' : 'NO' ),
                     author => $comment->comment_author,
                     url    => $comment->comment_url,
                     tests  => $status->get_names_of_tests_hit,
                    );
        print '-'x60, "\n";
        print map "$_:\t$attrs{$_}\n", sort keys %attrs;
        if ( $attrs{is} eq 'YES' ) {
            $comment->delete unless $TEST_MODE;
            print "KILLED:\t" .
              ($comment->moniker eq 'deleted' ? 'YES' : 'NO') .
                "\n";
        }
    }
    $status->finish;
}

To run it in test mode:

shell> mtspamkill test

When I did this I noticed that none of my comments were flagged as spam. Many that I knew were spam -- mostly because of their sexual content -- had hit the PORN_4 test. Since the default score required for a message to be classified as spam is 5.0, I realized that I just needed to raise the score SpamAssassin gives to the PORN_4 test. After a few more test runs I had a preferences file like this.

score PORN_4 5.0
score ADDR_FREE 5.0

body NJHMA_COM /\d+\.njhma\.com/
score NJHMA_COM 5.0
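
Why is 5.0 enough? SpamAssassin adds up the scores of every test a message hits and compares the total with the required score. Here is a rough sketch of that arithmetic using the scores from the preferences file. The is_spam helper and the SOME_MINOR_TEST name are made up for illustration and are not part of SpamAssassin.

```perl
use strict;
use warnings;

# Scores from the preferences file. SOME_MINOR_TEST is a hypothetical
# low-scoring test, included only for contrast.
my %score = (
    PORN_4          => 5.0,
    ADDR_FREE       => 5.0,
    NJHMA_COM       => 5.0,
    SOME_MINOR_TEST => 0.5,
);
my $REQUIRED = 5.0;   # the default threshold mentioned in the text

# Hypothetical helper: sum the scores of the tests hit, then compare
# the total with the threshold.
sub is_spam {
    my @tests_hit = @_;
    my $total = 0;
    $total += $score{$_} || 0 for @tests_hit;
    return $total >= $REQUIRED ? 1 : 0;
}

for my $hits ( ['SOME_MINOR_TEST'], ['PORN_4'], ['SOME_MINOR_TEST', 'ADDR_FREE'] ) {
    printf "%-28s %s\n", join(',', @$hits), is_spam(@$hits) ? 'YES' : 'NO';
}
```

With a test scored at 5.0, a single hit is enough to reach the threshold on its own.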

Then I ran the program in normal mode to kill the comments with extreme prejudice.

shell> mtspamkill

This hack is for MovableType installations that use a relational database. It should work fine with MySQL, Postgres, or SQLite. I looked into making it work with DBM and found that -- with a little work -- it is possible.
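
Switching databases should mostly be a matter of changing the dsn in %DB_CONFIG. A minimal sketch, assuming the database is called MT; the %DSN_FOR table, the $driver variable, and the SQLite path are placeholders of mine, so adjust them to your install.

```perl
use strict;
use warnings;

# Hypothetical DSNs, one per DBI driver. Database names and the
# SQLite file path are assumptions.
my %DSN_FOR = (
    mysql  => 'dbi:mysql:MT',
    Pg     => 'dbi:Pg:dbname=MT',
    SQLite => 'dbi:SQLite:dbname=/path/to/mt.db',
);

my $driver = 'Pg';   # pick the driver your MovableType install uses

my %DB_CONFIG = (
    dsn      => $DSN_FOR{$driver},
    user     => 'XXXX',
    password => 'XXXX',
);

print "$DB_CONFIG{dsn}\n";
```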

Someone needs to write Class::DBI::Loader::DBM. It would work with DBD::DBM. I looked into this a bit, too. The hard part is finding the column names for each table so the classes can be built appropriately. After that, YMMV. I got that far, went insane, and realized it didn't matter to me that much.

Once you get comfortable with this killing your comment spam, throw it in cron and forget about it. It should do a good job for you.
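
A crontab entry along these lines would run it once an hour. The install path and the log file are assumptions, so point them at wherever you keep the script.

```
# m h dom mon dow  command
0 * * * * $HOME/bin/mtspamkill >> $HOME/mtspamkill.log 2>&1
```

The script reads its preferences file from $ENV{HOME}, so check that HOME is set to what you expect in cron's environment.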

Posted from caseywest.com.