Bayesian Spam Filtering

Matts on 2002-09-04T12:50:51

Is anyone else sick of all these new Bayesian spam filtering projects popping up everywhere now that Paul Graham has translated the theses on it into plain English? I've now counted at least 8, and I haven't gone out searching yet.

OK, maybe I'm just annoyed since I had to do my own translation before that article, and now everyone else has just jumped on the bandwagon given PG's code, without really understanding the underpinnings of it.

Bah humbug.


Me! Me!

gnat on 2002-09-04T18:56:47

I'm sick of it, too, for exactly the same reasons. BN is cool now the way Neural Networks were a while ago. And people are blindly using them the same way they blindly used NNs--copying code and not knowing the limits of the technology, how to improve it, or when it's fundamentally inappropriate to be used.

Bah! Humbug!

--Nat

Re:Me! Me!

petdance on 2002-09-04T19:54:51

So what should they be doing instead? Not producing code? What's wrong with experimentation and seeing what works?

And people are blindly using them the same way they blindly used NNs--copying code and not knowing the limits of the technology, how to improve it, or when it's fundamentally inappropriate to be used.

I don't see how that's harmful, really.

Re:Me! Me!

Matts on 2002-09-04T21:26:45

My point was that nobody is experimenting or innovating here. It's the same thing the open source community does over and over again - sees something cool and develops fifteen billion copies of the same code. Eventually you end up with a consensus on a libbayesianspam, but it takes forever to get to that point, and meanwhile there's no innovation, just mindless cargo-cult copying of code.

I'd rather see people try and think of new ways to stop spam, instead of copying code just to have a new and popular project.

Re:Me! Me!

petdance on 2002-09-04T21:45:42

I'd rather see people try and think of new ways to stop spam, instead of copying code just to have a new and popular project.

I'm not sure you can attribute their motivations to wanting to have a "new and popular project". That's a lot of mind-reading skill that I don't have.

But yes, projects pop up and die off, and the strongest survive. The survivors usually slurp up the castoffs, too, or those with less expertise hook up with someone else. In a couple of months, those who are just in it for the short-term win of having a project will lose interest.

Re:Me! Me!

gnat on 2002-09-04T21:43:34

It's not experimentation, though. You've got people reimplementing existing code. Big whoop. If you don't really know what you're doing, you don't know how to make meaningful changes, and you're not really experimenting, you're just fucking around. And fucking around is okay, but take your hand off your cock and don't pretend it's Your Great Spam Solution when it's really Paul Graham's Great Spam Solution Reimplemented In Your Favourite Language With Your Name In The Comments.

That's what really chafes my ass, the idea that you are a l33t h4ck3r by using some cool algorithm that you have no actual understanding of. I'll bet you that 9/10 of these people can't even tell you what Bayes' Law is, let alone how their code really works. And it's not even that I'm an expert pissed off that these people are sullying my ground--I've read one book and am happy to admit that I don't know shit. But I do realize how damn hard it really is.

I refer you to my earlier statement: bah, humbug :-)

--Nat

Re:Me! Me!

petdance on 2002-09-04T21:50:12

don't pretend it's Your Great Spam Solution when it's really Paul Graham's Great Spam Solution Reimplemented In Your Favourite Language With Your Name In The Comments.

Are people actually trying to take credit without crediting Paul Graham?

And do you not see the value in "Reimplementing In Your Favorite Language"? That seems like a step forward to me...

I guess my point is that I'd rather that a dozen people do something, even if it's only incremental benefit, than a dozen people do nothing.

Re:Me! Me!

mosch on 2002-10-16T02:20:13

Oddly enough, not everybody who is implementing Bayesian spam filtering is doing it without an understanding of Bayes' law. Surprisingly, quite a few programmers have strong math backgrounds.

Some of us just saw the idea while in the middle of a web-mail project and decided to include Bayesian filtering as a feature.