I know there are a lot of spam filters out there, and most of the Bayesian ones will work better than this one. There are also lots of people who get far more spam than I do. But my interest in pattern matching and recognition was reason enough for me to try to code one myself.
Features
- Pure Perl
obmf is written in Perl for rapid development, the ability to run (almost?) everywhere Perl does, great string handling, and personal preference. If you don't like Perl because you have seen a lot of bad examples, let me assure you that I have taken care to keep the code readable, well documented and easy to understand. The one and only Perl module obmf depends on is "DBI" for database connectivity, which is part of almost every (desktop) OS. So the real feature is probably how easily obmf can be customized. Anyone with some basic knowledge of Perl should be able to make it do whatever (s)he wants.
Though I have not tried it myself, I'm pretty sure obmf will need Perl 5.6 or later.
- Text-only
obmf ignores non-text parts of the mail, understands multipart messages, and saves each mail's Message-ID so that no mail is examined twice.
- Easy interface
Sample configurations for mutt and procmail are also included. Anyone using another system is welcome to send in the config for his/her favorite mail program.
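The text-only handling and the Message-ID bookkeeping from the feature list can be illustrated with a short sketch. It is written in Python rather than Perl purely for illustration and is not obmf's code: the function names `text_parts` and `seen_before`, the `seen` table, and the use of SQLite are all assumptions made for this example.

```python
import sqlite3
from email import message_from_string


def text_parts(raw_mail):
    """Return (message_id, list of decoded text/* bodies) for one raw mail."""
    msg = message_from_string(raw_mail)
    bodies = []
    # walk() visits the multipart containers and every nested part.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers, images, attachments, ...
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "ascii"
        try:
            bodies.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name: fall back to a decoding that cannot fail.
            bodies.append(payload.decode("latin-1"))
    return msg.get("Message-ID"), bodies


def seen_before(db, message_id):
    """Return True if message_id was already stored; otherwise store it."""
    # Hypothetical schema: a single table holding one row per examined mail.
    db.execute("CREATE TABLE IF NOT EXISTS seen (message_id TEXT PRIMARY KEY)")
    row = db.execute(
        "SELECT 1 FROM seen WHERE message_id = ?", (message_id,)
    ).fetchone()
    if row is not None:
        return True
    db.execute("INSERT INTO seen (message_id) VALUES (?)", (message_id,))
    db.commit()
    return False
```

A caller would first pass the raw mail to `text_parts`, then examine the returned bodies only when `seen_before(db, message_id)` is false, so that a mail delivered twice is looked at only once.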
Download
Just download one of the following links, extract the file and read the readme.
Useful links
Interesting papers
- "A Plan For Spam" by Paul Graham. This paper is, AFAIK, the start of it all. Paul describes how he is trying to fight spam using a statistical approach.
- This untitled paper on spam detection by Gary Robinson describes how the algorithms can be modified to match better. It contains lots of links to third-party pages that describe various details of the mathematical formulae used.
- Better Bayesian Filtering is the second essay on this topic by Paul Graham. This approach is far more complex than the first method (see link above) but might just work.
Other, similar programs
- The Controllable Regex Mutilator "crm114" is a very interesting approach using some sort of mutating regular expressions. It can be used for other things as well, such as filtering firewall logs.
- The digramic Bayesian classifier "dbacl" can classify mail (and other texts) into more than one category. It can therefore be used as a semi-intelligent procmail alternative.
- SpamProbe is a very complete implementation of Paul Graham's idea (see above) with a rather long feature list.
- Freshmeat category with lots of spam filters.