Rule-based filters

The idea behind rule-based filtering is very simple. The filter runs many spam identification tests against the mail headers and body text. In this method, the software looks for terms like "SEX!" or "Hair Growth" and then deletes the matching messages at the mail server.

The problem with this approach is that it's always a step behind the spammers. For instance, we know that a message with the subject of "F R E E V I A G A R A" is spam, but a rule-based program might miss it. The rule-based approach is a good one, but keeping the rules accurate and up to the minute is a never-ending job. Another problem is that the better a rule-based program gets, the slower it will run, because every new rule is one more test to run against each message.
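To make the "step behind" problem concrete, here is a minimal sketch of a rule-based filter in Python. The rule names, scores and threshold are all made up for illustration and are not taken from any real product. It shows a fixed rule list missing the spaced-out subject, and the catch-up rule an administrator would have to add afterwards.

```python
import re

def spaced(word):
    """Build a pattern matching `word` with optional whitespace between letters."""
    return r"\s*".join(re.escape(ch) for ch in word)

# Hypothetical rules: (name, compiled pattern, score).
RULES = [
    ("sex_shout", re.compile(r"sex!", re.IGNORECASE), 2.0),
    ("hair_growth", re.compile(r"hair growth", re.IGNORECASE), 2.0),
    ("free_viagra", re.compile(r"free viagra", re.IGNORECASE), 3.0),
]
THRESHOLD = 2.0  # assumed cutoff: at or above this score, the message is spam

def spam_score(subject, body, rules):
    """Run every rule against the headers and body; add up the scores that hit."""
    text = subject + "\n" + body
    return sum(points for _name, pattern, points in rules if pattern.search(text))

def is_spam(subject, body, rules=RULES):
    return spam_score(subject, body, rules) >= THRESHOLD

# The catch-up rule, written only after the spaced-out (and misspelled)
# subject has already reached users' inboxes.
CATCH_UP = (
    "spaced_free_viagra",
    re.compile(
        spaced("free") + r"\s*" + spaced("viag") + r"\s*a?\s*" + spaced("ra"),
        re.IGNORECASE,
    ),
    3.0,
)
UPDATED_RULES = RULES + [CATCH_UP]

print(is_spam("Hair Growth in 30 days", "", RULES))         # caught by a preset rule
print(is_spam("F R E E V I A G A R A", "", RULES))          # slips past the preset rules
print(is_spam("F R E E V I A G A R A", "", UPDATED_RULES))  # caught once the new rule exists
```

Note that every message is tested against every rule, so each catch-up rule adds a little more work per message.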

Make no mistake about it; trainable rule-based filters are an excellent technique. But they're condemned to always be a step behind. And they come with a built-in, eternally growing performance hit.

Bayesian filters

At first glance, Bayesian filtering looks a lot like rule-based filtering, but instead of starting with preset rules, a Bayesian filter, with a user's or administrator's help, learns to tell the difference between spam and good mail. That difference is expressed as a probability, so after a few hundred messages a good Bayesian filter will automatically recognize that any message with a subject of 'sex' and the HTML coding for bright red is almost certainly spam.
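The learning step can be sketched in a few lines of Python. This is a simplified naive Bayes-style scorer written for illustration, not the algorithm of any particular product. The class name, the tokenizer, the add-one smoothing and the tiny training set are all assumptions, and a real filter would be trained on hundreds of messages rather than four.

```python
import math
import re
from collections import Counter

class BayesFilter:
    """Toy spam scorer: counts which tokens appear in spam versus good mail."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Words plus HTML color codes such as "#ff0000" (bright red).
        return set(re.findall(r"[a-z0-9#]+", text.lower()))

    def train(self, text, label):
        """The user or administrator labels a message as 'spam' or 'ham'."""
        self.messages[label] += 1
        self.tokens[label].update(self.tokenize(text))

    def spam_probability(self, text):
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        # Start from the smoothed prior odds, then add evidence token by token.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in self.tokenize(text):
            p_spam = (self.tokens["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-700.0, min(700.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))

f = BayesFilter()
f.train("Subject: sex <font color=#ff0000>click now</font>", "spam")
f.train("sex pills <font color=#ff0000>free</font>", "spam")
f.train("Subject: lunch on friday?", "ham")
f.train("quarterly report attached", "ham")

print(f.spam_probability("sex <font color=#ff0000>"))  # close to 1
print(f.spam_probability("report for friday lunch"))   # well below 0.5
```

Nobody wrote a rule about 'sex' or bright red here. The filter picked up both signals from the labeled examples, which is the difference from the rule-based approach.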

Bayesian filtering, because it's simple to program and highly accurate (success rates of 98% are not uncommon), has become the hottest anti-spam technology.

At the Gateway

There are more than a dozen commercial anti-spam programs. These include: Brightmail, Cloudmark Authority, CipherTrust IronMail, Trend Micro and Tumbleweed. All these companies use several, if not all, of the anti-spam methods to try to build the perfect anti-spam program.

They're all trying but no one is close to perfection yet. You really must obtain evaluation copies and test them with your users and network before you'll be able to make an informed choice.

Many ISPs and companies build their own solutions. Of these, most are built on the foundation of the procmail Unix mail processing utility and SpamAssassin, a powerful Unix-based, open source mail filtering program.

SpamAssassin isn't just for Unix and Linux shops, though. There are many versions available including Network Associates' McAfee System Protection SpamKiller for Microsoft Exchange Small Business for Exchange 2000. There are also a variety of other commercial and open source programs based on SpamAssassin that will work in concert with almost any mail server.

None of these anti-spam programs, however, is that fast. Most network administrators find that these programs require their own servers for effective mail throughput. Other administrators use outsourced anti-spam services such as those provided by Postini and MessageLabs.

If you do elect to use your own in-house server, it needs fast connections to both your Internet gateway and the e-mail server. I'd recommend Fast Ethernet at a minimum and, if you have more than 500 user mailboxes, I think gigabit Ethernet for inter-server connections should be seriously considered.

The machines themselves should have ample memory, at least 512MB of RAM, and fast 120GB+ hard drives. System speed, while important, isn't as critical as memory and disk space. That's because when you boil spam-protection down to its basics, it comes down to lots and lots of string comparisons. Such procedures always tend to be processor light but memory intensive. Finally, these machines should have no other jobs except spam-bashing.

If possible, as Ferris recommends, end-users should have direct access to spam messages. You may be sure a given message is spam. The program may be certain that it's spam, but only the user can tell if it really is spam. If the user has to go through a help desk to get at the message, he's not going to be a happy user. Some server programs, like ActiveState's PureMessage, already enable users to get directly at their 'spam' mail.

Does this sound like building server anti-spam protection will either be a lot of trouble or expensive if you outsource it? You're right; it will be one or the other.

Is it worth it? You tell me. Are your users sick of spam? Are you tired of having large chunks of your Internet bandwidth taken up by spam? Are you tired of watching your mail servers' hard drives glow from constant use? If you answer yes to three or more of those questions, it's time to add anti-spam services to your network.