Stopping Spam at the Gateway: Page 2
The idea behind rules-based filtering is very simple. These filters run many spam-identification tests against the mail headers and body text. The software looks for terms like "SEX" or "Hair Growth" and then deletes the matching messages at the mail server.
The problem with this approach is that it's always a step or two behind the spammers. For instance, we know that a message with the subject of "F R E E V I A G A R A" is spam, but a rules-based program might miss it because of the spaces between the letters. The rules-based approach is a good one, but keeping the rules accurate and up-to-the-minute is a never-ending job. Another problem is that the more comprehensive a rules-based program gets, the slower it runs, because every message has to be checked against every rule.
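To make the spacing problem concrete, here is a minimal Python sketch. The term list and both function names are made up for illustration and are not taken from any real product. The first function is a plain substring rule; the second strips everything but letters and digits before comparing.

```python
import re

# Hypothetical rule list; a real filter would carry far more patterns.
SPAM_TERMS = ["free viagra", "hair growth"]

def naive_match(subject: str) -> bool:
    """Flag the subject if a spam term appears as a literal substring."""
    lowered = subject.lower()
    return any(term in lowered for term in SPAM_TERMS)

def normalized_match(subject: str) -> bool:
    """Drop spaces and punctuation first, so letter-spacing tricks stop working."""
    squeezed = re.sub(r"[^a-z0-9]", "", subject.lower())
    return any(term.replace(" ", "") in squeezed for term in SPAM_TERMS)
```

With these rules, "F R E E V I A G R A" slips past naive_match but is caught by normalized_match. The misspelled "F R E E V I A G A R A" gets past both, which is the step-behind problem in miniature: each new trick needs yet another rule.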
Make no mistake about it, trainable rule-based filters are an excellent technique. But they're condemned to always be at least one step behind, and they come with a built-in, eternally growing performance hit.
At first glance Bayesian filtering appears to be a lot like rules filtering, but instead of starting with preset rules, Bayesian filters, with a user's or administrator's help, learn to tell the difference between spam and good mail. The verdict is expressed as a probability, so after a few hundred messages a good Bayesian filter will automatically recognize that the odds are heavily against a message with a subject of 'sex' and the HTML coding for bright red text being anything other than spam.
Because they're simple to program and highly accurate – success rates of 98% are not uncommon – Bayesian filters have become the hottest anti-spam technology.
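As a rough illustration of how such a filter learns, here is a minimal naive Bayes sketch in Python. The class name, the whitespace tokenizer, and the add-one smoothing are assumptions chosen for brevity; commercial and open source filters differ in how they tokenize and combine word probabilities.

```python
import math
from collections import Counter

class TinyBayesFilter:
    """Illustrative naive Bayes classifier; not any product's actual algorithm."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase and split on whitespace only.
        return text.lower().split()

    def train(self, text, label):
        """Record one message that a user or administrator labeled 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability, between 0 and 1, that text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior from the share of messages seen with this label (smoothed).
            log_p = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word]
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into a single probability of spam.
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))
```

A short usage example: after calling train("free pills buy now", "spam") and train("lunch on monday", "ham") a few times with different messages, spam_probability("free pills now") moves toward 1 while spam_probability("monday meeting") moves toward 0. An untrained filter returns 0.5 for everything, which is why the training step matters.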
At the Gateway
There are more than a dozen commercial anti-spam programs available, including Brightmail, Cloudmark Authority, CipherTrust IronMail, Trend Micro, and Tumbleweed. All these companies use several, if not all, of the anti-spam methods identified above to try to build the perfect anti-spam program.
Still, while they're all trying to get there, none of the tools is anywhere close to achieving perfection yet. As a result, it's important to obtain evaluation copies and test them with your users and on your network before you make your choice.
Many ISPs and companies build their own solutions. Of these, most are built on the foundation of the procmail Unix mail processing utility and SpamAssassin, a powerful Unix-based, open source mail filtering program.
SpamAssassin isn't just for Unix and Linux shops, though. There are many versions available, including Network Associates' McAfee System Protection SpamKiller for Microsoft Exchange Small Business for Exchange 2000. There are also a variety of other commercial and open source programs based on SpamAssassin that will work in concert with almost any mail server.
None of these anti-spam programs, however, is that fast. Most network administrators find that these programs require their own servers for effective mail throughput. Other administrators use outsourced anti-spam services such as those provided by Postini and MessageLabs.
If you do elect to use your own in-house server, it needs fast connections to both your Internet gateway and the e-mail server. I'd recommend Fast Ethernet at a minimum, and if you have more than 500 user mailboxes, gigabit Ethernet for inter-server connections should be seriously considered.
The machines themselves should have ample memory and storage capacity: at least 512MB of RAM and fast 120GB+ hard drives. System speed, while important, isn't as critical as memory and disk space. That's because when you boil spam protection down to its basics, it comes down to lots and lots of string comparisons. Such procedures tend to be light on the processor but memory intensive. Finally, these machines should have no other jobs except spam-bashing.
If possible, as Ferris recommends, end users should have direct access to spam messages. You may be sure a given message is spam, and the anti-spam tool may be certain it's spam, but only the user can tell if it really is spam. If the user has to go through a help desk to get at the message, he's not going to be a happy user. Some server programs, like ActiveState's PureMessage, already enable users to get directly at their 'spam' mail.
Does it sound as though building in-house anti-spam protection will be a lot of trouble, and outsourcing it quite expensive? You're right: it will very likely be one or the other.
Is it worth it? You tell me. Are your users sick of spam? Are you tired of having large chunks of your Internet bandwidth taken up by spam? Are you tired of watching your mail servers' hard drives glow from constant use? If your answer is yes to two or more of those questions, it's time to add anti-spam services to your network.
Story courtesy of EITPlanet