SpamAssassin 3.4.1 released
The "auto whitelist" (AWL) feature of SpamAssassin has long been one of that program's more annoying aspects. In theory it tracks the emails from each sender to get an overall sense of whether they are trustworthy; email from a trusted source will get a bonus score, while messages from apparent spammers will be penalized. The sad truth of the matter, in your editor's experience, is that a spammer need only get a small number of messages through to convince the AWL that everything else should be whitelisted. If SpamAssassin's other scoring mechanisms were perfect, this kind of AWL corruption would not be a problem — but then the AWL would not be needed at all. In a world where scoring is imperfect, the AWL often seems to make things worse.
In 3.4.1, the SpamAssassin developers have tried to address some of the problems with the AWL by replacing it with a new mechanism called TxRep. The basic idea remains the same: track each sender's activity and adjust the score of new messages toward the mean of what has been seen in the past. But a number of useful changes have been made in how this tracking is done, starting with an expansion of the set of data that is used. TxRep maintains reputation scores for the sending email address (as did the AWL), but also the sending domain name, the IP addresses of the originating system and the server that transferred the message, and the "HELO" string used by the last server. For any given message, each of these quantities is mixed in with its own (user-configurable, naturally) weight.
Another useful change is that the sa-learn utility (until now used only with the Bayesian filter) can be used to train TxRep, so the same command now works to update both filters. There is a "dilution" mechanism that causes newer messages to have more influence on a sender's score than older ones, making the system more responsive should, say, a spammer repent and start actually sending useful stuff (or should TxRep initially misjudge a sender). TxRep can be used to whitelist (or blacklist) senders or IP addresses outright — something that might be worth doing automatically for the most obvious of spam or for messages that have been explicitly classified by the recipient. There is also a mechanism to automatically whitelist the recipients of outgoing mail — though that could have undesired effects if one is prone to sending irate responses to spammers.
With these changes, TxRep should be able to avoid some of the worst AWL pitfalls, though the documentation still recommends against turning on auto-learning until SpamAssassin as a whole has been tuned well. But the whole thing still seems to be built around the idea that people can be spammers part of the time and senders of legitimate email at others. Perhaps your editor is an excessively unforgiving character, but it seems like the sender of known spam should not get off lightly with a gradual tweaking of a reputation score; once a spammer, always a spammer. Trust is hard to earn but easy to lose; the TxRep mechanism still doesn't quite reflect that fact.
The PDFInfo module, which has long existed outside of the SpamAssassin mainline, has now been merged; PDFInfo, as its name would suggests, looks for spammy PDF attachments. There is one other new module, URELocalBL, which allows blacklisting of spammy links using a local database.
SpamAssassin 3.4.1 can do a more thorough and careful job of normalizing all messages to the UTF-8 character set before applying rules. That should help to eliminate various tricks using strange character sets to get around the spam-checking rules.
An interesting addition to the Bayesian filter is the ability to hash MIME
attachments and use the result as a filter token. If it works well, it
should allow the filter to recognize often-repeated spam payloads as a
whole. But, as the manual
page notes, "not much experience has yet been gathered regarding
its usefulness
". It seems worth a try, in any case.
Beyond all of this work, of course, is the constant challenge of maintaining the rule base in the face of a changing spammer landscape. Spammers may now be more concerned with getting past Gmail's filters than SpamAssassin, but there are still signs that a subset of spam has been tested against SpamAssassin until the rules are unable to stop it. The Bayesian filter helps with that problem, but so does an ongoing effort to keep those rules current. It is thus unsurprising that a new SpamAssassin release contains a long list of rule changes that should help to keep its effectiveness up — until the spammers work around those as well.
Your editor has often heard the complaint that email is reaching a point of
complete uselessness. Such claims overstate the reality — one need only
watch how email keeps our development communities going to see that. But
email has been under attack for many years, making life harder for both
email users and those who are charged with running email systems. It is
fair to say that SpamAssassin is one of a small set of tools that has
helped email to survive the ongoing spammer onslaught, so it is good to
see this tool continuing to evolve.
| Index entries for this article | |
|---|---|
| Security | Email/Spam prevention |
| Security | Spam |