more spam improvements

Over the last couple of weeks I have made the following improvements in spam checking for mail handling on tarragon. Tarragon handles mail for about 20 domains, although only about a dozen have any mail to speak of.

I used to have entries in the amavis whitelist file, but that was a weakness, since it is easy to fake sender addresses. The amavis sendermaps feature is preferable: it gives a spamassassin bump to a known address or domain, but the bump can be small enough not to overcome the other attributes of the message. So egregious spam that claims to come from my own domain will still be caught. Also, I can have separate sendermaps for each email domain, instead of one whitelist that applies to everyone. The file /etc/amavis/conf.d/56-sendermaps now holds all the sendermaps.

A second improvement was to enable separate spamassassin thresholds for each email domain. In the file /etc/amavis/conf.d/52-spamchecks, in addition to the global spamassassin values for when to mark spam, when to reject, etc., there are now tables indexed by recipient which allow setting different thresholds for different recipients. I have used this to tighten the settings for my own domain without running the risk of false positives for others.

A third improvement was to quarantine spam rather than discarding it, at least for spam scores below a certain cutoff level. This provides a couple of benefits. First, if I do screen out something that is wanted, it can be recovered. Second, I can periodically review the rejected messages with their recipients. I did the first round of that today. I captured information about all the mail quarantined in the last month, separated by recipient, and sent each person a list of the From addresses for review. This uncovered a few senders that needed to be added to the sendermaps.

I did this basically by grepping the mail log for the string ‘{DiscardedInbound,Quarantined}’, dividing the matches up by destination domain, and then capturing the relevant bits of each entry (the date, the stored spam name, and the sender) with:

awk '{print $1 " " $2 " " $16 " from " $12}'

The results are sent to the recipient to check over.
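For anyone who wants to repeat this, here is a minimal sketch of the whole step as one shell function. It assumes log lines shaped like the sample entry shown further down in this post (recipient in field 14, quarantine name in field 16). The function name summarize_quarantine and the log path in the usage comment are only illustrative, and the grep is folded into the awk match rather than run separately.

```shell
#!/bin/sh
# Sketch: list quarantined mail for one recipient domain.
# Assumes amavis log lines shaped like the sample entry in this post.

# summarize_quarantine DOMAIN   (reads the mail log on stdin)
# Prints "month day quarantine-name from sender" for each quarantined
# message addressed to DOMAIN.
summarize_quarantine() {
    awk -v dom="@$1>" '
        # Keep only quarantine entries whose recipient field ($14) is in DOMAIN.
        index($0, "{DiscardedInbound,Quarantined}") && index($14, dom) {
            name = $16
            sub(/,$/, "", name)     # drop the comma that trails the file name
            print $1 " " $2 " " name " from " $12
        }'
}

# Hypothetical usage (the log path is an assumption, adjust for your system):
#   summarize_quarantine domain.com < /var/log/mail.log
```

Unlike the one-liner above, this also strips the trailing comma from the quarantine name, which makes the list a little easier to read.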

There were a small number of emails which the intended recipient asked to be recovered. It turns out that the simple way to do that is with amavis itself.

I’ve told amavis to store the quarantined mail in /mail/quarantine. When it rejects spam, the log entry looks like this:

Mar 30 11:16:24 tarragon amavis[2965]: (02965-05) Blocked SPAM {DiscardedInbound,Quarantined}, [12.130.136.195]:41628 [12.130.136.195] <spammysender@spamsource.com> -> <recip@domain.com>, quarantine: E/spam-EP-Jl2yDSezM.gz, Queue-ID: B1B59202F7, Message-ID: <0.1.20B.A29.1D72590CA6D4FE4.0@spammy.com>, mail_id: EP-Jl2yDSezM, Hits: 3.204, size: 93142, 710 ms

The name of the spam message here is E/spam-EP-Jl2yDSezM.gz. Quarantined files are filed in subdirectories by first letter, so this one is stored in /mail/quarantine/E/spam-EP-Jl2yDSezM.gz.

There is a command amavisd-release, run (by root) as follows:

amavisd-release E/spam-EP-Jl2yDSezM.gz

This causes amavis to turn the mail over to postfix for delivery.
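Copying the quarantine name out of a log line by hand is error-prone, so here is a rough sketch of two small helpers for it. The names quarantine_name and quarantine_path are made up for illustration, the QUARANTINE_DIR default reflects my setup, and the release command is, as far as I know, shipped with amavisd-new as amavisd-release. The release itself is only shown as a comment so that nothing is delivered by accident.

```shell
#!/bin/sh
# Sketch: pull the quarantine name out of amavis log lines.

# quarantine_name   (reads log lines on stdin)
# Prints the text between "quarantine: " and the next comma,
# e.g. E/spam-EP-Jl2yDSezM.gz; lines without that field print nothing.
quarantine_name() {
    sed -n 's/.*quarantine: \([^,]*\),.*/\1/p'
}

# quarantine_path NAME
# Prints where the file lives on disk, under QUARANTINE_DIR
# (defaults to /mail/quarantine, which is specific to my setup).
quarantine_path() {
    printf '%s/%s\n' "${QUARANTINE_DIR:-/mail/quarantine}" "$1"
}

# Hypothetical usage, as root (log path and search string are assumptions):
#   name=$(grep 'recip@domain.com' /var/log/mail.log | quarantine_name | tail -n 1)
#   ls -l "$(quarantine_path "$name")"   # check the file is really there
#   amavisd-release "$name"              # hand it back for delivery
```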

September, 2022: Some time ago I began checking the quarantines every day in a script, which sends me an email listing all the new quarantines in the last 24 hours (or an email saying there were none). I noticed that I hadn’t seen those emails in a little while, and went to find out why. It turns out that this summary email was ITSELF getting marked as spam. I suppose that something in the address of one of the quarantined emails was triggering spamassassin.

As a workaround I have told amavis to give a 20 point bonus to mail from root@tarragon.wmbuck.net. This is probably something I could analyze to death… is there some way this could be dangerous?
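For reference, a daily check of the kind described above can be quite small. This is a minimal sketch, not the actual script: the function names are made up, it assumes a find that supports -mmin (GNU and BSD find both do), and the mail command and subject line in the usage comment are assumptions.

```shell
#!/bin/sh
# Sketch: report files added to the quarantine in the last 24 hours.

# new_quarantines DIR
# Lists regular files under DIR modified within the last 1440 minutes.
new_quarantines() {
    find "$1" -type f -mmin -1440 | sort
}

# quarantine_report DIR
# Prints the body of the daily email: the list of new files, or a
# one-line note saying there were none.
quarantine_report() {
    new=$(new_quarantines "$1")
    if [ -n "$new" ]; then
        printf 'New quarantined messages in the last 24 hours:\n%s\n' "$new"
    else
        printf 'No new quarantined messages in the last 24 hours.\n'
    fi
}

# Hypothetical daily cron job (mail command and subject are assumptions):
#   quarantine_report /mail/quarantine | mail -s 'quarantine report' root
```

Listing only file names, rather than sender addresses, might also give spamassassin less to object to in the report, though I have not tested that.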