Wednesday, July 25, 2007

Harvesting the noise while it's still fresh; SPF found potentially useful

In previous installments of my greylisting and greytrapping posts, I've described how, after publishing my traplist, I maintained it mainly by searching the mail server logs for "Unknown user" messages and by keeping a tail -f of my spamd log running in a terminal window.

It dawned on me a couple of days back that searching for "Unknown user" entries in the mail server logs means I find only the backscatter bounces that have already cleared greylisting, sent by real mail servers misconfigured to deliver spam to their users. Clearing greylisting may take a while, but once an IP address enters the whitelist, and as long as the machine does not try to send to any address already in the traplist, it will be able to deliver its spam or backscatter.

Harvest the noise while it's fresh
Fortunately, it's very easy to harvest the noise data while it's fresh: you search the greylist instead. A simple

$ sudo spamdb | grep GREY

gives you a list of all currently greylisted entries at that spamd instance, in a format which is well documented in the spamdb man page:

GREY|200.170.143.41|smtp6.netsite.com.br|<tbento@acipatos.org.br>|<peter@bsdly.net>|1185386752|1185401152|1185401152|1|0
GREY|217.19.208.25|idknet.com|<>|<credulity093@datadok.no>|1185386865|1185401265|1185401265|1|0
GREY|85.249.128.205|neptune.usedns.com|<>|<credulity093@datadok.no>|1185387329|1185401729|1185401729|1|0
GREY|194.183.162.193|scelto.relc.com|<>|<bequeathpi@datadok.no>|1185387398|1185401798|1185401798|1|0


More likely than not there will be more than one entry, and in this format it's fairly easy to spot at least two traplist candidates, credulity093@datadok.no and bequeathpi@datadok.no. I have no idea whether 217.19.208.25, 85.249.128.205 or 194.183.162.193 would ever have cleared greylisting, but now that credulity093@datadok.no and bequeathpi@datadok.no are in my traplist (and yes, at least part of the process should be very easy to automate), they'll be stuttered at, starting with the next time they try to connect and most likely until they give up.
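A rough sketch of how that extraction could be automated. The extract_candidates helper and the sample data below are my own illustration, not from spamd itself: it keeps only GREY entries with an empty envelope sender (<>), since those are bounces, and their recipients are likely invented addresses and thus traplist candidates. Any addresses it prints should still be checked against your list of real local users before trapping.

```shell
# Hypothetical helper: pull greytrap candidates out of spamdb GREY output.
# Field 4 is the envelope sender; "<>" marks a bounce, and field 5 (the
# recipient) is then a probable made-up address.
extract_candidates() {
    awk -F'|' '$1 == "GREY" && $4 == "<>" {
        addr = $5
        gsub(/[<>]/, "", addr)   # strip the angle brackets
        print addr
    }' | sort -u
}

# Demo on sample data; in real use you would run: sudo spamdb | extract_candidates
extract_candidates <<'EOF'
GREY|200.170.143.41|smtp6.netsite.com.br|<tbento@acipatos.org.br>|<peter@bsdly.net>|1185386752|1185401152|1185401152|1|0
GREY|217.19.208.25|idknet.com|<>|<credulity093@datadok.no>|1185386865|1185401265|1185401265|1|0
GREY|194.183.162.193|scelto.relc.com|<>|<bequeathpi@datadok.no>|1185387398|1185401798|1185401798|1|0
EOF
```

On OpenBSD, candidates that check out can then be added as spamtrap addresses with spamdb -T -a address (see spamdb(8) for the exact options on your version).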

It's probably still useful to keep a tail -f of your spamd log running anyway, but you can hold off on harvesting until you see a marked increase in simultaneous connections to spamd, that is, when the first number in parentheses starts rising sharply. Here the number is still low (the second number is the number of currently blacklisted hosts):

Jul 25 22:17:16 delilah spamd[11839]: 217.146.97.10: connected (12/12), lists: spamd-greytrap
Jul 25 22:17:35 delilah spamd[11839]: 213.177.120.98: connected (13/13), lists: spamd-greytrap
Jul 25 22:17:36 delilah spamd[11839]: 87.103.238.226: connected (14/14), lists: spamd-greytrap


When the first number rises sharply, that's when the first wave of spam or backscatter hits, and that's the time to harvest the noise while it's still fresh.
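If you'd rather watch just that counter than the full log lines, something like the following might do. The conn_count helper is my own sketch, and where your spamd log ends up depends on your syslog configuration:

```shell
# Hypothetical helper: extract the simultaneous-connection count from
# spamd's "connected (N/M)" log lines, where N is the number of current
# connections and M the blacklisted hosts among them.
conn_count() {
    sed -n 's/.*connected (\([0-9]*\)\/[0-9]*).*/\1/p'
}

# Demo on a sample log line; in real use you might run something like
# tail -f /var/log/daemon | conn_count (adjust the path to your syslog setup)
echo 'Jul 25 22:17:36 delilah spamd[11839]: 87.103.238.226: connected (14/14), lists: spamd-greytrap' | conn_count
```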

A good harvest means less work for your mail server.

SPF found potentially useful
One recurring theme in greylisting discussions is how to deal with sites which do not play nicely with greylisting, specifically sites with many outgoing SMTP servers and no guarantee that the retries will come from the same IP address (you can find a rather informal discussion in the PF tutorial, for example). If you can't get those sites to do the retry magic, you probably need to whitelist them, but in the case of large sites like Google, how do you find out just which machines to whitelist?

For well-run sites the answer is simple: if they publish SPF data, you use that. After all, that data is their own list of valid outgoing SMTP senders. The solution presented itself in a recent openbsd-misc post by Darrin Chandler. If you need to whitelist a site with many potential outgoing SMTP servers, the command is

$ host -ttxt example.com

That is, look up the text data in the domain's DNS data, which is where SPF data lives.
The answer would typically be something like

example.com descriptive text "v=spf1 mx -all"

which essentially means, "for the example.com domain, only the mail exchangers are valid SMTP senders". The next step is easy: if the answer contains IP addresses or IP address ranges, you put those in your whitelist. In this case a

$ dig example.com mx

would get you the data you need (possibly after a few more host commands).
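For records that list ip4: netblocks directly, the extraction can be scripted. The extract_ip4 helper below is my own sketch (the addresses in the demo are documentation ranges, not anyone's real SPF data); it only handles direct ip4: mechanisms, so mx: and include: entries would still need the extra host or dig lookups described above:

```shell
# Hypothetical helper: pull the ip4: netblocks out of an SPF TXT record,
# one per line, ready to append to a whitelist table.
extract_ip4() {
    tr ' "' '\n\n' |               # one token per line, quotes dropped
        sed -n 's/^ip4:\(.*\)$/\1/p'   # keep only the ip4: mechanisms
}

# Demo on a host(1)-style answer; in real use something like
# host -t txt example.com | extract_ip4 >> whitelist.txt
echo 'example.com descriptive text "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.16/28 mx ~all"' | extract_ip4
```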

Frankly it would be a lot better if those sites learned to play well with greylisting, but if you choose to whitelist them anyway, at least this way you take their word for what their valid senders are.

1 comment:

  1. "Frankly it would be a lot better if those sites learned to play well with greylisting" Totally agree.

Google thoroughly abuses SPF by including hundreds of thousands of IPs, which is total nonsense:

    $ dig google.com TXT +short | fgrep spf
    "v=spf1 include:_spf.google.com ip4:216.73.93.70/31 ip4:216.73.93.72/31 ~all"

    $ dig _spf.google.com TXT +short | fgrep spf
    "v=spf1 include:_netblocks.google.com include:_netblocks2.google.com include:_netblocks3.google.com ~all"

    $ dig _netblocks.google.com TXT +short | fgrep spf
    "v=spf1 ip4:216.239.32.0/19 ip4:64.233.160.0/19 ip4:66.249.80.0/20 ip4:72.14.192.0/18 ip4:209.85.128.0/17 ip4:66.102.0.0/20 ip4:74.125.0.0/16 ip4:64.18.0.0/20 ip4:207.126.144.0/20 ip4:173.194.0.0/16 ~all"

    Are they really claiming that routers, HR PCs & tape silos are valid outgoing mail servers??

    They also try to bully RFC-compliant postmasters who implement greylisting into bowing to their non-standard practices: "we recommend not using greylisting, and instead use SPF [which we abuse to suit ourselves as we're too lazy/stupid to list valid mail servers]"

    https://support.google.com/mail/answer/180063?hl=en

