Friday, December 23, 2022

Can Your Spam-eater Manage to Catch Seventy-one Percent Like This Other Service?

Measuring the effect of what you do is important. Equally important is knowing what is the measure of your actions.

A question turned up on IRC that had me thinking.

Do you have a percentage of the spam traffic you catch on your MXes? The reason I ask is I lust learned that fastmail.com claim they catch 71% of all incoming spam. Also a rate of false positives would be nice to have, but that's likely harder to measure.

My first impulse was that I would consider a seventy-one percent hit rate on the low side of what we are seeing here at bsdly.net and associated domains.

But getting actually useful data would require some thinking. That said, comparing a major mail operator that sells deliverability and promises a 71 percent catch rate for incoming spam and bsdly.net would be like comparing apples and oranges at best. 

While bsdly.net (which is also known under a few other domain names) is my main mail service for my personal use and for a very select number of other people, to the rest of the world it is primarily a honeypot that generates security relevant data that other sites use, and that contributes to IP reputation rankings.

The site has been in operation in those roles for a little more than 15 years, since shortly before the original announcement in the article Hey, spammer! Here's a list for you!. When we started using the greylisting and greytrapping based setup, we saw a sharp drop in undesirable messages actually reaching inboxes, and I observed a marked decrease in load on the mail servers that did the content filtering.

Not long after I had set up our early greylisting setup, a message turned up on the openbsd-misc mailing list that pretty much matched our experience — a 95% reduction in spam in line to be treated to content filtering — so setting up precise measuring became a thing to do when we could get around to it.

Now enough with the background. It is relatively easy to extract at least some data that would give us a rough picture of the relative effectiveness of the greylisting and greytrapping versus the content filtering on receipt. The setup is very similar to the one described in the practically-oriented parts of the Effective Spam and Malware Countermeasures - Network Noise Reduction Using Free Tools and is part of a syncronizing multi-domain setup rougly as described in the earlier article In The Name Of Sane Email: Setting Up OpenBSD's spamd(8) With Secondary MXes In Play - A Full Recipe.

Using only tools found in the OpenBSD base system, I went on to collect data.

Whenever spamd(8) closes a connection it logs a message to that effect, so

$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c disconnected

Supplies the total number of connections closed by spamd(8) during November 1st, fetched from the archived log file.

Similarily

$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c BLACK

provides the number of connections during the same 24 hour period initiated by hosts that were already in one of the blocklists used.

The command to get the number of connections that had cleared the first hurdle and entered greylisted status would be

$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c GREY

And the number of hosts that had been well behaved enough to enter the whitelist and be allowed to talk to the real SMTP service comes out of

$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c whitelisting

For hosts that have reached this far and did not fail the content filtering we do during receipt, we get the number with

$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c Completed

It is however worth noting that our MTA exim reports Completed for apparently message deliveries in both directions, so the number of received messages, or messages that did inbox is likely about thirty percent lower.

The number of messages rejected for one reason or the other, by being addressed to an undeliverable address or by failing content filtering we find with

$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c rejected

And finally, a side effect of a frequently run log reading script that adds hosts with certain kinds of characteristics such as not having a correct reverse DNS entry to a blocklist and kills all their connections will at times produce an unexpected disconnection while reading SMTP command message. We find those with

$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c unexpected

Those are hosts that somehow got past spamd(8) by behaving enough like a real SMTP server to clear greylisting. However spamd(8) does not have the ability to check for valid reverse, so that part is left in our case to check for by reading the log files at intervals.

The following table has the data for November 2022 —

Date Incoming SMTP
connections
BLACK
connections
GREY
connections
New whitelist
entries
Deliveries Rejected Unexpected
disconnect
2022-11-01 53303 38951 2580 54 1347 409 384
2022-11-02 55653 40467 2174 121 1297 549 330
2022-11-03 59658 43901 2086 85 1260 865 759
2022-11-04 57462 45674 1683 71 1270 30 0
2022-11-05 44993 43571 2146 105 1182 43 0
2022-11-06 36768 37802 2322 86 1366 184 0
2022-11-07 49464 44213 2398 182 1424 67 0
2022-11-08 52285 45904 2676 113 1513 69 3
2022-11-09 47652 47988 2085 105 1438 154 0
2022-11-10 57850 49875 2614 104 1435 192 2
2022-11-11 60269 56719 2355 99 1420 90 1
2022-11-12 46139 54073 1160 96 1182 29 0
2022-11-13 40497 40221 1777 70 1239 189 0
2022-11-14 59965 59951 2062 63 1382 145 73
2022-11-15 56265 32727 2304 113 1298 351 301
2022-11-16 77252 58029 1925 109 1340 282 33
2022-11-17 43107 30713 786 131 1250 215 17
2022-11-18 49448 48999 1590 96 1327 194 1
2022-11-19 42413 45927 973 92 1182 182 70
2022-11-20 50890 55318 1558 77 1203 358 33
2022-11-21 36601 35070 1707 125 1321 241 146
2022-11-22 37840 35499 2055 99 1359 142 17
2022-11-23 43186 34545 1314 114 1345 103 21
2022-11-24 46802 45765 1856 66 1269 729 52
2022-11-25 70911 52404 1315 89 1326 1488 395
2022-11-26 39780 32226 1500 77 1175 954 379
2022-11-27 67578 41581 1743 85 1231 523 315
2022-11-28 54688 37534 2433 77 1337 321 269
2022-11-29 70893 45917 2502 65 1248 87 39
2022-11-30 50280 35585 2567 67 1324 1293 1113

The table is also available as a comma separated (CSV) file.

As I mentioned earlier, the number of connections to the outer layer spamd(8) is likely higher than what would be expected on sites that are not considered a honeypot and home to in excess of three hundred thousand imaginary friends (see The Things Spammers Believe - A Tale of 300,000 Imaginary Friends or the trackerless version.

That said, I think the data shows that catching the unwanted traffic early, and discarding as much as possible of that traffic before it reaches the resource hungry content filtering is definitely beneficial. 

Even sites that do not actively bait the baddies out there would likely see noticeable energy bill savings by having their mail servers run quiter and cooler, as they definitely will after getting a greylisting, and optionally greytrapping setup in front of them. Those services have a truly low energy consumption profile.

If you found this article interesting, useful or just simply irritating, I would like to hear from you. Please use the comment field, or if you prefer, send email to nix at nxdomain dot no with a subject that at least tries to sound sensible and relevant.

As always, if you are interested in research on items mentioned in this article, I will be able to provide data for study. I will honor reasonable requests.


No comments:

Post a Comment

Note: Comments are moderated. On-topic messages will be liberated from the holding queue at semi-random (hopefully short) intervals.

I invite comment on all aspects of the material I publish and I read all submitted comments. I occasionally respond in comments, but please do not assume that your comment will compel me to produce a public or immediate response.

Please note that comments consisting of only a single word or only a URL with no indication why that link is useful in the context will be immediately recycled so those poor electrons get another shot at a meaningful existence.

If your suggestions are useful enough to make me write on a specific topic, I will do my best to give credit where credit is due.