Measuring the effect of what you do is important. Equally important is knowing what the measure of your actions actually is.
A question turned up on IRC that had me thinking.
Do you have a percentage of the spam traffic you catch on your MXes? The reason I ask is I just learned that fastmail.com claim they catch 71% of all incoming spam. A rate of false positives would also be nice to have, but that's likely harder to measure.
My first impulse was that a seventy-one percent hit rate would be on the low side of what we are seeing here at bsdly.net and associated domains.
But getting actually useful data would require some thinking. That said, comparing a major mail operator that sells deliverability and promises a 71 percent catch rate for incoming spam with bsdly.net would at best be comparing apples and oranges.
While bsdly.net (which is also known under a few other domain names) is my main mail service for my personal use and for a very select number of other people, to the rest of the world it is primarily a honeypot that generates security-relevant data that other sites use, and that contributes to IP reputation rankings.
The site has been in operation in those roles for a little more than 15 years, since shortly before the original announcement in the article Hey, spammer! Here's a list for you!. When we started using the greylisting and greytrapping based setup, we saw a sharp drop in undesirable messages actually reaching inboxes, and I observed a marked decrease in load on the mail servers that did the content filtering.
Not long after I had set up our early greylisting configuration, a message turned up on the openbsd-misc mailing list that pretty much matched our experience: a 95% reduction in the spam lined up for content filtering. Setting up precise measurements therefore became something to do when we could get around to it.
Now, enough with the background. It is relatively easy to extract at least some data that gives a rough picture of the relative effectiveness of greylisting and greytrapping versus content filtering on receipt. The setup is very similar to the one described in the practically oriented parts of Effective Spam and Malware Countermeasures - Network Noise Reduction Using Free Tools, and is part of a synchronizing multi-domain setup roughly as described in the earlier article In The Name Of Sane Email: Setting Up OpenBSD's spamd(8) With Secondary MXes In Play - A Full Recipe.
Using only tools found in the OpenBSD base system, I went on to collect data.
Whenever spamd(8) closes a connection it logs a message to that effect, so
$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c disconnected
supplies the total number of connections closed by spamd(8) during November 1st, fetched from the archived log file. Note the two spaces in "Nov  1": syslog pads single-digit days of the month with a space, and a pattern with only one space would match November 10 through 19 instead.
Similarly,
$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c BLACK
provides the number of connections during the same 24 hour period initiated by hosts that were already in one of the blocklists used.
The command to get the number of connections that had cleared the first hurdle and entered greylisted status would be
$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c GREY
And the number of hosts that had been well behaved enough to enter the whitelist and be allowed to talk to the real SMTP service comes out of
$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c whitelisting
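Each of the commands above decompresses and scans the whole archived log. A minimal sketch, assuming a POSIX sh with gzip(1) and mktemp(1) (both in the OpenBSD base system), filters out the day once and then counts the same four patterns in the result. The function name spamd_day_counts is made up for this example, and anchoring the day at the start of the line assumes the standard syslog timestamp prefix.

```sh
# Hypothetical helper: the four spamd(8) counts for one day, decompressing
# the archived log only once instead of once per pattern. Prints the counts
# comma separated, in table column order:
# connections (disconnected), BLACK, GREY, whitelisting.
spamd_day_counts() {
	log=$1	# archived log, e.g. /var/log/spamd.6.gz
	day=$2	# syslog style day, e.g. "Nov  1" (day padded with a space)
	tmp=$(mktemp) || return 1
	# Keep only lines whose timestamp starts with the requested day
	gzip -dc "$log" | grep "^$day " > "$tmp"
	printf '%s,%s,%s,%s\n' \
	    "$(grep -c disconnected "$tmp")" \
	    "$(grep -c BLACK "$tmp")" \
	    "$(grep -c GREY "$tmp")" \
	    "$(grep -c whitelisting "$tmp")"
	rm -f "$tmp"
}
```

Running spamd_day_counts /var/log/spamd.6.gz "Nov  1" prints one comma separated fragment per day, which makes it easy to collect a month's worth of rows. Like the original commands, grep -c counts matching lines, not distinct connections.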
For hosts that have made it this far and whose messages did not fail the content filtering we do on receipt, we get the number of completed deliveries with
$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c Completed
It is worth noting, however, that our MTA, exim, apparently reports Completed for message deliveries in both directions, so the number of received messages, that is, messages that actually reached an inbox, is likely about thirty percent lower.
The number of messages rejected for one reason or another, either by being addressed to an undeliverable address or by failing content filtering, we find with
$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c rejected
And finally, we frequently run a log reading script that adds hosts with certain characteristics, such as not having a correct reverse DNS entry, to a blocklist and kills all their connections. As a side effect, exim will at times log an unexpected disconnection while reading SMTP command. We find those with
$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c unexpected
Those are hosts that somehow got past spamd(8) by behaving enough like a real SMTP server to clear greylisting. However, spamd(8) has no way to check for a valid reverse DNS entry, so in our case that check is left to reading the log files at intervals.
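The three exim counts can be gathered the same way. A minimal sketch, assuming exim main log lines begin with the date in YYYY-MM-DD form (which is what the 2022-11-02 pattern above relies on) and that the caller can read the logs, for example by running it via doas(1). The function name exim_day_counts is hypothetical; the patterns are the ones from the commands above.

```sh
# Hypothetical helper: count exim main log lines for one day using the
# same patterns as the commands above. Needs read access to the exim logs.
# Prints deliveries,rejected,unexpected in table column order.
exim_day_counts() {
	log=$1	# archived log, e.g. /var/spool/exim/logs/main.log.6.gz
	day=$2	# ISO date, e.g. 2022-11-02
	tmp=$(mktemp) || return 1
	# Keep only lines that start with the requested date
	gzip -dc "$log" | grep "^$day " > "$tmp"
	printf '%s,%s,%s\n' \
	    "$(grep -c Completed "$tmp")" \
	    "$(grep -c rejected "$tmp")" \
	    "$(grep -c unexpected "$tmp")"
	rm -f "$tmp"
}
```

Anchoring the date at the start of the line keeps a date string that happens to appear inside a logged message from being counted. Keep in mind that the Completed count still includes outgoing mail.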
The following table has the data for November 2022:
Date | Incoming SMTP connections | BLACK connections | GREY connections | New whitelist entries | Deliveries | Rejected | Unexpected disconnect |
2022-11-01 | 53303 | 38951 | 2580 | 54 | 1347 | 409 | 384 |
2022-11-02 | 55653 | 40467 | 2174 | 121 | 1297 | 549 | 330 |
2022-11-03 | 59658 | 43901 | 2086 | 85 | 1260 | 865 | 759 |
2022-11-04 | 57462 | 45674 | 1683 | 71 | 1270 | 30 | 0 |
2022-11-05 | 44993 | 43571 | 2146 | 105 | 1182 | 43 | 0 |
2022-11-06 | 36768 | 37802 | 2322 | 86 | 1366 | 184 | 0 |
2022-11-07 | 49464 | 44213 | 2398 | 182 | 1424 | 67 | 0 |
2022-11-08 | 52285 | 45904 | 2676 | 113 | 1513 | 69 | 3 |
2022-11-09 | 47652 | 47988 | 2085 | 105 | 1438 | 154 | 0 |
2022-11-10 | 57850 | 49875 | 2614 | 104 | 1435 | 192 | 2 |
2022-11-11 | 60269 | 56719 | 2355 | 99 | 1420 | 90 | 1 |
2022-11-12 | 46139 | 54073 | 1160 | 96 | 1182 | 29 | 0 |
2022-11-13 | 40497 | 40221 | 1777 | 70 | 1239 | 189 | 0 |
2022-11-14 | 59965 | 59951 | 2062 | 63 | 1382 | 145 | 73 |
2022-11-15 | 56265 | 32727 | 2304 | 113 | 1298 | 351 | 301 |
2022-11-16 | 77252 | 58029 | 1925 | 109 | 1340 | 282 | 33 |
2022-11-17 | 43107 | 30713 | 786 | 131 | 1250 | 215 | 17 |
2022-11-18 | 49448 | 48999 | 1590 | 96 | 1327 | 194 | 1 |
2022-11-19 | 42413 | 45927 | 973 | 92 | 1182 | 182 | 70 |
2022-11-20 | 50890 | 55318 | 1558 | 77 | 1203 | 358 | 33 |
2022-11-21 | 36601 | 35070 | 1707 | 125 | 1321 | 241 | 146 |
2022-11-22 | 37840 | 35499 | 2055 | 99 | 1359 | 142 | 17 |
2022-11-23 | 43186 | 34545 | 1314 | 114 | 1345 | 103 | 21 |
2022-11-24 | 46802 | 45765 | 1856 | 66 | 1269 | 729 | 52 |
2022-11-25 | 70911 | 52404 | 1315 | 89 | 1326 | 1488 | 395 |
2022-11-26 | 39780 | 32226 | 1500 | 77 | 1175 | 954 | 379 |
2022-11-27 | 67578 | 41581 | 1743 | 85 | 1231 | 523 | 315 |
2022-11-28 | 54688 | 37534 | 2433 | 77 | 1337 | 321 | 269 |
2022-11-29 | 70893 | 45917 | 2502 | 65 | 1248 | 87 | 39 |
2022-11-30 | 50280 | 35585 | 2567 | 67 | 1324 | 1293 | 1113 |
The table is also available as a comma separated (CSV) file.
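To put numbers on how much traffic is stopped before the content filtering, the monthly totals can be computed from the CSV file. A minimal sketch with awk(1), assuming the CSV columns are in the same order as the table and are plain unquoted numbers; the function name summarize is made up for this example. Since grep -c counts lines, the BLACK column is a line count, and on a few days (2022-11-06, 2022-11-09 and 2022-11-12) it exceeds the connection count, so that ratio is only a rough indicator.

```sh
# Hypothetical helper: month totals from the CSV version of the table.
# Lines whose second field is not a plain number (such as a header) are skipped.
summarize() {
	awk -F, '$2 ~ /^[0-9]+$/ {
		conn += $2	# Incoming SMTP connections ("disconnected" lines)
		black += $3	# BLACK lines
		deliv += $6	# exim "Completed" lines, both directions
		rej += $7	# exim "rejected" lines
	}
	END {
		if (conn == 0) { print "no data rows found"; exit 1 }
		# Everything exim logged as Completed or rejected: an upper
		# bound on what got past spamd(8), since Completed also
		# counts outgoing mail.
		handled = deliv + rej
		printf "connections: %d\n", conn
		printf "BLACK lines per connection: %.1f%%\n", 100 * black / conn
		printf "handled by exim: %d (%.1f%% of connections)\n", handled, 100 * handled / conn
		# Assumes roughly 30% of Completed lines are outgoing,
		# as estimated in the text.
		printf "estimated inbound deliveries: %d\n", int(0.7 * deliv + 0.5)
	}' "$1"
}
```

Usage would be along the lines of summarize november-2022.csv (the file name is only an example). For just the first two days in the table, exim handled 3602 messages out of 108956 connections, about 3.3 percent, which is in line with the 95% reduction mentioned earlier.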
As I mentioned earlier, the number of connections to the outer spamd(8) layer is likely higher than what would be expected on sites that are not considered a honeypot and home to in excess of three hundred thousand imaginary friends (see The Things Spammers Believe - A Tale of 300,000 Imaginary Friends or the trackerless version).
That said, I think the data shows that catching the unwanted traffic early, and discarding as much as possible of that traffic before it reaches the resource hungry content filtering is definitely beneficial.
Even sites that do not actively bait the baddies out there would likely see noticeable energy bill savings by having their mail servers run quieter and cooler, as they definitely will once a greylisting, and optionally greytrapping, setup is in front of them. Those services have a truly low energy consumption profile.
If you found this article interesting, useful or just simply irritating, I would like to hear from you. Please use the comment field, or if you prefer, send email to nix at nxdomain dot no with a subject that at least tries to sound sensible and relevant.
As always, if you are interested in research on items mentioned in this article, I will be able to provide data for study. I will honor reasonable requests.