That grumpy BSD guy: Can Your Spam-eater Manage to Catch Seventy-one Percent Like This Other Service?

Measuring the effect of what you do is important. Equally important is knowing what is the measure of your actions.

A question turned up on IRC that had me thinking.

Do you have a percentage of the spam traffic you catch on your MXes? The reason I ask is I lust learned that fastmail.com claim they catch 71% of all incoming spam. Also a rate of false positives would be nice to have, but that's likely harder to measure.

My first impulse was that I would consider a seventy-one percent hit rate on the low side of what we are seeing here at bsdly.net and associated domains.

But getting actually useful data would require some thinking. That said, comparing a major mail operator that sells deliverability and promises a 71 percent catch rate for incoming spam and bsdly.net would be like comparing apples and oranges at best.

While bsdly.net (which is also known under a few other domain names) is my main mail service for my personal use and for a very select number of other people, to the rest of the world it is primarily a honeypot that generates security relevant data that other sites use, and that contributes to IP reputation rankings.

The site has been in operation in those roles for a little more than 15 years, since shortly before the original announcement in the article Hey, spammer! Here's a list for you!. When we started using the greylisting and greytrapping based setup, we saw a sharp drop in undesirable messages actually reaching inboxes, and I observed a marked decrease in load on the mail servers that did the content filtering.

Not long after I had set up our early greylisting setup, a message turned up on the openbsd-misc mailing list that pretty much matched our experience — a 95% reduction in spam in line to be treated to content filtering — so setting up precise measuring became a thing to do when we could get around to it.

Now enough with the background. It is relatively easy to extract at least some data that would give us a rough picture of the relative effectiveness of the greylisting and greytrapping versus the content filtering on receipt. The setup is very similar to the one described in the practically-oriented parts of the Effective Spam and Malware Countermeasures - Network Noise Reduction Using Free Tools and is part of a syncronizing multi-domain setup rougly as described in the earlier article In The Name Of Sane Email: Setting Up OpenBSD's spamd(8) With Secondary MXes In Play - A Full Recipe.

Using only tools found in the OpenBSD base system, I went on to collect data.

Whenever spamd(8) closes a connection it logs a message to that effect, so

$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c disconnected

Supplies the total number of connections closed by spamd(8) during November 1st, fetched from the archived log file.

Similarily

$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c BLACK

provides the number of connections during the same 24 hour period initiated by hosts that were already in one of the blocklists used.

The command to get the number of connections that had cleared the first hurdle and entered greylisted status would be

$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c GREY

And the number of hosts that had been well behaved enough to enter the whitelist and be allowed to talk to the real SMTP service comes out of

$ zgrep "Nov  1" /var/log/spamd.6.gz | grep -c whitelisting

For hosts that have reached this far and did not fail the content filtering we do during receipt, we get the number with

$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c Completed

It is however worth noting that our MTA exim reports Completed for apparently message deliveries in both directions, so the number of received messages, or messages that did inbox is likely about thirty percent lower.

The number of messages rejected for one reason or the other, by being addressed to an undeliverable address or by failing content filtering we find with

$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c rejected

And finally, a side effect of a frequently run log reading script that adds hosts with certain kinds of characteristics such as not having a correct reverse DNS entry to a blocklist and kills all their connections will at times produce an unexpected disconnection while reading SMTP command message. We find those with

$ doas zgrep 2022-11-02 /var/spool/exim/logs/main.log.6.gz | grep -c unexpected

Those are hosts that somehow got past spamd(8) by behaving enough like a real SMTP server to clear greylisting. However spamd(8) does not have the ability to check for valid reverse, so that part is left in our case to check for by reading the log files at intervals.

The following table has the data for November 2022 —

Date	Incoming SMTP connections	BLACK connections	GREY connections	New whitelist entries	Deliveries	Rejected	Unexpected disconnect
2022-11-01	53303	38951	2580	54	1347	409	384
2022-11-02	55653	40467	2174	121	1297	549	330
2022-11-03	59658	43901	2086	85	1260	865	759
2022-11-04	57462	45674	1683	71	1270	30	0
2022-11-05	44993	43571	2146	105	1182	43	0
2022-11-06	36768	37802	2322	86	1366	184	0
2022-11-07	49464	44213	2398	182	1424	67	0
2022-11-08	52285	45904	2676	113	1513	69	3
2022-11-09	47652	47988	2085	105	1438	154	0
2022-11-10	57850	49875	2614	104	1435	192	2
2022-11-11	60269	56719	2355	99	1420	90	1
2022-11-12	46139	54073	1160	96	1182	29	0
2022-11-13	40497	40221	1777	70	1239	189	0
2022-11-14	59965	59951	2062	63	1382	145	73
2022-11-15	56265	32727	2304	113	1298	351	301
2022-11-16	77252	58029	1925	109	1340	282	33
2022-11-17	43107	30713	786	131	1250	215	17
2022-11-18	49448	48999	1590	96	1327	194	1
2022-11-19	42413	45927	973	92	1182	182	70
2022-11-20	50890	55318	1558	77	1203	358	33
2022-11-21	36601	35070	1707	125	1321	241	146
2022-11-22	37840	35499	2055	99	1359	142	17
2022-11-23	43186	34545	1314	114	1345	103	21
2022-11-24	46802	45765	1856	66	1269	729	52
2022-11-25	70911	52404	1315	89	1326	1488	395
2022-11-26	39780	32226	1500	77	1175	954	379
2022-11-27	67578	41581	1743	85	1231	523	315
2022-11-28	54688	37534	2433	77	1337	321	269
2022-11-29	70893	45917	2502	65	1248	87	39
2022-11-30	50280	35585	2567	67	1324	1293	1113

The table is also available as a comma separated (CSV) file.

As I mentioned earlier, the number of connections to the outer layer spamd(8) is likely higher than what would be expected on sites that are not considered a honeypot and home to in excess of three hundred thousand imaginary friends (see The Things Spammers Believe - A Tale of 300,000 Imaginary Friends or the trackerless version.

That said, I think the data shows that catching the unwanted traffic early, and discarding as much as possible of that traffic before it reaches the resource hungry content filtering is definitely beneficial.

Even sites that do not actively bait the baddies out there would likely see noticeable energy bill savings by having their mail servers run quiter and cooler, as they definitely will after getting a greylisting, and optionally greytrapping setup in front of them. Those services have a truly low energy consumption profile.

If you found this article interesting, useful or just simply irritating, I would like to hear from you. Please use the comment field, or if you prefer, send email to nix at nxdomain dot no with a subject that at least tries to sound sensible and relevant.

As always, if you are interested in research on items mentioned in this article, I will be able to provide data for study. I will honor reasonable requests.

That grumpy BSD guy

Friday, December 23, 2022

Can Your Spam-eater Manage to Catch Seventy-one Percent Like This Other Service?

No comments:

Post a Comment

About Me

Buy The Book of PF

Post, tweet, follow

Upcoming Talks

Blog Archive

Follow me on Twitter

Twitter Updates

Friends

Links to other nice sites

Popular Posts

Total Pageviews

Amazon Deals (#ad)

Amazon Deals (#ad)

That grumpy BSD guy

Friday, December 23, 2022

Can Your Spam-eater Manage to Catch Seventy-one Percent Like This Other Service?

No comments:

Post a Comment

About Me

Buy The Book of PF

Post, tweet, follow

Upcoming Talks

Blog Archive

Subscribe To Posts From Here

Follow me on Twitter

Twitter Updates

Friends

Links to other nice sites

Popular Posts

Total Pageviews

Amazon Deals (#ad)

Amazon Deals (#ad)