Showing posts with label green cybercrime detection. Show all posts

Wednesday, January 1, 2025

A Suitably Bizarre Start of the Year 2025

© 2025 Peter N. M. Hansteen

Already somewhat blasé from life in the honeypots, yours truly registers an even more bizarre level of events after some routine log spelunking

If you're reading this soon after the piece is published, 2025 is a fresh new year, and I would like to wish you all the best for the year ahead.

Then I want to relate what happened here (or rather at the Internet facing network interface of the server in question) during the initial few hours of the new year 2025.

Note: This piece is also available without trackers but classic formatting only here.

If you are a returning reader, you will be familiar with my ongoing experiment and studies of Internet miscreants and how to thwart their efforts as effectively as possible while expending no more than absolutely necessary in terms of time or energy on our end. Central to those efforts are the greytrapping based blocklist and the ever-growing list of spamtraps, which late in 2024 passed the half a million mark, right now numbering 568212 entries of known bad, not deliverable email addresses in our domains (almost certain to have increased by the time you read this).

I have written about the daily maintenance tasks for the lists, such as they are, in previous entries such as the list homepage pointed to in the previous paragraph and the traplist ethics page as well as the blog post Goodness, Enumerated by Robots. Or, Handling Those Who Do Not Play Well With Greylisting (November 2018, also here) or for that matter the piece I wrote about the arrival of the three hundred thousandth spamtrap, The Things Spammers Believe - A Tale of 300,000 Imaginary Friends (also here).

All of those pieces show that the original emphasis was to keep the working environment sane for the local users, and the fact that I could generate resources that I could make available for others to use was really just a byproduct of that work, while of course a welcome one for its users.

After some years, and certainly around the time the list of spamtraps had reached the hundreds of thousands, the "salt the mine and poison the well" part (the fourth principle listed on the ethics page) had subtly slid more into central focus, and I was adding incrementally to my arsenal of scripts and one-liners to expand the list of "imaginary friends" as I came to think of new angles.

Most of these would involve fishing out potential local parts (the parts before the '@') from the din of spamd log entries. Some of these are hinted at in Harvesting the Noise While it's Fresh, Revisited (also here).
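To make the idea of fishing local parts out of the log noise concrete, here is a minimal Python sketch. It is not the one-liner actually used here (those are grep and awk based and not reproduced in this piece). The log line shape, the sample lines, the host name "mx" and the example.net domain are all assumptions made for illustration, and lowercasing the local part is a simplification.

```python
import re

# Assumed line shape (an assumption, check your own logs):
#   ... spamd[4242]: (GREY) 192.0.2.10: <sender> -> <recipient>
DELIVERY = re.compile(r"\((?:GREY|BLACK)\) \S+: <([^<>]*)> -> <([^<>]*)>")


def recipient_local_parts(lines, our_domains):
    """Collect local parts of recipient addresses that fall in our domains."""
    found = set()
    for line in lines:
        match = DELIVERY.search(line)
        if not match:
            continue  # connection notices and other noise
        local, sep, domain = match.group(2).rpartition("@")
        if sep and local and domain.lower() in our_domains:
            # Lowercasing is a simplification chosen for this sketch
            found.add(local.lower())
    return found


# Made-up sample lines in the assumed format
sample = [
    "Jan  1 03:12:44 mx spamd[4242]: (GREY) 192.0.2.10: <a@example.org> -> <AXBPvDt@example.net>",
    "Jan  1 03:12:45 mx spamd[4242]: 192.0.2.10: disconnected after 3 seconds.",
    "Jan  1 03:13:02 mx spamd[4242]: (BLACK) 198.51.100.7: <b@example.org> -> <someone@gmail.com>",
]

print(sorted(recipient_local_parts(sample, {"example.net"})))
```

Swapping `match.group(2)` for `match.group(1)` would harvest from the purported sender addresses instead.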

The pace of growth for the spamtraps list did pick up as a consequence, and as I reported in a fediverse post, the total passed the half million mark at some point in December of 2024.

Part of the updating procedure is to search logs for addresses not already in the spamtraps list. One of the things I tend to do after extracting the list of addresses somebot tried to deliver to, and that have not already been included in the spamtraps, is to extract the log entries involving those supposedly new addresses for further processing. The output from that grep-centered one-liner from the overnight run, taken during the late morning of January 1st, 2025, can be found here.
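The second step, pulling out the log entries that involve a given set of supposedly new addresses, can be sketched in Python as a fixed-string search, roughly what grep does with -F and a pattern file. This is an illustration only, not the actual one-liner; the function name and the sample data are hypothetical.

```python
def lines_mentioning(log_lines, addresses):
    """Yield log lines containing any of the given addresses.

    Plain substring matching, case-insensitive. Note that a short address
    can also match inside a longer one (bob@... inside xbob@...), so the
    output is meant for further processing, not as a final answer.
    """
    wanted = [addr.lower() for addr in addresses if addr]
    for line in log_lines:
        lowered = line.lower()
        if any(addr in lowered for addr in wanted):
            yield line


# Made-up sample data for illustration
log = [
    "(GREY) 192.0.2.10: <a@example.org> -> <newfriend@example.net>",
    "(GREY) 192.0.2.11: <b@example.org> -> <oldfriend@example.net>",
    "192.0.2.10: disconnected after 3 seconds.",
]
new_addresses = ["NewFriend@example.net"]

for hit in lines_mentioning(log, new_addresses):
    print(hit)
```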

Take a few moments to look at that one if you want.

You will be looking at the output of a series of grep searches for destination addresses.

The bulk of the data shows that hosts not in our local networks tried to deliver largish numbers of messages to third party domains such as qq.com and gmail.com, using our spamtrap addresses as the purported sender addresses, only of course to be added to the set of greytrapped addresses.

Making up addresses in other people's domains to use as From or Reply-to addresses on your spam messages is not a new thing, of course, as long as you do not care to get any feedback on what actually happened with those attempted deliveries.

What baffled me more than a little was that the addresses were apparently used in the exact sequence they would have been found at this site after a fairly recent update run.

Apart from the sheer number of addresses and their freshness, the only item of interest was that behind each of the IP addresses involved there appears to be a number of hosts -- likely virtual machines -- with distinct identifiers in their HELO/EHLO sequence, likely generated strings of a handful of characters such as AXBPvDt.

These quasi-random, generated IDs were of course soon made into local parts for new spamtraps, as were, at times, some other items that can be extracted from logs with common unix commands.
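Turning such a HELO/EHLO identifier into a spamtrap address is mostly a matter of checking that the string can serve as a local part at all. The Python sketch below is one possible way to do that, under assumptions of my choosing: the function name is hypothetical, the character set is a deliberately conservative subset of what mail standards permit, and the 64 character cap follows the SMTP limit on local part length.

```python
import re

# Conservative subset of characters for an unquoted local part,
# capped at 64 characters in total (an assumption for this sketch)
SAFE_LOCAL = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def helo_to_trap(helo, domain):
    """Return a spamtrap address built from a HELO string, or None if unusable."""
    candidate = helo.strip()
    if not SAFE_LOCAL.fullmatch(candidate):
        return None  # empty, too long, or contains characters we do not want
    if ".." in candidate or candidate.endswith("."):
        return None  # consecutive or trailing dots are not valid in a local part
    return f"{candidate}@{domain}"


print(helo_to_trap("AXBPvDt", "bsdly.net"))
print(helo_to_trap("[192.0.2.1]", "bsdly.net"))
```

Anything that comes back as an address still needs to be checked against the valid addresses in the domain before it joins the list.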

So as a start to the new year, this was surprisingly fitting. The general insanity we have seen in this particular field continues, but appears to have reached a new level at the tail end of the tumultuous year just past, possibly heading for new levels still.

Good night and good luck.

Addendum 2025-01-13

For those so inclined, it is perhaps worth noting that after a bit of pondering some time after writing this other piece (also here), I started looking at extracting other items from the spamd log entries.

I ended up extracting the local parts for new spamtraps from the purported sender addresses of entries for trapped delivery attempts some time mid-2024. This made for a significant increase in the number of new imaginary friends, and by the final months of that year I had also started extracting similarly from the string offered by the spam senders as their host name in the EHLO/HELO exchange, which of course swelled the population further.

The effect is clearly to be seen in the file that records the number of spamtraps added per year, updated via trivial scriptery roughly daily.

I hope this article and its addenda help inspire others in our efforts of green cybercrime prevention by giving the actually intelligent detection methods less work to do.

Addendum 2025-03-20

Only a couple of weeks after the previous addendum was written, it was outdated. Some trivial resource constraints led to a slightly different organization of the log data, now as per-year files up to and including 2024, and per-month files from 2025 onwards, in this directory, while the main traplist page still has the list of spamtraps itself in one piece.


Upcoming Events to watch for:
BSDCan 2025 June 11 through June 14th 2025, in Ottawa, Canada. The Call for papers is active, with February 12 2025 as the deadline for submissions.

EuroBSDCon 2025 September 25-28, 2025 in Zagreb, Croatia.

Friday, December 9, 2022

Harvesting the Noise While it's Fresh, Revisited

A year's worth of logs yields entertaining but unsurprising findings about spammer behavior.
Spam mail, masked but detected, from the archive

Returning readers will be almost painfully aware that here at nxdomain.no (also known as bsdly.net) we host and maintain a blocklist, which in turn is the product of traffic that hits our mail system with attempts at delivery to one or more of the now more than three hundred thousand known bad addresses, also featured at the blocklist home page.

Note: This piece is also available without trackers but only basic formatting here

When I first set up the greytrapping back in 2007, the initial spamtraps were non-deliverable addresses in our domains that I had extracted from mail server logs. I won't bore you with the details (which are anyway documented at length in earlier articles), but it was clear from those logs that the domains we hosted back then were more or less continuously subject to Joe jobs, as in somebody sending messages with a forged From: field with a made up address in our domains.

After a while I started extracting the potential new spamtraps from the greylist — actually dumping data from there once per hour as part of the script that also generated the exported blocklist. The basic process is described in the July 25 2007 article Harvesting the noise while it's still fresh; SPF found potentially useful (also available trackerless but with links to tracked articles).

Then today it struck me that while that method is useful, by extracting only from the greylist we will only ever collect the address from the initial connections. Any addresses attempted after the miscreants enter the blocklist will simply not be recorded there.

This of course led to the question: What did we miss?

Fortunately I keep my logs around for a while, and the most easily accessible log archive for my main spamd spans a little over a year. So I set about with some very basic grep and awk, which netted me this raw list of targeted addresses from the spamd logs.

The list weighs in at a total of 269903 entries, as counted by wc -l.

Some of those addresses are valid, and a small, but actually significant, number are in domains we do not actually serve here, and some entries do not look like mail addresses at all. The stranger ones could be strings encoded in a character set that spamd is not equipped to handle, or could be other binary data that might have been intended to trigger bugs in some of the variants of fully equipped SMTP servers that are out there. Or simply noise of any other kind, including a byproduct of the not very intelligent extraction one-liner I used.
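The three rough categories described above (addresses in our own domains, addresses in foreign domains, and entries that do not look like addresses at all) can be told apart with a very simple first-pass check. The Python sketch below is an illustration under my own assumptions, not the extraction actually used: the function name and the rules for what counts as noise are hypothetical, and the sample entries are made up.

```python
from collections import Counter


def classify(entry, our_domains):
    """Return 'local', 'foreign' or 'noise' for one raw extracted entry."""
    text = entry.strip().strip("<>")
    local, sep, domain = text.rpartition("@")
    # Binary junk and oddly encoded strings fail the printable ASCII test
    clean = text.isascii() and text.isprintable() and " " not in text
    if not (sep and local and domain and clean and "@" not in local):
        return "noise"
    return "local" if domain.lower() in our_domains else "foreign"


# Made-up entries for illustration
raw = [
    "<info@bsdly.net>",
    "someone@qq.com",
    "\x16\x03\x01",
    "no-at-sign",
    "a@@bsdly.net",
]

print(Counter(classify(entry, {"bsdly.net", "nxdomain.no"}) for entry in raw))
```

Entries classified as local are only candidates at this point; deciding whether they are deliverable or already known spamtraps is a separate step.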

The target addresses in foreign domains I take as a sign that at least some spamming operators mistake a reasonably configured spamd for an open relay, just like they did all those years ago when I started running the greytrapping.

Some things apparently stay the same no matter how the rest of the world has found a way to move forward.

While I did a few other tasks and finally started writing this article, the bulk of the processes that would answer the question posed earlier (What did we miss?) could fortunately run unattended in the background, and after some manual massaging we are left with a results file, with 1530 entries that were none of

  • actually useful deliverable addresses in our domains
  • existing spamtraps

This means of course that the collection of imaginary friends expanded by the same number, and now stands at 304154 entries.
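The selection criteria in the list above boil down to excluding two sets from the candidates. A minimal Python sketch of that step follows, assuming all three inputs are plain lists of addresses and that comparing in lower case is acceptable; the function name and the sample addresses are hypothetical, and the real processing here involved manual massaging that no such sketch captures.

```python
def new_spamtraps(candidates, deliverable, existing_traps):
    """Return candidates that are neither deliverable nor already spamtraps.

    Addresses are compared in lower case with surrounding whitespace removed;
    the result keeps first-seen order and contains no duplicates.
    """
    def norm(addr):
        return addr.strip().lower()

    excluded = {norm(a) for a in deliverable} | {norm(a) for a in existing_traps}
    seen = set()
    result = []
    for candidate in candidates:
        addr = norm(candidate)
        if addr and addr not in excluded and addr not in seen:
            seen.add(addr)
            result.append(addr)
    return result


# Made-up sample data for illustration
candidates = ["Peter@example.net", "ghost1@example.net", "ghost2@example.net", "GHOST2@example.net"]
deliverable = ["peter@example.net"]
existing = ["ghost1@example.net"]

print(new_spamtraps(candidates, deliverable, existing))
```

Keeping the deliverable addresses in the exclusion set is the part that must never be skipped.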

Which I suppose means that harvesting the noise even after a period of aging for refinement can be a good thing.

The entries added represent a wide variety of phenomena. Quite a few seem to be truncated versions of earlier spamtrap entries, and a fair number of the new entries look like they may have descended from artifacts of stupidity such as products of SMTP callbacks. Proving mainly that in mail and spam handling, there appears to be a space still for the less intellectually astute.

With all of this said, the natural followup question is, given the modest net result, was this worth the effort?

Well, the raw output that yielded 269903 entries needed some manual operations in order to weed out the obvious noise (exact time used not recorded), followed by another background task that took, according to time(1)

    real    105m24.220s
    user    73m3.280s
    sys     29m14.930s

which yielded 1577 entries that were pared down to 1530 entries that met the criteria for inclusion in the circle of imaginary friends (also known as spamtraps).

Before this experiment, the spamtraps list numbered 302625. After including the result here, the count stands at 304154, for a gain of less than one percent of the previous total. Again, if you check back at the traplist home page now, the total number is likely to have increased again.

So was it worth the effort? I feel that as an experiment, it was worth doing.

Whether or not it is an experiment that is worth repeating is a question for another day.

If you have opinions on this, I would love to hear from you, in comments, via email or messages on whichever social media brought you the link to this article.

As always, parties interested in studying the data referenced in this article and other pieces I have written are welcome to contact me for arrangements. I can easily dig out more and rawer data than directly referenced here on request.

Stay safe out there.


As a side note, a slightly improved way of extracting useful data about other domains' mail service via SPF records can be found in the November 2018 article Goodness, Enumerated by Robots. Or, Handling Those Who Do Not Play Well With Greylisting.

That article (naturally) works from the premise that you are running a recent OpenBSD system.


Addendum 2025-01-12

For those so inclined, it is perhaps worth noting that after a bit of pondering some time after writing this piece, I started looking at extracting other items from the spamd log entries.

I ended up extracting the local parts for new spamtraps from the purported sender addresses of entries for trapped delivery attempts some time mid-2024. This made for a significant increase in the number of new imaginary friends, and by the final months of that year I had also started extracting similarly from the string offered by the spam senders as their host name in the EHLO/HELO exchange, which of course swelled the population further.

The effect is clearly to be seen in the file that records the number of spamtraps added per year, updated via trivial scriptery roughly daily.

I hope this article and its addendum help inspire others in our efforts of green cybercrime prevention by giving the actually intelligent detection methods less work to do.


Addendum some more 2025-01-18

I suppose it had to happen sooner or later, and it was commemorated in this toot, which said

Likely not blogworthy in itself, but #openbsd #spamd aficionados will get a light chuckle from hearing that some scraping and massaging relevant logs had the number of imaginary friends at https://nxdomain.no/~peter/traplist.shtml for our not-friends to play with roll past the one million mark in the early hours of today CET.

The recent update of https://nxdomain.no/~peter/harvesting_the_noise_revisited.html has links to more info. #spam #antispam #greytrapping #blocklists #cybercrime

Yes, that's right, after I turned to extracting vaguely relevant data from logs in order to salt the mine and poison the well further, the number of imaginary friends quickly grew past the one million mark.

And as if this particular Saturday morning was not already quite weird enough for most tastes, somebot produced another remarkable item that I just could not resist tooting about,

And ref previous toot, the 1006089th imaginary friend to join the collection at https://nxdomain.no/~peter/traplist.shtml is, mail.protection.outlook.com@bsdly.net following this sequence: https://nxdomain.no/~peter/blogpix/2025-01_18_johnson@vicglobalintelligence.com_to_mail.protection.outlook.com@bsdly.net.txt

The bots never cease to amaze #openbsd #spamd #greytrapping #antispam #cybercrime

And the two episodes combined proved addendum-worty, at least, see https://nxdomain.no/~peter/harvesting_the_noise_revisited.html

Yes, you read that right: For reasons known only to the bots' herders (if that), the subdomain that houses mail services for a large number of Microsoft customers entered the lexicon of spammers' spam To: addresses. Only to be included at first sight in the herd of imaginary friends I hope will help poison the spammers' data further.

The activity here did of course not stop the bots from keeping on trying. A few minutes after the second addendum here was added and tooted out, my logs showed the following activity from the hosts involved in trying to spam mail.protection.outlook.com@bsdly.net: https://nxdomain.no/~peter/blogpix/2025-01-18_host_targeting_mail.protection.outlook.com@bsdly.net_all_spamd_log_entries.txt. And more likely than not, they will keep trying.

How was the start of your weekend?

Also worth noting is that if you do try to do this at home, please keep in mind that you will need to implement a scheme that keeps actually valid addresses in your domains out of the spamtrap pool. Otherwise regrettable episodes may arise.