Showing posts with label traplist. Show all posts

Wednesday, May 21, 2008

Fake Address Round Trip Time: 13 days

The results are in. Our adversaries really are mindless automata.

Regular readers will have noticed that I've been running a small scale experiment over the last few months, feeding one spammer byproduct back to them via a reasonably accessible web page. The hope was that I would learn a few things about spammer behavior in the process.

After collecting fake addresses in my domains (generated elsewhere) for a while, I started noticing that a short time after I'd put addresses on the traplist page, they would start appearing as To: addresses on what appeared to be spam message entries in my logs. After a while longer, I was certain that the round trip time was down to a few days, but my notes did not include exact dates for when each individual harvested address made it into my spamtrap list. Meaning, of course, I had no way of telling just how fast the process was.

Time to cheat slightly, or, as they say in the trade, perform a controlled experiment. Instead of sticking to the original plan of strictly collecting addresses generated elsewhere and feeding them back to the harvesters via the web page, I decided to deviate slightly and plant an address at a specified date, and maybe add another few for data points later. I did the obvious thing and on October 11th, 2007, I slipped put-here-2007-10-11@datadok.no into one of that day's batches of collected addresses.

Needless to say, my attention wandered from the project in the meantime. After all, in October there was still that book to finish, and after the book was done, other developments had my attention or at least drained my energy for quite a while. So it was only last Sunday it struck me that by now I should have at least some data on what actually happened. One other reason it was suddenly quite appropriate to sum up the data was that the datadok.no domain had been turned over to new owners, and it will likely move out of my sphere of responsibility in the near future.

So here's the result, the fake address round trip time:

$ grep put-here-2007-10-11@datadok.no /var/log/spamd
Oct 24 03:40:40 skapet spamd[20795]: (BLACK) 60.50.174.129: <pepgyoygq@boisdelan.com> -> <put-here-2007-10-11@datadok.no>


That is, the first time my artificially inserted address was used as a spam target was thirteen days after I put it on the traplist page. Since then, something, somewhere, has tried

$ grep -c put-here-2007-10-11@datadok.no /var/log/spamd
300


to deliver email to our imaginary friend a total of 300 times. Data taken from the spamd in front of that domain's secondary mail exchanger, of course. As always, I would love to hear from you about any related experiences.
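For readers who want to repeat the measurement on their own logs, here is a minimal sketch of the arithmetic. It assumes each spamd log entry sits on a single line with the usual syslog timestamp, and since syslog timestamps carry no year, the caller has to supply one. The function name round_trip_days and its arguments are made up for this illustration and are not part of spamd.

```sh
#!/bin/sh
# round_trip_days ADDRESS PLANTED_DATE LOG_YEAR < logfile
# Prints the number of days between PLANTED_DATE (YYYY-MM-DD) and the first
# log line that mentions ADDRESS. LOG_YEAR is an assumption supplied by the
# caller, because syslog timestamps do not include the year.
round_trip_days() {
    awk -v addr="$1" -v planted="$2" -v year="$3" '
    # Days since 1970-01-01 for a Gregorian calendar date.
    function days(y, m, d,    era, yoe, doy) {
        if (m <= 2) y--
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
    }
    BEGIN {
        # Map syslog month abbreviations to month numbers.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= 12; i++) month[names[i]] = i
        split(planted, p, "-")
    }
    # First line mentioning the address: print the difference and stop.
    index($0, addr) && ($1 in month) {
        print days(year, month[$1], $2) - days(p[1], p[2], p[3])
        exit
    }'
}
```

Called as round_trip_days put-here-2007-10-11@datadok.no 2007-10-11 2007 < /var/log/spamd, it prints the whole-day difference between the planting date and the first hit. It ignores the time of day, which is good enough when the planting time was never written down.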

In upcoming columns we will see, er, actually I find myself with such a selection of tempting topics to choose from, it is really hard to decide what to cover next. But the next one will appear here shortly.


Update 2015-08-01: In a totally unrelated article, posted on the afternoon of July 24th, 2015, the string razz@skapet.bsdly.net appeared. It took only three days, two hours and forty-eight minutes (approximately) before my spamd(8) logged this attempt at delivering mail to that address:

Jul 27 20:16:01 skapet spamd[1520]: (GREY) 183.79.28.71: <esther1jomkoma1ej@yahoo.co.jp> -> <razz@skapet.bsdly.net>

Thanks to the script that also prepares the downloadable list of trapped IP addresses, I was alerted to this happening, and the address was duly added to the spamtraps list and the accompanying web page as part of the batch that took the list to its current count of 29135 entries.

From this accidental anecdotal evidence, we can conclude that the time from when a random string containing an at sign appears on a web site to the time it's used as a spam target has now shrunk to about three days.

The attempts seem to be a little less energetic this time though: grepping the relevant logs turns up only 27 attempts at delivery. It's possible this is down to the fact that there are now so many more imaginary friends to choose from in that long list.



Note: The @datadok.no addresses were removed from the traplist in late 2008 when that domain was handed over to the care of a different organization.

Saturday, August 4, 2007

We see your every move, spammer

My logs tell me that the spamtrap topic is a favorite, and more likely than not somebody who read the announcement will also take a peek at the traplist itself. So while I'm slowly preparing a post about something else entirely (which I feel is actually a lot more interesting), it can't hurt to fill you in on what I've been doing to keep track of spammer behavior.

It's a quiet life, at least by surface appearances. In between the steady stream of mainly confidential tasks handled at Datadok and the odd request to bsdly.net for services of one kind or the other, I focus on getting the book done, chapter by chapter.

The traplist is slowly expanding. The collection process itself is automated for all the tedious tasks. The "Unknown user" entries from my mail server logs as a source of traplist material almost dried up, so I started looking at the greylists directly.

After sampling my greylists at random intervals for a while, a short shell script now dumps the data to somewhere safe ten past every full hour, notes the number of grey entries and TRAPPED entries, and dumps the TRAPPED IP addresses to a file which is available to the world from the traplist page. The list is comfortably short at most times. I imagine somebody with beefier bandwidth or a more widely known domain would have more hosts trapped at any time.

The file with currently trapped hosts gets overwritten each time the script runs. There is an outside chance that the other generated data might be useful in future research, and storage is cheap these days, so I keep the data around.
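The hourly bookkeeping described above can be sketched roughly as follows. This is not the actual script, only an illustration under the assumption that the input is spamdb(8) output, that is, pipe-separated records whose first field is the entry type (GREY, WHITE, TRAPPED or SPAMTRAP) and whose second field, for GREY and TRAPPED records, is the IP address. The function name summarize_greylist and the output file argument are hypothetical.

```sh
#!/bin/sh
# summarize_greylist TRAPPED_FILE < spamdb-output
# Counts GREY and TRAPPED records and writes the TRAPPED IP addresses,
# one per line, to TRAPPED_FILE, overwriting whatever was there before.
summarize_greylist() {
    awk -F'|' -v out="$1" '
    BEGIN {
        grey = 0; trapped = 0
        printf "" > out            # create or truncate the output file
    }
    $1 == "GREY"    { grey++ }
    $1 == "TRAPPED" { trapped++; print $2 > out }
    END {
        close(out)
        printf "grey=%d trapped=%d\n", grey, trapped
    }'
}
```

Wrapped in a small script, something like spamdb | summarize_greylist /path/to/trapped-file could be run from cron at ten past the hour (the path is a placeholder), with the printed counts appended to a log for later study.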

Observing the greylists reveals some odd things, like a certain Taiwanese host which tried, on August 1st, 2007, to send roughly a thousand messages to one address in a domain elsewhere, using generated From: addresses at every host name and IP address in our local network. They probably thought they'd found an open relay. Spamd's "250 This is hurting you more than it is hurting me." probably did not register with them as an outright rejection, much like it fools a number of web available open relay detectors.

The conclusions still stand, though. They echo the conclusions from the malware paper (*): the spammers are working harder at sending their trash mainly because we are as close as does not matter to always correctly detecting and dealing with their junk traffic.

I keep wondering if even the few minutes' worth of work a day updating the traplist is worth it, since we are catching essentially all spam anyway. Then at intervals, one or more of the generated, made up addresses from the list actually turns up in my greylist dumps.

(*) Whenever the "The silent network" paper comes up in discussions, it looks like depending on who you are, it's either way too long or too short. At twenty-few pages it's too long for the attention span of the loudmouth self-appointed SMTP experts you may encounter on web forums and mailing lists, and too short (read: not a book) to carry much weight with a decision maker who will not read much more than the executive summary anyway. Making that article morph into a book is on my list of Things To Look Into Later If Time Allows And It Still Makes Sense Then.

If you're still there after reading all this: Click the ads already. Make somebody else pay for your entertainment.

Friday, July 13, 2007

Spam is a solved problem

Executive summary: Spam is a solved problem, email works again. There are a few knuckle draggers out there who haven't noticed yet, but we'll get around to dealing with those shortly.

I've been looking over my log summaries again. My regular logs get rotated out of existence after seven days, but from the summaries I do keep around, it looks like various made up @datadok.no addresses have been used as spammers' fake From: addresses for about a month. I was too busy with other Very! Urgent! Things! to notice at first, but it finally dawned on me when I searched my mail server logs for "Unknown" as in "Unknown user" and saw from the results that somebody, somewhere, was using that domain for generating sender addresses.

Note: This piece is also available without trackers but classic formatting only here.

After about two weeks of observation and collecting made up or generated addresses for my traplist, my conclusion is what the title of this post says. Spam is not a problem anymore. I know, of course, that "how to cope with email and spam" self help guides are best sellers, and a recent Salon.com piece even went so far as saying about email,
"Now, it seems, we're drowning. There's simply too much e-mail. The tide of spam buries valuable messages."

That kind of surprises me, because it's not what we're seeing here at all. Of course we know that there's a lot of junk being sent, but ask any of the people on the sites I run on any given day how much spam they've received recently and they have to look up the date of the last one in their "Junk mail" folders.

I do see some spam myself, mainly because I still fetch and read mail I receive at an ISP address I've used a lot for USENET and mailing lists over the years. And since unfortunately no method ever has a zero error rate, occasionally a spam message or two trickles through that shouldn't have on the systems I run myself. But if the tide of spam buries valuable messages, you haven't kept up with the technology, plain and simple.

By and large, from the perspective of somebody who has been the purported sender of an unknown portion of the tide which drowns out the Salon.com writer's messages, it looks like spam is treated correctly or at least in ways that do not annoy others unnecessarily at most sites. (In all fairness that piece is more about email versus other types of writing than technical matters, and certainly worth reading for that reason. The same writer has written a number of other articles which are worth your time too.)

At the last count, our main spamd running gateway had all of 316 addresses in the local spamd-greytrap table, meaning that only that many hosts have actually tried to send mail to one or more of the addresses listed at our spamtrap page during the last 24 hours. Some of the trapped machines would have been active spam senders, and most of the rest seem to have been sites which were configured to receive spam and bounce back to the From: address when the spam was not deliverable.

That is an important point to note. If your system sends a 'message undeliverable' bounce message for spam sent to a non-existent user, it is configured to deliver spam to the users you do have, and there are certainly ways to avoid that. I've decided not to plug any of my other writing directly in this post, but you should be able to find the references easily enough if you're interested.

Reading the spamd logs is sometimes quite entertaining if you're that kind of guy or girl. Here is one example of a site with clearly deficient spam and/or malware filtering, possibly their own homebrew:

Jul 14 09:13:28 delilah spamd[29851]: 66.35.252.70: From postmaster@trendmicro.com
Jul 14 09:13:28 delilah spamd[29851]: 66.35.252.70: To: jonson3846@datadok.no
Jul 14 09:13:28 delilah spamd[29851]: 66.35.252.70: Subject: Delivery Status Notification (Failure)

It does not matter much to us, but they'll be unable to get mail through to us for the next 24 hours.

The next one is clearly problematic, since whoever set up the system appears to have left back in the time when there still was a chance that spammers used real addresses. Or maybe the poor wretch stayed on and now suffers from delusions, incompetence or both:

Jul 13 14:36:50 delilah spamd[29851]: 212.154.213.228: Subject: Considered UNSOLICITED BULK EMAIL, apparently from you
Jul 13 14:36:50 delilah spamd[29851]: 212.154.213.228: From: "Content-filter at srv77.kit.kz" <postmaster@srv77.kit.kz>
Jul 13 14:36:50 delilah spamd[29851]: 212.154.213.228: To: <skulkedq58@datadok.no>


Would SPF have helped? Possibly. We have our records set up, but clearly these guys are not using it in any meaningful way, and after -- what is it -- five years it's still not clear which of the competing RFCs with varying degrees of proprietary content is going to come out on top.

Staying out of our traplist would have saved some resources on their side. On our side, well, we have a working system. Email works. Mail from us has to go through our mail server. Incoming mail needs to clear spamd's greylisting (and really needs to come from IP addresses which are not in any of the blocklists we use) and pass content filtering inspection by spamassassin and clamav, all of it conveniently within reach on any freely available BSD system. The content filtering packages are available on your favorite Linux as well, but on our sites, we use OpenBSD and FreeBSD.

Living spam free and unworried by malware is possible. If you make a few right choices it's actually easy and it doesn't cost much. Just imagine how much of your time you stop wasting.

Monday, July 9, 2007

Hey, spammer! Here's a list for you!

Last week I started noticing from my log summaries that my mail servers had seen a lot more mail to non-existent users than usual. This usually happens when somebody has picked one of our domains as the home of their made-up return addresses for their spam run. This time, from the looks of it, the spam runs were mainly targeted at Russian and Ukrainian users. At least that's where most of the backscatter appears to have come from.

As I've written before in the PF tutorial and the malware paper (updated version available as this blog post -- that's the end of today's plugging, I promise), I've used the "Unknown user" messages as a valuable data source for my spamtrap list, just quietly adding addresses that looked really unlikely to ever become valid. After a brief airing on the OpenBSD-misc mailing list and running it by my colleagues at Datadok and Dataped, I've decided to take it a bit further.
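Pulling candidate addresses out of the "Unknown user" entries could look something like the sketch below. The exact log layout depends on the mail server, so this only assumes that the rejected address appears as a whitespace-separated token on a line containing the words "Unknown user". The function name trap_candidates is made up for the example, and the domain is expected in lower case.

```sh
#!/bin/sh
# trap_candidates DOMAIN < maillog
# Prints, sorted and de-duplicated, every address in DOMAIN found on a log
# line that contains "Unknown user". The output is a list of candidates for
# human review, not a finished traplist.
trap_candidates() {
    awk -v dom="$1" '
    /Unknown user/ {
        for (i = 1; i <= NF; i++) {
            a = tolower($i)
            # Strip surrounding brackets, quotes and trailing punctuation.
            gsub(/^[<("]+|[>),.;:"]+$/, "", a)
            # n is where "@DOMAIN" must start for the token to end in it.
            n = length(a) - length(dom)
            if (n > 1 && substr(a, n) == "@" dom) print a
        }
    }' | sort -u
}
```

Each address that survives a manual look can then be registered as a spamtrap with spamdb -T -a, as described in spamdb(8).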

Now that I've got a list of addresses which will never receive any legitimate mail, I really want spammers to try to send mail to those addresses. After all, if they send anything to an address which consists of a random string with one of our domains stuck on after the '@', we know it's all spam from there on.

We don't care about the rest, for the next 24 hours. Your SMTP dialogue with us (actually our spamd) will be all a-stutter, receiving answers one byte at a time until you give up. For the record, that usually takes about 400 seconds, with the really imbecile ones taking a lot longer. See the paper or the tutorial for some numbers.

The other possibility is of course that your system is set up in a way which makes it actually receive and try to deliver spam. Some of the spam will be addressed to non-existent users in your domain, so if your users receive spam, you will be trying to send bounce messages back to the purported sender for spam to non-existent users. That's tough, kid. If you're set up that way, your machine will be treated to the tarpit here for the next 24 hours. All a-stutter and all that. Repeat offenders stay there longer.

Now for the spamtrap list, I've checked that my colleagues and associates have never actually wanted to use those addresses for anything, and I made this page which wraps it all in a bit of explanation. For some reason, the list keeps growing each time I look at my log summaries.

When I get around to it and find a visually not-too-horrible way to do it, I'll include links to that page where they fit naturally on our web sites. In the meantime, here's hoping that the spammers' address harvesting robots find this list and put it to good use.

The chapter, it's improving. More later.

UPDATE 12-jul-2007: The softer side of me ponders the possibility of sending email form letters to the various postmaster@s with the URL to this blog post. On the other hand, I'm not sure I'm ready for another round of finding out that postmaster@ is in fact not deliverable at a surprising number of sites around the world.

One other thing I've noticed since I published the traplist is that bounces to addresses like mixt.apex.dp.ua-1184227575-testing@datadok.no have started appearing in the logs. I don't see how messages like these could be useful by themselves, but the addresses are of course obvious traplist material.

13-jul-2007: Oddly enough, there's still a stream of backscatter, and my logs tell me a few new addresses turn up every day. This morning's fresh ones were careersogt2083@datadok.no, phalanxesxb88@datadok.no and retryingvtt@datadok.no. Another few bytes to help weed out the bad ones early, thanks to the robots out there.