Thursday, August 7, 2008

Now that we have their addresses, do we name and shame?

The legal owners of botnet-controlled spam senders are quite likely unaware of what their machines are doing. Do they deserve to be outed, named and shamed?

Earlier this week a friendly Australian, who I think had been reading my blog, sent me a few questions about spam, spammers and what to do with them. Would it, for example, be useful to forward the IP addresses in the local traplist to law enforcement? After all, I publish a dump of the IP addresses in my local greytrap once per hour, and apparently at least some people fetch it regularly and use it as a valid blacklist.

(On a side note: if you do fetch that list regularly, keep in mind that the data is dumped at ten past every hour; that is when the data is fresh. If you fetch on the full hour, the data is already fifty minutes old.)
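
For anyone scripting the fetch, here is a minimal Python sketch of picking a fetch time. It assumes only what is stated above (the dump is written at ten past the hour); the one-minute safety margin and the names `next_fetch_time` and `data_age_minutes` are hypothetical choices for illustration, not part of any published tooling.

```python
from datetime import datetime, timedelta

# Assumption taken from the text: the traplist dump is written at ten past every hour.
DUMP_MINUTE = 10

def next_fetch_time(now: datetime, margin_minutes: int = 1) -> datetime:
    """Next fetch time at or after `now`, a small (hypothetical) margin after the dump."""
    candidate = now.replace(minute=DUMP_MINUTE, second=0, microsecond=0)
    candidate += timedelta(minutes=margin_minutes)
    if candidate < now:
        # This hour's dump slot has passed; aim for the next one.
        candidate += timedelta(hours=1)
    return candidate

def data_age_minutes(fetch: datetime) -> int:
    """Whole minutes elapsed since the most recent hourly dump at time `fetch`."""
    return (fetch.minute - DUMP_MINUTE) % 60
```

For example, `data_age_minutes(datetime(2008, 8, 7, 13, 0))` returns 50, the fifty-minute staleness of a fetch on the full hour.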

Anyway, my initial reaction to the question about forwarding the list of IP addresses to law enforcement was along the lines of "Well, a raw list of IP addresses doesn't really add up to a lot of evidence, but if you can extract the log entries for each one, you may have something". My actual answer was phrased a little differently, but even while I was writing my reply I started fiddling with a script to read my list of trapped IP addresses and grep the spamd log for all entries for each IP address.
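
The per-address approach amounts to running one grep per trapped address. A minimal Python sketch of that, assuming a hypothetical traplist format of one IP address per line (with `#` comments) and a plain-text log file, might look like this; all function names are illustrative, not the actual script.

```python
import re
from typing import Iterable, Iterator

def parse_traplist(lines: Iterable[str]) -> list[str]:
    """Hypothetical traplist format: one IP address per line, '#' starts a comment."""
    ips = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            ips.append(entry)
    return ips

def ip_pattern(ip: str) -> "re.Pattern[str]":
    """Match `ip` only as a whole address, so 1.2.3.4 does not match 11.2.3.45."""
    return re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\.?\d)")

def history_for_ip(ip: str, lines: Iterable[str]) -> list[str]:
    """The equivalent of one grep: every line mentioning `ip`, in log order."""
    pattern = ip_pattern(ip)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]

def open_log(path: str) -> Iterator[str]:
    """Yield lines from a plain-text log file, tolerating odd bytes."""
    with open(path, encoding="utf-8", errors="replace") as log:
        yield from log

def history_per_ip_naive(ips: Iterable[str], log_path: str) -> dict[str, list[str]]:
    """Re-read the entire log once per address: cost grows as N addresses x log size."""
    return {ip: history_for_ip(ip, open_log(log_path)) for ip in ips}
```

The anchored pattern also avoids a false positive that a plain `grep 1.2.3.4` would report, since grep treats the dots as wildcards and matches the address inside longer ones.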

My complete collection of spamd logs goes back a few years, so searching for a complete history does take a while. (For techies: for each IP address, a grep of the entire log takes at least a few seconds (s); the total time is s times the number of entries in the list (N), typically a few thousand. Grepping in parallel is difficult, because you want the output grouped per IP address, not interlaced like in the raw log data.)
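
To put rough numbers on that, with illustrative values only (three seconds per grep and three thousand addresses; neither figure is a measurement):

```latex
T \approx s \cdot N, \qquad s = 3\,\mathrm{s},\; N = 3000
\;\Rightarrow\; T \approx 9000\,\mathrm{s} = 2.5\,\mathrm{h}
```

Because every one of the N greps reads the whole log, T grows with both the length of the trap list and the size of the log archive.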

After a while, you can see output roughly like this:

Aug 7 07:24:16 skapet spamd[13548]: connected (12/9)
Aug 7 07:24:30 skapet spamd[13548]: (GREY)
<> -> <>
Aug 7 07:24:31 skapet spamd[13548]: disconnected after 15 seconds.
Aug 7 07:24:44 skapet spamd[13548]: connected (9/7)
Aug 7 07:25:06 skapet spamd[13548]: (GREY)
<> -> <>
Aug 7 07:25:07 skapet spamd[13548]: disconnected after 23 seconds.
Aug 7 07:25:08 skapet spamd[13548]: connected (11/9)
Aug 7 07:25:23 skapet spamd[13548]: (GREY)
<> -> <>
Aug 7 07:25:24 skapet spamd[13548]: disconnected after 16 seconds.
Aug 7 07:26:16 skapet spamd[13548]: connected (11/9), lists: spamd-greytrap
Aug 7 07:30:00 skapet spamd[13548]: (BLACK)
-> <> -> <>
Aug 7 07:31:43 skapet spamd[13548]: From: "Frances Ballard"
-> <>
Aug 7 07:31:43 skapet spamd[13548]: To: <>
Aug 7 07:31:43 skapet spamd[13548]: Subject: Extraordinary Narcotic Deals
Aug 7 07:32:47 skapet spamd[13548]: disconnected after 391 seconds. lists: spamd-greytrap

That's roughly what I would have expected to see: a host tries to send obvious spam to one of the trap addresses (one I harvested from incoming noise earlier), is added to spamd-greytrap, and on the next attempts gets stuck for a few minutes. (Notice that this spammer has yet another grepable pattern in its From: addresses: it prepends akstc and appends mnsdgs to the basename to produce the junk sender address. Content and header filterers, please take note.) I thought this would be the typical behavior, but browsing the output from my script, entries of this kind seem to be more the norm:

Aug 6 12:47:15 skapet spamd[13548]: connected (12/8)
Aug 6 12:47:27 skapet spamd[13548]: (GREY)
<> -> <>
Aug 6 12:47:27 skapet spamd[13548]: disconnected after 12 seconds.

Here, the spambot tries exactly once, never to return. It's possible that it detects the stuttering (our side answers one byte per second for the first ten seconds) and gives up for that reason, but it could equally well be classic fire-and-forget behavior, which is the reason greylisting still works. Or both, for that matter.
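
Browsing thousands of histories by eye gets tedious, so a rough tally helps. The Python sketch below sorts one address's log lines into three hypothetical buckets: trapped (it hit the greytrap or the blacklist), one-shot (at most one connection), or retried. The markers it looks for, `connected (n/m)`, `(BLACK)` and `lists: spamd-greytrap`, are taken from the excerpts above; real spamd output may contain variations this does not handle.

```python
import re
from collections import Counter

# Markers taken from the log excerpts in this post; other spamd messages are ignored.
CONNECT = re.compile(r"\bconnected \(\d+/\d+\)")  # \b keeps "disconnected" from matching
TRAPPED = re.compile(r"\(BLACK\)|lists: spamd-greytrap")

def classify(history: list[str]) -> str:
    """Label one address's log history as 'trapped', 'one-shot' or 'retried'."""
    if any(TRAPPED.search(line) for line in history):
        return "trapped"
    connects = sum(1 for line in history if CONNECT.search(line))
    return "one-shot" if connects <= 1 else "retried"

def summarize(histories: dict[str, list[str]]) -> Counter:
    """Count how many addresses fall into each bucket."""
    return Counter(classify(lines) for lines in histories.values())
```

A high share of one-shot addresses in the tally would match the fire-and-forget pattern described above, though the tally alone cannot tell whether stuttering scared them off.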

But back to the real question: Now that we have the data, what do we do with it?

With the script I have now, extracting the history for each of several thousand IP addresses takes some hours. The output is enlightening, but by the time the run is complete, it could be significantly more than twenty-four hours since the machines listed tried to send spam.

Should we name and shame anyway? If we forward the data to law enforcement, would they care?

For the time being, I'll try to think of a quicker way to extract the data. Any input on how to make the process more efficient is welcome, as is considered (learned or otherwise) opinion on the ethical up- or downside of publishing spamd log data.
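
One way to avoid the N-times cost, offered as a sketch rather than a finished tool: read the logs once, in chronological order, and file each line under every trapped address it mentions. The output stays grouped per IP address, which was the reason for not simply grepping in parallel. The sketch assumes each relevant log line contains the client's IPv4 address as a dotted quad, and that rotated logs ending in `.gz` are gzip-compressed; `iter_log_lines` and `group_lines_by_ip` are hypothetical names.

```python
import gzip
import re
from collections import defaultdict
from typing import Iterable, Iterator

# Dotted-quad IPv4 addresses only; IPv6 is not handled in this sketch.
IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")

def iter_log_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines from each log in turn; pass paths oldest first to keep time order."""
    for path in paths:
        # Assumption: files ending in .gz are gzip-compressed rotated logs.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as log:
            yield from log

def group_lines_by_ip(ips: Iterable[str], lines: Iterable[str]) -> dict[str, list[str]]:
    """Single pass: file each line under every wanted address it mentions."""
    order = list(ips)
    wanted = set(order)
    history: defaultdict[str, list[str]] = defaultdict(list)
    for line in lines:
        # set() so an address mentioned twice on one line is recorded once
        for ip in set(IPV4.findall(line)):
            if ip in wanted:
                history[ip].append(line.rstrip("\n"))
    # Preserve the traplist order; addresses with no log lines are left out.
    return {ip: history[ip] for ip in order if ip in history}
```

Usage might look like `group_lines_by_ip(trapped, iter_log_lines(paths))`, where `paths` lists the log files oldest first. Each line costs one regular expression scan plus a set lookup, so the total work is roughly one pass over the logs instead of one pass per address.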


  1. I think it is more useful to do something like:
    1 - dig for the host's parent IP. Most of the botnet machines are DSL-connected via an ISP, so find the ISP's mail server.
    2 - send a complaint to abuse@ISP listing all the connections from machines in its cloud, asking that the relevant users be notified that their machines are compromised.

    Just publishing logs presupposes that people will actually check. They won't, in my experience.

    On the local net the bot problem is solved quite simply. Any machine detected sending spam is blocked from using the internet by the proxy. It has been found that the relevant users quickly pitch up at the PC support desk asking for help.

  2. As you may have guessed, I've been doing almost exactly what you suggest for years :)

    However, at step 2 you will, in my experience, unfortunately find significant numbers of domains that do not in fact have a working abuse@, postmaster@ etc.

    The abuse@-less and postmaster@-less sites are possibly also unlikely to check sites like mine or to actually care, but perhaps if enough sites start publishing the gory log details, there will be pressure to actually do something.

    Time will show.
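
The commenter's first step can be approximated in code, though the Python sketch below swaps in a different, cruder technique than finding the ISP's mail server: it does a reverse DNS lookup (`socket.gethostbyaddr`) on each address, keeps the last two labels of the hostname as a guess at the ISP's domain, and groups addresses under a hypothetical `abuse@` contact for that domain, so one complaint can go out per ISP. Both the two-label rule (wrong for names under e.g. co.uk) and the assumption that `abuse@` actually works are exactly the weak points raised in the reply above; the abuse contacts published in WHOIS/RDAP records are usually a more reliable source.

```python
import socket
from collections import defaultdict
from typing import Callable, Iterable, Optional

def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup via the system resolver; None if the address has no usable name."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None

def guess_domain(hostname: str) -> str:
    """Crude guess at the ISP's domain: the last two labels of the hostname.
    Wrong for names under e.g. co.uk; a public-suffix list would do better."""
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:])

def group_by_abuse_contact(
    ips: Iterable[str],
    resolve: Callable[[str], Optional[str]] = reverse_lookup,
) -> dict[str, list[str]]:
    """Group addresses under a hypothetical abuse@<domain> contact, one report per ISP."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for ip in ips:
        name = resolve(ip)
        key = f"abuse@{guess_domain(name)}" if name else "unresolved"
        groups[key].append(ip)
    return dict(groups)
```

The `resolve` parameter lets you substitute a cached or offline resolver, which matters when thousands of lookups are involved.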

