Sunday, December 21, 2008

Into a new year, slowly pounding the gates

The distributed but clearly coordinated bruteforcers are still at it. How long until they reach the end of the alphabet? And why are they staying away from my OpenBSD machines? Are we seeing the contours of a controlling intelligence?

As large parts of the Western world prepare for the holidays, the swarm of little robots that started trying to pry open the doors to my machines some weeks back are still at it. As far as we can tell, the coordinated attempts started some time in early November or perhaps late October (we don't keep logs around for long enough to be sure), with an alphabetic progression of user names that has by now reached somewhere into the os. The complete listing from the time I started noticing up to the time I started writing this column can be found here.

I've written about this before, and in fact one of those columns was slashdotted, a pleasant surprise to me and a cause of some excitement among my colleagues at FreeCode.

After writing that article, I did some further research and found out that a precursor to what we are seeing now was observed as early as May 2008, as described in an Ars Technica article published at the time. That article also reveals, via Linuxtoday, that yours truly was among the many who failed to understand the problem, at least for a while. Then again, maybe actual log excerpts would have helped.

The problem, such as it is, seems to be that somebody who herds a botnet has decided that the law of large numbers favors those who keep trying for long enough. User names and passwords are generally far enough from random that if you are allowed to keep guessing for long enough, you will sooner or later hit on a correct combination of user name and password and gain access to a machine somewhere.

Sysadmins have been seeing bruteforce attacks for years. The traditional brute force attack would be a rapid succession of login attempts from one host, and usable countermeasures were devised in short order. My favorite of course involves PF, and the description of how to thwart traditional bruteforcers is one of the more popular pages in my PF tutorial.

The distributed, slow bruteforcers are different. For one thing, the login attempts from each host out in the cloud are spaced far enough apart in time that intrusion attempt detectors will not trigger. Next, it takes a keen eye to spot the common thread in attempts spaced up to several minutes apart: a monotonous alphabetic progression of user names, with attempts coming in from different hosts. Some number of attempts is made at a specific user name before the cloud moves on to the next one, in alphabetic order.

During the period we have been observing the slow brute activity, a total of 695 hosts have been involved. 665 hosts made unsuccessful attempts at authenticating to the hosts we are observing during November, while the number for December so far is 346. The typical number of attempts per user name has decreased too, from ten to fifteen during the early days down to between one and four during the last couple of weeks.
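Counts like these fall out of the raw authentication logs with ordinary text tools. A minimal sketch for the November host count, assuming log lines in the format quoted later in this series, where the remote host is the 15th whitespace-separated field (adjust the log file path to the local system):

peter@thingy:~$ awk '$1 == "Nov" && /illegal user/ {print $15}' /var/log/auth.log | sort -u | wc -l

Printing the 13th field instead and piping through uniq -c gives the per-user-name attempt counts, which is also a quick way to make the alphabetic progression stand out.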

I thought at first that the decrease in activity was just an indicator that compromised hosts were getting cleaned up, but my colleague Egil Möller was the first to suggest that since we know the attempts are coordinated, it is not too far-fetched to assume that the controlling system measures the rate of success for each of the chosen targets and allocates resources accordingly.

If Egil's assumption is right, we are seeing the bad guys adapting. My systems do not run any services they do not need to, and apparently all attempts at gaining access have been futile so far. So the controlling system shifts resources elsewhere, even if the access attempts do not stop entirely. Come to think of it, I'm not seeing any attempts at all on my OpenBSD systems, so it is possible to speculate that whoever is behind this phenomenon has decided that OpenBSD systems are hardened enough, and usually run by sufficiently competent paranoids, to be useless as targets. That would be a comforting thought at the end of a long and sometimes trying year.

Speaking of the new year, look for exciting announcements coming from FreeCode. We're working on some cool things. And with a bit of luck, I might run into you at one conference or the other during the coming year.

Happy holidays to everyone.


Note: A Better Data Source Is Available
Update 2013-06-09: For a faster and more convenient way to download the data referenced here, please see my BSDCan 2013 presentation The Hail Mary Cloud And The Lessons Learned which summarizes this series of articles and provides links to all the data. The links in the presentation point to a copy stored at NUUG's server, which connects to the world through a significantly fatter pipe than BSDly.net has.

Saturday, December 6, 2008

A Small Update About The Slow Brutes

Slow and steady might actually do it, eventually.

The reactions to my December 2nd column took me a bit by surprise. The column was picked up by both slashdot and Linux Today, producing a largish number of page views, but only two clicks on my featured ads. But while my clickthrough rate is not particularly interesting to others, the comments on the columns sometimes are.

If you look at the comments at slashdot and elsewhere, most of the commenters likely did not read the column in full, or did not take the time to digest what it actually said, with some notable exceptions. And yes, there were others; some also wrote in via email with informed comment - thanks!

For the benefit of those who did not get the point the first time around, I'll try once more to explain what the observations are and what they may in fact mean.

A number of commenters offered well-meant advice to use packages like fail2ban, denyhosts or a few others.

The common denominator for all of them is that they track single hosts that make a larger than usual number of connections, or that are the source of a number of failed logins above a certain threshold over a set time period. I appreciate your concerns, but the subject of the column did not fit well with the way those packages work.

In fact, a similar scheme was already in place at the site that provided the data. The machines that provided the ssh logs are FreeBSD ones (as the sharper ones have observed already, and the reasons may possibly be revealed over beer sometime), but any gateway under my control will run OpenBSD and, by extension, PF (and yes, there is a book you might want to order from one (North America) or the other (Europe and elsewhere) of the OpenBSD project's sites). For a quick fix of background, the online PF tutorial may be worth a look.

Anyway, the /etc/pf.conf at that site's gateway contains the lines

table <abusive_hosts> persist
block log quick from <abusive_hosts>

and

pass log (all) quick proto { tcp, udp } from any to any port ssh flags S/SA keep state \
(max-src-conn 15, max-src-conn-rate 7/3, overload <abusive_hosts> flush global)


Those lines provide a variation on the logic those posters recommended. Essentially, any host that tries 15 or more simultaneous ssh connections, or comes in at a rate of more than seven connections over a span of three seconds, will be added to the table <abusive_hosts>, and the block quick rule blocks any further access from those hosts. Yes, flush global means what you think it does.

This works at the network level. For a gateway with a potentially large number of hosts on either side, the success or otherwise of eventual authentication may not be relevant and may be better dealt with elsewhere. Anyway, at the time I started working on this column, the table <abusive_hosts> on the gateway contained only two hosts:

:~$ sudo pfctl -t abusive_hosts -T show
194.204.37.93
201.57.187.114

I keep offenders in that table for 24 hours only; I do not believe in the permanent bans that some commenters advocate. After all, there is such a thing as DHCP, and entire netblocks are reallocated at amazingly short intervals.
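For those wondering how the 24-hour expiry happens: pfctl can drop table entries whose statistics were last cleared more than a given number of seconds ago. A minimal sketch, suitable for running periodically from cron or similar:

:~$ sudo pfctl -t abusive_hosts -T expire 86400

Here 86400 is simply 24 hours expressed in seconds.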

Anyway, looking at the authentication log on the gateway reveals how those hosts got added to the table in the first place:

:~$ grep 194.204.37.93 /var/log/authlog
Dec 5 22:50:30 delilah sshd[15266]: Did not receive identification string from 194.204.37.93
Dec 5 22:50:37 delilah sshd[6106]: Did not receive identification string from 194.204.37.93
Dec 6 01:29:58 delilah sshd[30359]: Failed password for root from 194.204.37.93 port 47071 ssh2
Dec 6 01:29:58 delilah sshd[7293]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:29:59 delilah sshd[4395]: Failed password for root from 194.204.37.93 port 47296 ssh2
Dec 6 01:29:59 delilah sshd[27615]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:00 delilah sshd[12248]: Failed password for root from 194.204.37.93 port 47330 ssh2
Dec 6 01:30:00 delilah sshd[24579]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:01 delilah sshd[3434]: Failed password for root from 194.204.37.93 port 47380 ssh2
Dec 6 01:30:01 delilah sshd[32737]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:02 delilah sshd[11984]: Failed password for root from 194.204.37.93 port 47425 ssh2
Dec 6 01:30:02 delilah sshd[27059]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:03 delilah sshd[13345]: Failed password for root from 194.204.37.93 port 47459 ssh2
Dec 6 01:30:03 delilah sshd[1858]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:04 delilah sshd[12739]: Failed password for root from 194.204.37.93 port 47516 ssh2
Dec 6 01:30:04 delilah sshd[16843]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:05 delilah sshd[13796]: Failed password for root from 194.204.37.93 port 47564 ssh2
Dec 6 01:30:05 delilah sshd[16789]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:06 delilah sshd[628]: Failed password for root from 194.204.37.93 port 47602 ssh2
Dec 6 01:30:06 delilah sshd[6162]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:07 delilah sshd[2579]: Failed password for root from 194.204.37.93 port 47646 ssh2
Dec 6 01:30:07 delilah sshd[12461]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:08 delilah sshd[12725]: Failed password for root from 194.204.37.93 port 47685 ssh2
Dec 6 01:30:08 delilah sshd[29909]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:09 delilah sshd[16560]: Failed password for root from 194.204.37.93 port 47724 ssh2
Dec 6 01:30:09 delilah sshd[1690]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:09 delilah sshd[1600]: Failed password for root from 194.204.37.93 port 47771 ssh2
Dec 6 01:30:09 delilah sshd[28882]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:10 delilah sshd[29953]: Failed password for root from 194.204.37.93 port 47807 ssh2
Dec 6 01:30:10 delilah sshd[15349]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:11 delilah sshd[2962]: Failed password for root from 194.204.37.93 port 47845 ssh2
Dec 6 01:30:11 delilah sshd[27557]: Received disconnect from 194.204.37.93: 11: Bye Bye

and

:~$ grep 201.57.187.114 /var/log/authlog
Dec 5 23:55:30 delilah sshd[24338]: Did not receive identification string from 201.57.187.114
Dec 5 23:55:35 delilah sshd[23570]: Did not receive identification string from 201.57.187.114
Dec 5 23:59:58 delilah sshd[10216]: Invalid user raimundo from 201.57.187.114
Dec 5 23:59:58 delilah sshd[10216]: Failed password for invalid user raimundo from 201.57.187.114 port 35776 ssh2
Dec 5 23:59:58 delilah sshd[18515]: Received disconnect from 201.57.187.114: 11: Bye Bye
Dec 6 00:00:01 delilah sshd[17353]: Invalid user joan from 201.57.187.114
Dec 6 00:00:01 delilah sshd[17353]: Failed password for invalid user joan from 201.57.187.114 port 37570 ssh2
Dec 6 00:00:02 delilah sshd[30314]: Received disconnect from 201.57.187.114: 11: Bye Bye


which again shows that these were the old-fashioned, rapid-fire kind of bots. Looking a bit closer even reveals that they kept trying after they were put in the doghouse:

:~$ sudo pfctl -t abusive_hosts -vT show
194.204.37.93
Cleared: Sat Dec 6 01:30:11 2008
In/Block: [ Packets: 18985 Bytes: 835336 ]
In/Pass: [ Packets: 0 Bytes: 0 ]
Out/Block: [ Packets: 0 Bytes: 0 ]
Out/Pass: [ Packets: 0 Bytes: 0 ]
201.57.187.114
Cleared: Sat Dec 6 00:00:02 2008
In/Block: [ Packets: 800 Bytes: 48268 ]
In/Pass: [ Packets: 0 Bytes: 0 ]
Out/Block: [ Packets: 0 Bytes: 0 ]
Out/Pass: [ Packets: 0 Bytes: 0 ]


And of course there were no traces of those IP addresses or corresponding host names in the authentication logs on the machines where I have been collecting the data about the slow bots. My best guess at why is that the gateway's IP address is at the low end of the routable range for that site. Note to bruteforcers: try working the Internet in reverse next time. (Not that it would help much here, but that's another story.)

The reason why I don't see much activity for other services is simply that those machines do not run all that many services, and only the services they actually run for the world's benefit are in fact available to the outside.

My log data shows a definite pattern, and the alphabetic progression points to a degree of coordination. The slow bots are, I theorize, operated by a botnet herder who has a large pool of compromised hosts available and who also believes that given enough time, sooner or later you will find a correct combination of user names and passwords for a given host. Statisticians tell me that assumption is in fact valid, at least to some extent.

By setting up the necessarily large number of attempts to come from a sizable number of hosts, in round-robin or pseudo-random order, with each individual host's attempts spaced anywhere from several minutes to hours apart, there is a very real possibility that the slow but determined campaign for control of any single system will simply drown in the noise. (A few hundred hosts each trying no more than once every several hours is still enough to produce the one attempt every few minutes we observe in aggregate.)

It is useful to keep in mind that malware moved out of the hands of pranksters and vandals years ago. Mass destruction of systems or data might still make the occasional headline, but staying out of the limelight is likely to be a lot more profitable. Modern malware masters want their creations and charges to stay undetected. What we may be seeing right at this moment is that they have realized the herd may only be sustainable if they grow it slowly.


If you are interested in researching the phenomena I've blogged about, you're welcome to contact me directly for more information or raw data.



Tuesday, December 2, 2008

A low intensity, distributed bruteforce attempt

We have seen the future of botnets, and it is a distributed, low-key affair. Are sites running free software finally becoming malware targets?

Note: This piece describes illegal activity I detected in 2008, targeting SSH servers. Later pieces in this series would hint at the existence of a specific piece of Linux malware, which I had not identified at the time this piece was written.


Phase 1: “That's odd …”
During the last few weeks, I noticed an anomaly in the authentication logs on one of my listening posts: a larger than usual number of ssh login attempts overall, with a higher than usual number of attempts for non-existent user names, as well as some failures for a few that actually exist.

Looking at the log directly, a typical progression would look like this:

Nov 19 15:04:22 rosalita sshd[40232]: error: PAM: authentication error for illegal user alias from s514.nxs.nl
Nov 19 15:07:32 rosalita sshd[40239]: error: PAM: authentication error for illegal user alias from c90678d3.static.spo.virtua.com.br
Nov 19 15:10:20 rosalita sshd[40247]: error: PAM: authentication error for illegal user alias from 207-47-162-126.prna.static.sasknet.sk.ca
Nov 19 15:13:46 rosalita sshd[40268]: error: PAM: authentication error for illegal user alias from 125-236-218-109.adsl.xtra.co.nz
Nov 19 15:16:29 rosalita sshd[40275]: error: PAM: authentication error for illegal user alias from 200.93.147.114
Nov 19 15:19:12 rosalita sshd[40279]: error: PAM: authentication error for illegal user alias from 62.225.15.82
Nov 19 15:22:29 rosalita sshd[40298]: error: PAM: authentication error for illegal user alias from 121.33.199.39
Nov 19 15:25:14 rosalita sshd[40305]: error: PAM: authentication error for illegal user alias from 130.red-80-37-213.staticip.rima-tde.net
Nov 19 15:28:23 rosalita sshd[40309]: error: PAM: authentication error for illegal user alias from 70-46-140-187.orl.fdn.com
Nov 19 15:31:17 rosalita sshd[40316]: error: PAM: authentication error for illegal user alias from gate-dialog-simet.jgora.dialog.net.pl
Nov 19 15:34:18 rosalita sshd[40334]: error: PAM: authentication error for illegal user alias from 80.51.31.84
Nov 19 15:37:23 rosalita sshd[40342]: error: PAM: authentication error for illegal user alias from 82.207.104.34
Nov 19 15:40:20 rosalita sshd[40350]: error: PAM: authentication error for illegal user alias from 70-46-140-187.orl.fdn.com
Nov 19 15:43:39 rosalita sshd[40354]: error: PAM: authentication error for illegal user alias from 200.20.187.222
Nov 19 15:46:41 rosalita sshd[40374]: error: PAM: authentication error for illegal user amanda from 58.196.4.2
Nov 19 15:49:31 rosalita sshd[40378]: error: PAM: authentication error for illegal user amanda from host116-164.dissent.birch.net
Nov 19 15:55:47 rosalita sshd[40408]: error: PAM: authentication error for illegal user amanda from robert71.lnk.telstra.net
Nov 19 15:59:08 rosalita sshd[40412]: error: PAM: authentication error for illegal user amanda from static-71-166-159-177.washdc.east.verizon.net
Nov 19 16:02:06 rosalita sshd[40455]: error: PAM: authentication error for illegal user amanda from host87-163-static.30-87-b.business.telecomitalia.it
Nov 19 16:05:08 rosalita sshd[40459]: error: PAM: authentication error for illegal user amanda from 213.150.184.70
Nov 19 16:08:16 rosalita sshd[40465]: error: PAM: authentication error for illegal user amanda from mail.pddsl.de
Nov 19 16:11:24 rosalita sshd[40486]: error: PAM: authentication error for illegal user amanda from abu66.internetdsl.tpnet.pl
Nov 19 16:15:00 rosalita sshd[40491]: error: PAM: authentication error for illegal user amanda from 125.77.106.246
Nov 19 16:17:43 rosalita sshd[40497]: error: PAM: authentication error for illegal user amanda from 217.76.34.230
Nov 19 16:20:54 rosalita sshd[40506]: error: PAM: authentication error for illegal user amanda from robert71.lnk.telstra.net
Nov 19 16:24:09 rosalita sshd[40529]: error: PAM: authentication error for illegal user amanda from p578b4f0b.dip0.t-ipconnect.de
Nov 19 16:28:11 rosalita sshd[40538]: error: PAM: authentication error for illegal user amanda from mail.carena-ci.com
Nov 19 16:30:15 rosalita sshd[40551]: error: PAM: authentication error for illegal user amavis from 87.229.3.89
Nov 19 16:34:31 rosalita sshd[40567]: error: PAM: authentication error for illegal user amavis from 218.248.79.251
Nov 19 16:36:40 rosalita sshd[40574]: error: PAM: authentication error for illegal user amavis from 83-103-70-170.ip.fastwebnet.it
Nov 19 16:40:05 rosalita sshd[40596]: error: PAM: authentication error for illegal user amavis from 75-49-251-71.lightspeed.snjsca.sbcglobal.net

- and so on, with a striking regularity. See for example the attempts to log on as the alias user: 14 attempts are made from 13 different hosts, with only 70-46-140-187.orl.fdn.com trying more than once. Then thirteen attempts are made for the amanda user, from 13 other hosts. The pattern repeats for amavis, apache, at, and goes on with other user names, apparently in an alphabetic sequence.
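You can tease that pattern out of the raw log with a quick one-liner. A sketch, assuming the format shown above, where the user name is the 13th whitespace-separated field and the originating host the 15th:

peter@thingy:~$ grep 'illegal user alias from' slowbrutes.txt | awk '{print $15}' | sort | uniq -c | sort -rn

Any host with a count above one, such as 70-46-140-187.orl.fdn.com here, tried the same user name more than once; printing $13 and widening the grep pattern shows the alphabetic march through the user names instead.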

Phase 2: Not your run of the mill screwup, the data say
Repeated login attempts for non-existing users are nothing new (in fact the bruteforce avoidance section is one of the more popular parts of the PF tutorial), but I was a bit surprised to see the attempts actually reaching this machine, which is on a local network behind a PF gateway with a configuration that is in fact closely related to the one in the tutorial (and the book for that matter). Then looking at the log entries, I noticed a few more things: the attempts are never less than a minute apart, and the attempts from a single host are separated by much longer intervals. The full data set, extracted from the point when I started noticing the anomalies, can be found here, in case you want to look at it and draw your own conclusions.

Some one-liners give us illustrative numbers:

peter@thingy:~$ wc -l slowbrutes.txt
16727 slowbrutes.txt

That is, over this period there were 16727 failed ssh login attempts at this host. A large number for this particular machine, but not enough to raise eyebrows by itself at larger or busier sites.

More than sixteen thousand attempts, but for how many invalid user names?

peter@thingy:~$ grep illegal slowbrutes.txt | awk '{print $13}' | sort -u | wc -l
2962
peter@thingy:~$ grep illegal slowbrutes.txt | awk '{print $15}' | sort -u | wc -l
671

That is, approaching three thousand unlucky guesses, coming from 671 different hosts.

How many valid user names did they stumble upon?

peter@thingy:~$ grep -v illegal slowbrutes.txt | awk '{print $11}' | sort -u | wc -l
2

A grand total of two, one of them the rather obvious root, for a total of

peter@thingy:~$ grep -vc illegal slowbrutes.txt
1698

1698 attempts, coming from

peter@thingy:~$ grep -v illegal slowbrutes.txt | awk '{print $13}' | sort -u | wc -l
566

566 different hosts.

The patterns that emerge from the data, with the alphabetical ordering and apparent coordination, point to a botnet herder trying out new methods. Intrusion detection systems and adaptive firewalls are generally tuned to detect things like large numbers of simultaneous connections or a high rate of new connections from a single host. Distributing the task of bruteforcing passwords across several hosts could seem like an inspired way to come in under the radar wherever relatively smart systems are in place. Setting the herd to attempt at a low frequency would likely mean that the failed attempts simply drown in the noise at higher volume sites and go unnoticed.

Phase 3: Are you one of their guinea pigs, too?
There are indications that the method has not been quite perfected yet. At the start of this run, the bots would make at least ten attempts before moving on down the alphabet. Now it seems enough bots have been taken out of circulation that the typical number of attempts per user name is closer to three, with some tried only once:

Dec 2 11:45:59 rosalita sshd[55775]: error: PAM: authentication error for illegal user heaven from cpe001217e403b3-cm000f9fa6157c.cpe.net.cable.rogers.com
Dec 2 11:48:16 rosalita sshd[55778]: error: PAM: authentication error for illegal user heaven from 90.190.96.46
Dec 2 11:50:39 rosalita sshd[55791]: error: PAM: authentication error for illegal user heaven from static-71-117-126-102.snloca.dsl-w.verizon.net
Dec 2 11:55:26 rosalita sshd[55811]: error: PAM: authentication error for illegal user heavynne from dsl-217-155-184-54.zen.co.uk
Dec 2 11:57:57 rosalita sshd[55814]: error: PAM: authentication error for illegal user heavynne from pd907ee1e.dip0.t-ipconnect.de
Dec 2 12:00:20 rosalita sshd[55836]: error: PAM: authentication error for illegal user heba from 201-26-172-213.dial-up.telesp.net.br
Dec 2 12:07:37 rosalita sshd[55879]: error: PAM: authentication error for illegal user hector from 75.145.16.83
Dec 2 12:09:58 rosalita sshd[55882]: error: PAM: authentication error for illegal user hector from ppp-69-217-30-214.dsl.applwi.ameritech.net
Dec 2 12:12:33 rosalita sshd[55901]: error: PAM: authentication error for illegal user hector from 75-49-251-71.lightspeed.snjsca.sbcglobal.net
Dec 2 12:14:51 rosalita sshd[55905]: error: PAM: authentication error for illegal user hedda from 201.218.231.142
Dec 2 12:17:21 rosalita sshd[55911]: error: PAM: authentication error for illegal user hedda from 75.147.27.85
Dec 2 12:19:48 rosalita sshd[55914]: error: PAM: authentication error for illegal user hedda from 203.70.179.113

From where I'm sitting it is hard to tell whether the lower number of attempts means that the machines have been cleaned up by their legitimate owners or whether they have simply been taken out of rotation by their herders. Even with the initial 14 attempts per user name, the chance of actually finding a valid combination of user name and password would be slim but not non-existent, and decreasing the number of attempts per time unit necessarily makes the chance of eventually finding a valid pair even smaller.

Apparently I'm not the only one seeing the slow brutes, as this post to openbsd-misc indicates. The sensible countermeasure would be to disallow ssh password logins and allow only key-based logins, probably easier to set up and enforce than network-level measures. With the slow rate of attempts and the relatively large number of hosts involved, the undesirable traffic here is hard to distinguish automatically from innocent errors, unless you make any attempt to log in with an invalid user name sufficient reason for blocking traffic from that host.
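For reference, the relevant knobs live in sshd_config; a minimal sketch of a keys-only policy (check it against local requirements before deploying, and test from a second session before logging out):

# /etc/ssh/sshd_config fragment: accept key-based logins only
PasswordAuthentication no
ChallengeResponseAuthentication no
PubkeyAuthentication yes
PermitRootLogin no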

Phase N: The shape of things to come
In the longer term view, this may very well be the shape of botnets to come. With a large enough pool of compromised hosts under their control, future botnet herders can afford to organize their activity so that any one host only participates in undesirable activity at intervals long enough that malware detectors do not trigger (and thinking further ahead, if the world ever does go IPv6 wholesale and you can expect any one network interface to have dozens of IP addresses, think about how much more interesting detecting botnets becomes).

Anti-malware vendors will likely put their spin on this too when their marketing departments start noticing columns like this one (Hey! It's Linux they're targeting!), but then as regular readers know, the more productive approach is always to reduce the malware masters' target area by using systems that are less vulnerable because they have been extensively audited and whose makers are unafraid to make source code available for public inspection and experimentation.



As people who ran into me at the recent PF tutorial in London or at OpenCON will already know, I have joined FreeCode, the Norwegian free software consultancy. We expect to keep doing fun and useful things with free software for customers and friends, and some of it may be interesting enough to become the topic of future columns.



Monday, October 20, 2008

IETF failed to account for greylisting

The potential for conflict between greylisting and sites with large pools of outgoing SMTP senders is well known and in need of resolution. Why does the SMTP RFC moving along the standards track fail to address this?

Standardization efforts rarely grab headlines. Except in rather exceptional circumstances (think Microsoft's recent ISO buyout), standards are formulated in response to specific technical needs, or as frequently happens, standards documents are written to codify existing and well known best practices.

As regular readers of this column will be aware, I tend to argue that in the email domain, greylisting is one such 'best practice' that would deserve to be included in a standards or best practice document. In a way it already is, but it was never actually referred to in any standard or standards draft, and its main claim to standards compliance relied on extrapolating some crucial points in RFC2821.

The technical alibi for claiming that greylisting is a valid, RFC-compliant technique comes from reading section 4.5.4.1, Sending Strategy. It's really quite straightforward: An SMTP sender that receives a temporary error (451) when it tries to deliver is required to try again later, after a reasonable time. The RFC gives a few more details such as recommended retry intervals and time to give up trying, but does not explicitly say where those repeated attempts should come from. At the time RFC2821 was written in early 2001, it was almost certainly implicit in the formulation used that the retries would come from the same hosts, and specifying that as an explicit requirement most likely seemed more than a little redundant.
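For illustration, the exchange looks something like this simplified SMTP dialog (host names and addresses are made up for the example); the greylisting server answers the first delivery attempt with a 451 temporary failure, and a compliant sender queues the message and retries later:

220 mx.example.com ESMTP
HELO sender.example.net
250 mx.example.com Hello
MAIL FROM:<someone@example.net>
250 Ok
RCPT TO:<user@example.com>
451 Temporary failure, please try again later
QUIT
221 Bye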

The MUST requirement in RFC2821 is what cleared the way for greylisting (see greylisting.org and the relevant parts of my PF tutorial (or the book)), and the earliest implementations started appearing from 2003 onwards.

Fast-forward a few years, and you have a situation where several large operators have set up their networks with large pools of outgoing SMTP servers and no guarantee of which machine in the pool will be the next to handle a message queued for a delivery retry. Some greylisting implementations use a modified algorithm that stores, or at least acts on, the subnet a greylisted message came from instead of the specific IP address, while others, such as OpenBSD's spamd, operate strictly on individual IP addresses.

It does not exactly take a rocket scientist to figure out that there is a potential problem here, at least with the stricter implementations. Sites that do not retry from the same IP address can claim to be RFC2821 compliant, since the RFC contains no specific requirement that delivery retries come from the same IP address. Again quoting my tutorial, the solution so far has been to whitelist those sites, extracting their SPF information for whitelisting purposes if necessary.
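Extracting the SPF information is straightforward, since it is published in DNS as TXT records. A sketch, with example.com standing in for the operator to be whitelisted:

:~$ host -t txt example.com
example.com descriptive text "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.0/24 -all"

The ip4: ranges can then be dropped into a whitelist table that is exempted from greylisting.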

The problem of large numbers of outgoing SMTP hosts per site has been widely discussed, and anybody even marginally interested in mail and spam avoidance should be aware of this. Then here's a surprise for you: apparently the writers of RFCs are unaware of it, or chose not to care. On October 1, 2008, I found in my IETF mailbox RFC5321, which obsoletes RFC2821. The new RFC contains a number of things, but section 4.5.4.1, "Sending Strategy" (yes, they even kept the numbering) is unchanged.

That means the working group was either unaware of the problem or chose not to resolve the conflict at this time. It would have been very sensible to state explicitly that retries MUST come from the same IP address. The only halfway sane reason not to resolve the conflict that way would be that doing so would have made greylisting powerful enough to go a long way towards eliminating the need for SPF and other more convoluted schemes.

I cannot bring myself to believe that the working group does not have at least one member who is aware of what greylisting is and knows about that sole remaining problem that needs to be addressed. RFC5321 has almost everything else covered, so I hope the working group will listen to reason and resolve this problem before the standard is finalized.

Update 2021-04-25: It was recently pointed out to me that per RFC6647, issued in 2012, the IETF does acknowledge the usefulness of greylisting. The document lists a number of useful implementation recommendations.



In other news, this year's EuroBSDCon in Strasbourg went smoothly with a lot of good content if a somewhat smaller number of participants than the previous one. The next chance to catch my PF tutorial live will be in London on November 26th, 2008. Contact the UKUUG for details and booking.

Monday, September 22, 2008

“Name and Shame”, or socially responsible use of your log data

Your logs contain an ever-growing mass of data on spammers. How about making an effort to make that data useful to others?

Those of us who run email services know, from sometimes painful experience, what it takes to ensure that as little as possible of the unwanted advertising and scams, some of which turn out to be security hazards, actually reaches our users' inboxes.

Email: This should have been very simple
Handling email should really be quite simple: the server is configured to know which domains it receives mail for and which users actually exist in those domains. When a machine makes contact and indicates that it intends to deliver email, the server checks whether the recipient is a valid user. If the recipient is valid, the message is received and put in the relevant user's mailbox. Otherwise, a message about the failed delivery, optionally with the reason for the failure, is sent to the stated sender.

If they were all honest people
In each part of the process, the underlying premise is that the communicating partners offer each other correct information. Frequently that is the case, and we have legitimate communications between partners with a valid reason for contacting each other. Unfortunately there are other cases where the implicit trust is abused, such as when email messages are sent with a sender address other than the real one, quite likely a made-up one in a domain that belongs to other people. Some of us occasionally receive delivery failure messages for messages we verifiably did not send[1]. If we take the time to study the contents of those messages, in almost all cases we will find that they are spam, sometimes of the scamming kind, and perhaps part of an attempt to take control of the recipient's computer or steal sensitive data.

What do the ones in charge do, then?
If you ask a typical system administrator what measures are in effect to thwart attempts at delivering unwanted or malicious messages to their users, you will most likely get a description that says, essentially, that the messages are filtered through systems that inspect message contents. If a message does not contain anything known to be bad (known spam or malware), or anything sufficiently similar to a known bad, it is delivered to the user's mailbox. If the system determines that the message contents indicate it should not be delivered, the message is thrown away undelivered, and some system administrators will tell you that the system also sends a message about the decision not to deliver to the stated sender address.

Much of this is likely part of the moderately educated user's passive knowledge, and most of us are likely to accept that content filtering is all we can do to keep dubious or downright criminal elements out of our working environment. For the individual end user, only minor adjustments to this are likely to be possible.

Measures based on observed behavior
But those of us who actually run the service also have the opportunity to study the automatically generated log data from our systems and use spammers' (that is, senders of all types of unwanted mail, including malware) behavior patterns to remove most of the unwanted traffic before the actual message content is known. In order to do that, it is necessary to go to a more basic level and study sender behavior at the network level.

One of the simpler forms of behavior-based measures emerged in 2003, in the form of a technique called greylisting. The technique is based on a slightly pedantic and rather creative interpretation of established standards. The Internet protocol for email transfer, SMTP (the Simple Mail Transfer Protocol), allows servers that experience temporary problems that make it impossible to receive mail to report a specific 'temporary local problem' status code to correspondents trying to deliver mail. Correctly configured senders will interpret and act on the status code and delay delivery for a while; in most circumstances, the delivery will succeed shortly afterwards. It is worth noting that this part of the standard was formulated to help the mail service's reliability. Most of the time, the retries happen without alerting the person who wrote and sent the message, and the messages generally reach their destination eventually.

Lists of grey and black, little white lies
Greylisting works like this: the server reports a temporary local problem in response to all delivery attempts from machines the server has not exchanged mail with before. Experience shows that the pre-experiment hypothesis was mainly correct: essentially all machines that try to deliver valid email are configured to check return codes and act on them, while almost all spam senders dump as many messages as possible and never check any return codes. This means that somewhere from eighty to the high nineties percent of all spam volume is discarded at the first delivery attempt (before any content filtering), while legitimate email reaches its intended recipients, occasionally with a delay for the initial message from a new correspondent.

One other behavior-based technique, which predates greylisting, is the use of 'blacklists' - lists of machines that have been classified as spam senders - and rejecting mail from machines on such lists. Some groups eventually started experimenting with 'tarpits', a technique that essentially means your end of the communication moves along very slowly. A much-cited example is the spamd program, released as part of the free operating system OpenBSD in May 2003. The program's main purpose at the time was to answer email traffic from blacklisted hosts at one byte per second, never leaving a blacklisted host any real chance of delivering messages.
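In a typical OpenBSD setup of the era, PF redirects SMTP connections from blacklisted hosts to spamd, which listens on port 8025 by default. A minimal sketch in the pre-OpenBSD 4.7 rule syntax current at the time, with $ext_if as a placeholder for the external interface:

table <spamd> persist
rdr pass on $ext_if proto tcp from <spamd> to any port smtp \
        -> 127.0.0.1 port 8025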

The combination of blacklists and greylisting proved to work very well, but the quest for even more effective measures continued. Yet again, the next logical step grew out of observing spammer behavior. We saw earlier that spammers do not bother to check whether individual messages are in fact delivered.

Laying traps and bait
By early 2005, these observations had led to a theory that was soon proved useful: if we have one or more addresses in our own domains that are certain to never receive any valid mail, we can be almost a hundred percent certain that any mail addressed to them is spam. Such addresses are spamtraps. Any machine that tries to deliver mail to those addresses is placed in a local blacklist, and we keep it busy by answering its traffic at a rate of one byte per second. The machines stay on the blacklist for 24 hours unless otherwise specified.

The new technique, dubbed greytrapping, was launched as part of the improved spamd in OpenBSD 3.8, released in November 2005. In early 2006, Bob Beck, one of the main spamd developers, announced that his greytrapping hosts at the University of Alberta generate a downloadable blacklist based on the greytrap data, updated once per hour, ready for inclusion in spamd setups elsewhere. This is obviously useful: machines that try to deliver mail to addresses that were never deliverable most likely do not have any valid mail to deliver, and we are doing society at large a favor by delaying their deliveries and wasting their time to the maximum extent possible.
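Pulling a published blacklist into your own setup takes a few lines of spamd.conf, in the termcap-like format documented in spamd.conf(5). A sketch, with a placeholder URL standing in for the list publisher's real location:

all:\
        :uatraps:
uatraps:\
        :black:\
        :msg="Your address %A has sent spam within the last 24 hours":\
        :method=http:\
        :file=www.example.org/traplist.gz: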

It is worth mentioning that during the period we have used the University of Alberta blacklist at our site, it has contained a minimum of twenty-some thousand IP addresses, and during some busy periods it has reached almost two hundred thousand.

You can help, too
Fortunately you do not need to be a core developer to be able to contribute. The exact same tools Bob Beck uses to generate his blacklist are available to everybody else as part of OpenBSD, and they are actually not very hard to use productively.

Here at bsdly.net and associated domains, we saw during the (Northern hemisphere) summer of 2007 a marked increase in email sent to addresses that have never existed in our domains. This was clearly a case of somebody, one or more groups, making up or generating sender addresses to avoid seeing any reactions to the spam they were sending. This in turn led to us starting an experiment that is still ongoing: we record invalid addresses in our own domains as they turn up in our logs, pick the really improbable ones, put them in our local spamtrap list and publish the list on a specific web page on our server[2].
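Adding an address to the local list of traps is a one-line operation with spamdb. A sketch, with a made-up address of the improbable kind described above:

:~$ sudo spamdb -T -a dummy.trap.address@example.com

From then on, any host that tries to deliver mail to that address is greytrapped and fed one byte per second for the next 24 hours.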

Experience shows that it takes a very short time for the addresses we put on the web page to turn up as target addresses for spam. This means that we have succeeded in feeding the spammers data that makes it easier for us to stop their attempts, and frequently we make spam senders spend significant amounts of time communicating with our machines with no chance of actually achieving anything. The number of spamtrap addresses has reached fifteen thousand, and we have at times observed groups of machines that spend weeks working through the whole list, with the average time per unsuccessful delivery attempt clocked at roughly seven minutes. (Fifteen thousand addresses at seven minutes each works out to more than ten weeks of wasted effort for a single machine, so even a group of machines sharing the list needs weeks.)

As a byproduct of the active spammer trapping, we started exporting our own list of machines that had been trapped via the spamtrap addresses during the last 24 hours, making the list available for download. This list's existence has only been announced via the spamtrap addresses web page and a few blog posts, but we see that it is retrieved, most likely automatically, at regular intervals and is apparently used by other sites in their systems.

At this point we have established that it is possible to create a system that makes it very unlikely that spam actually makes it through to users, while at the same time making it quite unlikely that legitimate mail is adversely affected. In other words, we have the cyberspace equivalent of good fences around our property, but the spammers are still out there and may create serious problems for those who are without adequate protection.

Collecting evidence, or at least seeking clarity
We would have loved to see law enforcement take the spammer problem seriously. This is not just because the spam that reaches its targets is irritating, but rather because almost all spam is sent via equipment that spammers use without the legal owners' consent. We would have liked to see resources allocated in proportion to the criminal activity the spam represents. We would have liked to help, but it might seem that we would not have usable evidence available, since we do not actually receive the messages the spammers try to deliver. On the other hand, we have at all times a list of machines that have tried to deliver spam, identified with almost a hundred percent certainty thanks to the spamtrap addresses. In addition, our systems routinely produce logs of all activity, at whatever level of detail we choose. This means that it is possible to search our logs for the IP addresses that have tried to deliver spam to our systems during the last 24 hours, and get a summary of what those machines have done.

A search of this kind typically yields a result like this:

Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <kristie@iland.net> -> <asasaskosmicki@bsdly.net>
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 03:41:42 skapet spamd[13548]: 190.20.132.16: connected (14/13), lists: spamd-greytrap
Aug 10 03:42:23 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap
Aug 10 06:30:35 skapet spamd[13548]: 190.20.132.16: connected (23/22), lists: spamd-greytrap becks
Aug 10 06:31:16 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap becks


The first line here states that 190.20.132.16 contacted our system at 02:34:29 AM on August tenth, as the fourth active SMTP connection, with three from blacklisted hosts. A few seconds later it appears that this is an attempt at delivering a message to the address asasaskosmicki@bsdly.net. That address was already one of our spamtraps, most likely one that was harvested from our logs after being made up somewhere else. After 12 seconds, the machine disconnects. The attempted delivery to a spamtrap address means that the machine is added to our local spamd-greytrap blacklist, as indicated in the entry for the next attempt about one hour later. This second attempt lasts for 41 seconds. The third try in our log material happens just after 06:30, and the addition of the list name becks indicates that in the meantime the machine has tried to deliver to one of Bob Beck's spamtrap addresses and has entered that blacklist, too.

Unfortunately, it is unlikely that logs of this kind are sufficient as evidence for criminal prosecution purposes, but the data may be of some use to those who have an interest in keeping machines in their care from sending spam.

“Name And Shame“, or just being neighborly?
After some discussions with colleagues, I decided in early August 2008 to generate daily reports on the activities of machines that had made it into the local blacklist at bsdly.net and publish the results. If all we have on a machine is the fact that an IP address (such as 24.165.4.190) has entered a blacklist, with no supporting material, it is fairly easy for whoever is in charge of that address range to dismiss the entry as an unsupported allegation. We hope that when whoever is responsible for the network containing 24.165.4.190 sees a sequence like this,

Host 24.165.4.190:
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
Aug 10 02:57:54 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:57:55 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:57:56 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:16 skapet spamd[13548]: 24.165.4.190: connected (8/6)
Aug 10 02:58:30 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:58:31 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:58:32 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:39 skapet spamd[13548]: 24.165.4.190: connected (7/6), lists: spamd-greytrap
Aug 10 03:02:24 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 03:03:17 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberliereffett@ehtrib.org>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: From: "Preston Amos" <aarnq@abtinc.com>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: To: kimberlee.ledet@ehtrib.org
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: Subject: Wonderful enhancing effect on your manhood.
Aug 10 03:06:04 skapet spamd[13548]: 24.165.4.190: disconnected after 445 seconds. lists: spamd-greytrap

they will find it a sufficient basis for action of some kind. The material we generate is available via the “The Name And Shame Robot” web page. The latest complete report of log excerpts is available via links on that page. Previous versions are archived offline, but will be made available on request to parties with valid reasons to request the data.
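Generating that kind of report is essentially a loop over the current trap list. A minimal sketch, assuming spamd's syslog output ends up in /var/log/daemon as on a default OpenBSD install:

#!/bin/sh
# For each host currently caught in the spamd-greytrap table,
# print a header followed by that host's spamd log entries.
for ip in $(pfctl -t spamd-greytrap -T show); do
        echo "Host $ip:"
        grep " $ip: " /var/log/daemon
        echo
done

A production version would add date filtering and archiving, but the core extraction is no more complicated than this.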

“The Name And Shame Robot” is rather new, and it is too early to say what effect, if any, the publication has had. We hope that others will do similar things based on their local log data, or even synchronize their data with ours. If you are interested in participating, please make contact.

Regardless of other factors, we hope that the data can be useful as indicators of potential for improvement in the networks that appear regularly in the reports as well as material for studies that will produce even better techniques for spam avoidance.

A shorter version of this article in Norwegian was published in Computerworld's Norwegian edition on August 22, 2008; the longer Norwegian version is available as an earlier blog post.

[1] A collection of such failure messages collected earlier this year is available at http://www.bsdly.net/~peter/joejob-archive.2008-07-28.txt.

[2] See http://www.bsdly.net/~peter/traplist.shtml, references at that page lead to my blog, which consists of public field notes, as well as other relevant material.

About the author
Peter N. M. Hansteen (peter@bsdly.net) is a consultant, system administrator and writer, based in Bergen, Norway. In October 2008, he joined FreeCode, the Norwegian free software consultancy. He has written various articles as well as "The Book of PF", published by No Starch Press in 2007, and lectures on Unix- and network-related topics. He is a main organizer of BLUG (Bergen (BSD and) Linux User Group), vice president of NUUG (Norwegian Unix User Group) and an occasional activist for EFF's Norwegian sister organization EFN (Elektronisk Forpost Norge).

Sunday, August 31, 2008

[.NO] “Name and Shame”, or socially beneficial use of log data about spammers

Today's post was originally written in Norwegian; an English translation follows.

Vi sitter med stadig voksende mengder med data om spammere. Kan vi bruke dette på en måte som er nyttig for andre?

Vi som selv står for driften av eposttjenester vet av tidvis smertelig erfaring hva som skal til for at minimalt med uønsket reklame og lureri som kan være direkte trusler mot sikkerheten faktisk havner i innboksene til brukerne våre.

Epost: Dette burde vært enkelt
I utgangspunktet burde det være en enkel sak å håndtere epost: Serveren er satt opp slik at den vet hvilke domener den skal ta imot post for, og hvilke brukere som eksisterer i domenene. Når en maskin tar kontakt og signaliserer at den ønsker å levere epost, sjekker serveren om meldingen er adressert til en gyldig bruker. Er det snakk om en gyldig bruker, blir meldingen mottatt og lagt i innboksen til den aktuelle brukeren, i motsatt fall får den som er oppgitt som avsender beskjed om at meldingen ikke lot seg levere, og hvorfor.

Hvis bare alle var ærlige
I hvert enkelt ledd av denne prosessen er det en underliggende forutsetning at kommunikasjonspartnerne oppgir korrekt informasjon. I mange tilfeller er det slik, og det dreier seg om legitim kommunikasjon mellom parter som har grunn til å ønske kontakt. Dessverre finnes det også tilfeller der denne grunnleggende tilliten blir brutt eller misbrukt, for eksempel når epostmeldinger blir sendt med annen avsenderadresse enn den reelle, gjerne noe oppdiktet i et domene som tilhører andre. En del av oss har også opplevd å få returmeldinger om at levering ikke var mulig på grunnlag av meldinger som vi vitterlig ikke har sendt[1]. Når vi ser nøyere på innholdet i disse meldingene, vil vi i nesten alle tilfeller se at dette er søppelpost, tidvis med innslag av svindel og kanskje ledd i førsøk på å overta mottakerens maskin eller stjele sensitive data.

Hva gjør de ansvarlige?
Hvis du spør en typisk systemadministrator om hvilke tiltak som er satt i verk for å hindre at uønskede eller skadelige meldinger faktisk kommer frem til brukerne, vil du antakelig få høre en beskrivelse som stort sett kan kokes ned til at posten blir filtrert gjennom systemer som studerer innholdet i meldingene. Hvis meldingen ikke inneholder noe som er kjent som uønsket (kjent spam eller skadelig programvare) eller noe som likner på noe som kunne være det, blir meldingen levert til brukeren den er adressert til. Hvis systemet avgjør at meldingen har innhold som gjør at den ikke skal leveres til mottaker, blir meldingen i mange tilfeller kastet uten å bli levert, og noen systemadministratorer vil nok også fortelle deg at systemet sender melding om avgjørelsen tilbake til adressen som er oppgitt som avsender.

Mye av dette hører antakelig til den passive kunnskapen hos mange, og de fleste slår seg til ro med at innholdsfiltrering er det eneste vi kan gjøre for å holde tvilsomme eller direkte kriminelle elementer unna arbeidsmiljøet. For en enkelt sluttbruker er det sannsynligvis bare mindre justeringer ut fra dette som er mulig.

Tiltak basert på observert adferd
Men for oss som håndterer driften av selve tjenesten er det mulig å studere dataene som registreres automatisk i loggene våre og bruke spammernes (her brukt om avsendere av alle typer uønsket epost, inkludert skadevare) adferd til å fjerne mesteparten av den uønskede trafikken før innholdet i meldingene er kjent. Det er nødvendig å gå ned på et noe mer grunnleggende nivå i nettverkstrafikken og studere adferden på nettverksnivå.

En av de enkleste formene av slike adferdsbaserte tiltak dukket opp i form av en teknikk som fikk navnet grålisting (greylisting) i 2003. Teknikken bygger på en litt pedantisk og ganske kreativ tolkning av allerede vedtatte standarder. Protokollen som brukes for epost-overføring på Internett, SMTP (Simple Mail Transfer Protocol) inneholder en mulighet for at en server som har antatt forbigående problemer med å ta imot epost kan rapportere om tilstanden ved å svare med en spesiell feilkode når andre maskiner prøver å levere. Hvis avsendermaskinen er korrekt konfigurert, vil den vente en viss tid før den gjør et nytt forsøk på levering, og etter all sannsynlighet vil den lykkes etter kort tid. Det er verd å merke seg at dette er en del av standarden som først og fremst skulle sørge for at eposttjenesten skulle være så pålitelig som mulig, og i de fleste tilfeller skjer alt dette uten at personen som skrev og sendte meldingen merker noe til det. Meldingene kommer frem, og alle er fornøyde.

Grå og svarte lister, små hvite løgner
Grålisting går ut på at serveren gir melding om midlertidig feil til alle forsøk på epostlevering fra maskiner den ikke har hatt kontakt med før. Erfaringene viste at antakelsene fra før eksperimentet i hovedsak var korrekte: De aller fleste maskiner som sender legitim epost er satt opp til å sjekke returkoder og handle etter dem, mens de aller fleste avsendere av spam bare pøser på så mange meldinger som mulig, uten å sjekke returkoder. Resultatet er at et sted mellom åtti og noenognitti prosent av spam-mengden blir stoppet på første forsøk (før eventuell innholdsfiltrering), mens legitim post kommer frem, i noen tilfeller med en viss forsinkelse ved første kontakt.

En annen adferdsbasert teknikk, som faktisk var i bruk før grålisting ble utbredt, er å bruke såkalte svartlister - lister over maskiner som er blitt klassifisert som spam-avsendere - og avvise post fra maskiner som var med i listen. Noen grupper begynte etterhvert å eksperimentere med såkalte tjærehull (tarpit), der teknikken går ut på å forsinke trafikk fra maskiner som er med i en svartliste ved å la sin del av kommunikasjonen gå svært sakte. Et eksempel som ofte nevnes er programmet spamd, som det frie operativsystemet OpenBSD lanserte i mai 2003. Programmet hadde da som sin hovedoppgave å svare på eposttrafikk fra svartlistede maskiner med ett tegn i sekundet, uten at avsendere som var med i en svartliste hadde noen reell mulighet til å få levert meldingene.

Kombinasjonen av svartlister og grålisting har vist seg å fungere utmerket. Likevel fortsetter praktikere og utviklere forsøkene på å få til enda mer effektive teknikker. Enda en gang kom neste logiske steg som resultat av observert adferd. Vi så tidligere at spammere ikke bryr seg om å sjekke om hver enkelt melding faktisk kommer frem.

Legge ut snarer og agn
Tidlig i 2005 førte dette til at det ble det formulert en teori som det viste seg å være hold i: Hvis vi så lager en eller flere adresser i vårt eget domene som vi vet aldri vil ha noen grunn til å få legitim post, så kan vi med nesten hundre prosent sikkerhet vite at post som blir forsøkt levert til disse adressene er spam. Adressene er spammerfeller. Maskiner som prøver å levere spam til disse adressene, kan vi legge i en lokal svartliste som vi så oppholder med ett tegn i sekundet. Maskinene blir liggende i svartlisten i 24 timer dersom ikke annet tilsier det.

Den nye teknikken fikk navnet greytrapping, og ble lansert som del av en forbedret spamd i OpenBSD 3.8 i mai 2005. Bob Beck, som var sentral deltaker i utviklingen av spamd, annonserte tidlig i 2006 at han brukte greytrapping på sentralt plasserte maskiner ved University of Alberta, og gjorde svartlistene som er resultat av fangsten, med oppdateringer hver time, tilgjengelig for nedlasting slik at andre kan bruke listene i sine oppsett. Dette er åpenbart nyttig, maskiner som sender til adresser som aldri har vært leverbare, har etter all sannsynlighet ikke legitim epost å levere, og vi gjør samfunnet en tjeneste ved å hindre dem i leveringen og å få dem til å kaste bort så mye tid som mulig.

It is perhaps worth mentioning that for as long as yours truly has been following it, the blacklist from the University of Alberta has contained at least twenty-odd thousand IP addresses, and in busy periods it has held close to two hundred thousand machines.

You too can contribute
That said, you do not need to be a centrally placed software developer to make a positive contribution. The same tools Beck uses to generate his blacklist are available to everyone as part of OpenBSD, and they are not as hard to use as one might fear.

Here at bsdly.net and related domains we observed, during the summer of 2007, a marked increase in bounce messages about undeliverable email sent to addresses that have never existed in any of our domains. Clearly somebody, one or more groups, was generating or simply inventing sender addresses in order to avoid having the reactions to their spam come back to them. This in turn led to an experiment we still have running: we record invalid addresses in our own domains as they turn up in our logs, pick out the completely improbable ones, add them to our local spamtrap list, and publish the list on a dedicated page on our web server[2].
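
The harvesting itself does not need to be sophisticated. A rough sketch of the kind of thing we do, with file names made up for the occasion: pull the intended recipients out of the greylist entries in the spamd log, drop everything that actually exists, and review what is left before trapping it:

# collect intended recipient addresses from (GREY) log entries
awk '/spamd.*\(GREY\)/ { a = $NF; gsub(/[<>]/, "", a); print a }' /var/log/spamd |
    sort -u | grep -Fvxf valid-local-addresses > trap-candidates
# after manual review, feed the keepers to the greytrapper
while read addr; do spamdb -T -a "$addr"; done < trap-candidates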

Experience shows that it takes a remarkably short time before the addresses we put on that page start turning up as recipient addresses. In short, we have managed to feed the spammers data that makes it easier for us to stop them, and in many cases we also get the spam senders to spend considerable amounts of time communicating with our machines without achieving anything at all. The number of spamtrap addresses in our list is now up to around fifteen thousand, and we have at times observed groups of machines spending a few weeks working their way through the entire list, averaging a little under seven minutes per failed delivery attempt.

As a byproduct of this active spammer trapping, we started exporting our own list of machines that have been caught via the spamtrap addresses during the last 24 hours and making it available for download. The list's existence is announced only on the page with the spamtrap addresses and in a few blog posts, but we can see that it is being fetched regularly, and probably automatically, by others who use it in their own systems.

So far we have established that it is possible to build a system that makes the probability of spam reaching users very small, while the chance of legitimate mail being noticeably delayed is close to nil. That gives us the equivalent of good fences around our own property, but the spammers are still out there, a potentially serious problem for those without adequate protection against them.

Collecting evidence, or at least creating some clarity
We would have much preferred for police and prosecutors to take the spam problem seriously, not only because the spam that does get through is irritating to look at, but because almost all spam is sent through equipment the spammers use without the owners' consent. Put simply, we want resources committed that are proportional to the criminal activity the spam represents. We would be happy to help, but at first glance it might seem we would have trouble producing usable evidence, since we never receive the messages the spammers try to deliver. On the other hand, we have at any given time a list of machines that have tried to deliver spam, identified with close to one hundred percent certainty thanks to the spamtrap addresses. In addition, our systems routinely produce logs of all activity, at whatever level of detail we choose. This makes it possible to search the logs for the IP addresses that have tried to deliver spam to us during the last 24 hours and get an overview of what those machines have been up to.
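
The search itself needs nothing fancier than grep; a sketch, assuming syslogd has been told to put the spamd messages in a file of their own:

$ grep -F ' 190.20.132.16: ' /var/log/spamd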

The result of a typical search of this kind looks like this:

Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <kristie@iland.net> -> <asasaskosmicki@bsdly.net>
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 03:41:42 skapet spamd[13548]: 190.20.132.16: connected (14/13), lists: spamd-greytrap
Aug 10 03:42:23 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap
Aug 10 06:30:35 skapet spamd[13548]: 190.20.132.16: connected (23/22), lists: spamd-greytrap becks
Aug 10 06:31:16 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap becks

The first line shows that 190.20.132.16 contacts our system at 02:34:29 in the morning of August 10th, as the fourth active connection, three of them from blacklisted hosts. A few seconds later it becomes clear that this is an attempt to deliver a message to the address asasaskosmicki@bsdly.net, which is among our spamtraps, probably harvested from logs and invented somewhere else in the world. After 12 seconds the machine disconnects. The attempt to deliver to a spamtrap gets the machine entered in our local blacklist, spamd-greytrap, which shows clearly when the machine tries again a little more than an hour later. On that attempt it is held up for 41 seconds. The third attempt in our log material comes just after 06:30, and the addition of the list name becks shows that the machine has in the meantime tried to deliver to one of Bob Beck's spamtrap addresses and is now in that blacklist as well.

Unfortunately it is not very likely that logs like these are sufficient as evidence in criminal cases, but for those who want the machines they administer used for spam sending as little as possible, or who take an interest in spammer behavior, this is useful data.

“Name And Shame”, or perhaps just good neighborliness?
After some discussions with colleagues, early in August 2008 I decided to generate daily overviews of the activities of machines that have entered our local blacklist at bsdly.net, and to publish them. When a machine appears in a blacklist as a bare IP address (for example 24.165.4.190), with no other material backing the entry, the entry is mostly an assertion that the responsible parties may well choose not to believe. Our hope is that if whoever is responsible for the network where 24.165.4.190 belongs sees a sequence like this,

Host 24.165.4.190:
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
Aug 10 02:57:54 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:57:55 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:57:56 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:16 skapet spamd[13548]: 24.165.4.190: connected (8/6)
Aug 10 02:58:30 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:58:31 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:58:32 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:39 skapet spamd[13548]: 24.165.4.190: connected (7/6), lists: spamd-greytrap
Aug 10 03:02:24 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 03:03:17 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberliereffett@ehtrib.org>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: From: "Preston Amos" <aarnq@abtinc.com>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: To: kimberlee.ledet@ehtrib.org
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: Subject: Wonderful enhancing effect on your manhood.
Aug 10 03:06:04 skapet spamd[13548]: 24.165.4.190: disconnected after 445 seconds. lists: spamd-greytrap


then that will be sufficient grounds for doing something about it. The material is available via the Name And Shame Robot page at http://www.bsdly.net/~peter/nameandshame.html. The most recently generated log overview is available via references on that page; earlier editions are archived, but will be made available on well-founded request.

The Name and Shame Robot is recent enough that we cannot say much yet about the effect of the publication. One is allowed to hope that others will do something similar based on their own local log data, or perhaps even synchronize their data with ours. Do get in touch if you are interested in this work.

Independently of everything else, we hope the data will prove useful, both as a pointer to room for improvement for the networks that appear regularly in the overviews, and as material for studies that can give us even better spam fighting.

Notes

[1] A collection of such bounce messages from earlier this year can be viewed at http://www.bsdly.net/~peter/joejob-archive.2008-07-28.txt


[2] http://www.bsdly.net/~peter/traplist.shtml; references on that page lead on to my blog, which I use for public notes, and to other relevant material.

A shortened version of this article was printed in Computerworld Norge on August 22, 2008.

Wednesday, August 27, 2008

Logfiles in the buff

Search engine optimization, deflowered.

Logs are important. Depending on the specific kind of log, the data may shape lives and generate fortunes (how many times were those ads displayed, your clickthrough rate), reveal suspicious behavior and trigger actions (such as shutting the door on that bruteforcer), or provide sysadmins such as yours truly with a general idea of what works and what doesn't - or anything in between.

If you're a sysadmin, log data or log data derivatives such as a monitoring tool's graphical status display are more likely than not an important underlying factor in determining how you spend your day.

Then of course most of the material for these columns comes from log files, too. Depending on the specific log file, I tend to either just peek at the data my monitoring scripts offer me or do some manual grepping for any patterns that interest me.

One such pattern matches the filename for my resume. I put that online for job hunting purposes, and now that I'm basically a gun for hire, it's slightly interesting to see any activity involving that file.
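
The check is nothing more sophisticated than a grep on the web server's access log, roughly like this (the log location will depend on your httpd setup):

$ grep 'PNMH-cv' /var/www/logs/access_log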

So at semi-random intervals, I check the apache log for references to my resume. Today, the grepery turned up this nugget:

92.48.107.33 - - [27/Aug/2008:04:41:12 +0200] "GET /%7Epeter/PNMH-cv.html HTTP/1.0" 200 12318 "http://afmfokuv.fcpages.com/hot-anime-lesbians.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"

and I count myself lucky that I had thoroughly swallowed my last mouthful of coffee before reading that.

In the Era of PageRank, the Age of the Search Engine Optimization Consultant, the Season of Clickthrough Rage, I suppose we should not be entirely surprised to see such things. Just what the two documents have in common, perhaps other than targeting a very specific market, is left as an exercise for the terminally curious. I would advise some caution in your choice of browser and operating system if your research takes you to the referring URL. One of the lessons of the day is, it doesn't always take a spamd log to crack you up.

PF tutorial in London, November 26
In other news, the UKUUG are hosting a full day PF tutorial featuring yours truly in London on November 26th, 2008. See the UKUUG web site for details. OpenCON is the following weekend in Venice, and I hope to make it there too.

The Name and Shame Robot
Last week the Norwegian edition of Computerworld published an article about the Name and Shame Robot, unfortunately in the paper edition only (yes, I've got an English article in process too). The article did spur some nameandshame.html traffic from unexpected places, but no offers of cooperation or spamd synchronization so far. In the meantime, I'm running into odd differences in cron behavior when trying to run, on a few FreeBSD hosts, the generator script I wrote on my OpenBSD machines. More than likely there is a lesson to be learned there too.

Saturday, August 9, 2008

Is one of your machines secretly a spambot?

Sometimes we just need facts on the table, automated.

In my previous blog post, I wondered aloud about publishing data about the machines that verifiably tried to spam us. The response was other than overwhelming, and with the script running once per day anyway, I now publish the results via the Name And Shame Robot page.

The announcement below is very close to the text there, so by way of explanation, here is a gift to all my fellow spamd junkies out there:

We started actively greytrapping and publishing our list of greytrap addresses (almost exclusively addresses generated or made up elsewhere and harvested from our logs) during July 2007. The list of greytrap addresses is published on the Traplist page along with some commentary. You can find related comments in this blog post and its followups.

One byproduct of the greytrapping is a list of IP addresses that have tried to deliver mail to one or more of our greytrap addresses during the last 24 hours. The reasoning is, none of these addresses are valid, and any attempt at delivering to them is more likely than not spam. You can download that list here as a raw list of IP addresses, or as a DNS zone file intended as a DNS blacklist here.

In early August 2008, I wrote a small script that copies (rsyncs, actually) the current list of trapped IP addresses as well as the spamd log off the firewall and for each IP address collects all log entries from the spamd log. The resulting file is rsynced back to the webserver, and you can view the latest version here.
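
For the curious, the script is little more than rsync and a loop. A condensed sketch, with host and file names made up for the purpose:

#!/bin/sh
# fetch the current dump of trapped hosts and the spamd log from the gateway
rsync -a gw:/var/db/trapped-hosts gw:/var/log/spamd /work/
# collect each trapped host's log history into one report
for ip in $(cat /work/trapped-hosts); do
        echo "Host $ip:"
        grep -F " $ip: " /work/spamd
        echo
done > /work/report.txt
# push the finished report to the web server
rsync -a /work/report.txt web:/var/www/users/peter/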

The material here is useful mainly to the system administrators responsible for the machines that appear in it, or people who are interested in studying spammer or spambot behavior. Times are given according to the Europe/Oslo time zone (CET or CEST according to season), and if a date appears several times for an IP address entry, the reason is simply that the log data spans several years. The default syslog settings do not record the year.

In the data you will find several kinds of entries, most of them pretty obvious and straightforward, others less so. The likely FAQ is, "what are the entries with no log data?". The answer is, the spamd here synchronizes with spamds at other sites. The entries without log data entered our traplist through a sync operation, but the host did not attempt direct contact here.

The other likely question is, "what is that becks list?". It's what the rest of the world refers to as uatraps. I copied the data for that list into my config from Bob Beck's message on OpenBSD-misc and didn't notice that the list had an official name until much later.

Please note that this is not an up-to-the-minute list. Depending on the number of hosts currently in the list of trapped addresses, the script's run time could be anything up to several hours. For that reason, the script starts at the time stated at the beginning of the report file and runs until it finishes generating. The last thing the script does is to rsync the report file to the webserver. For the time being, I archive older versions off-line.

This is now a totally hands-off, automated operation. The report is currently generated on a Pentium IV-class computer with few and only occasional other duties. If you have any comments or concerns, the address in the next sentence is the one I use for day-to-day email. If you find this data useful, donations of faster hardware or money (paypal to peter@bsdly.net or contact me for bank information) are of course welcome.

Thursday, August 7, 2008

Now that we have their addresses, do we name and shame?

The legal owners of botnet-controlled spam senders are quite likely unaware of what their machines are doing. Do they deserve to be outed, named and shamed?

Earlier this week a friendly Australian who I think had been reading my blog sent me a few questions about spam, spammers and what to do with them. Would it for example be useful to forward the IP addresses in the local traplist to law enforcement? After all, I publish a dump of IP addresses from my local-greytrap once per hour, and apparently at least some people are fetching and using that as a valid blacklist on a regular basis.

(On a side note: if you do fetch that list regularly, keep in mind that the data is dumped ten past every hour; that's when the data is fresh. If you fetch at every full hour, the data is already fifty minutes old.)

Anyway, my initial reaction to the question about forwarding the list of IP addresses to law enforcement was along the lines of "Well, a raw list of IP addresses doesn't really add up to a lot of evidence, but if you can extract the log entries for each one, you may have something". My actual answer was phrased a little differently, but even while I was writing my reply I started fiddling with a script to read my list of trapped IP addresses and grep the spamd log for all entries for each IP address.

My complete collection of spamd logs goes back a few years, so searching for a complete history does take a while. (For techies: for each IP address, a grep of the entire log takes at least a few seconds (s); the total time is s times the number of entries (N), typically a few thousand; and grepping in parallel is difficult, because you want the output per IP address, not interlaced as in the raw log data.)
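
To put rough numbers on it, with both figures picked purely for illustration: at three seconds per grep and five thousand trapped addresses, a full run works out to about four hours:

$ echo '3 * 5000 / 3600' | bc
4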

After a while, you can see output roughly like this:


Host 81.183.80.187:
Aug 7 07:24:16 skapet spamd[13548]: 81.183.80.187: connected (12/9)
Aug 7 07:24:30 skapet spamd[13548]: (GREY) 81.183.80.187:
<akstcabrushwithafricamnsdgs@abrushwithafrica.com> -> <bennett-gauvin@ehtrib.org>
Aug 7 07:24:31 skapet spamd[13548]: 81.183.80.187: disconnected after 15 seconds.
Aug 7 07:24:44 skapet spamd[13548]: 81.183.80.187: connected (9/7)
Aug 7 07:25:06 skapet spamd[13548]: (GREY) 81.183.80.187:
<akstcamplepleasuremnsdgs@amplepleasure.net> -> <bennett-gauvin@ehtrib.org>
Aug 7 07:25:07 skapet spamd[13548]: 81.183.80.187: disconnected after 23 seconds.
Aug 7 07:25:08 skapet spamd[13548]: 81.183.80.187: connected (11/9)
Aug 7 07:25:23 skapet spamd[13548]: (GREY) 81.183.80.187:
<akstcaesamnsdgs@aesa.ch> -> <bennett-gauvin@ehtrib.org>
Aug 7 07:25:24 skapet spamd[13548]: 81.183.80.187: disconnected after 16 seconds.
Aug 7 07:26:16 skapet spamd[13548]: 81.183.80.187: connected (11/9), lists: spamd-greytrap
Aug 7 07:30:00 skapet spamd[13548]: (BLACK) 81.183.80.187:
-> <bennett-gauvin@ehtrib.org> -> <bennett-gauvin@ehtrib.org>
Aug 7 07:31:43 skapet spamd[13548]: 81.183.80.187: From: "Frances Ballard"
-> <bennett-gauvin@ehtrib.org>
Aug 7 07:31:43 skapet spamd[13548]: 81.183.80.187: To: <bennett-gauvin@ehtrib.org>
Aug 7 07:31:43 skapet spamd[13548]: 81.183.80.187: Subject: Extraordinary Narcotic Deals
Aug 7 07:32:47 skapet spamd[13548]: 81.183.80.187: disconnected after 391 seconds.
lists: spamd-greytrap


That's roughly what I would have expected to see: a host tries to send obvious spam to one of the trap addresses (one I harvested from incoming noise earlier), is added to spamd-greytrap and on the next attempts gets stuck for a few minutes. (Notice that this spammer has another version of grepable From: addresses - prepend akstc and append mnsdgs to the basename, so abrushwithafrica.com becomes the junk address akstcabrushwithafricamnsdgs@abrushwithafrica.com. Content and header filterers, please take note.) I thought that this would be the typical behavior, but browsing the output from my script, entries of this kind seem to be more the norm:

Host 81.192.185.9:
Aug 6 12:47:15 skapet spamd[13548]: 81.192.185.9: connected (12/8)
Aug 6 12:47:27 skapet spamd[13548]: (GREY) 81.192.185.9:
<jacowen@teaneckschools.org> -> <hevadcouture@bsdly.net>
Aug 6 12:47:27 skapet spamd[13548]: 81.192.185.9: disconnected after 12 seconds.

Here, the spambot tries exactly once, never to return. It's possible they detect the stuttering (our side answers one byte per second for the first ten seconds) and give up for that reason, but it could equally well be classic fire-and-forget, the reason greylisting still works. Or both, for that matter.
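
Getting back to the grepable From: addresses noted above, a crude filter for that particular pattern could be as simple as this sketch, here run against the spamd log (the same regular expression would do in a content filter):

$ grep -E '<akstc[a-z]+mnsdgs@' /var/log/spamd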

But back to the real question: Now that we have the data, what do we do with it?

With the script I have now, extracting the history for each of several thousand IP addresses takes some hours. The output is enlightening, but by the time the run is complete, it could be significantly more than twenty-four hours since the machines listed tried to send spam.

Should we name and shame anyway? If we forward the data to law enforcement, would they care?

For the time being, I'll try to think of a quicker way to extract the data. Any input on how to make the process more efficient is welcome, as is considered (learned or otherwise) opinion on the ethical up- or downside of publishing spamd log data.

Wednesday, July 2, 2008

Is there really a market for an open source router?

Open source goodness. Coming soon to a router near you (if it isn't there already).

I have a confession to make. Today's headline isn't mine. I snatched it from Dana Blankenhorn's June 30th piece over at ZDNet. It almost made me utter a Simpsonian grunt and start ranting about my more than 40,000 visitors again. Maybe my readers don't constitute a market, and in a consumerland context, a mere forty thousand (they've been coming in at a rate of about five hundred new uniques a week for a little while now) is possibly small potatoes indeed.

On the other hand, there are good indications that significant parts of the Internet actually run on open source in some form, regardless of sundry punditry or, for that matter, how many people have found my online or printed work. And then again, if even a small subset of those who have downloaded my work actually do some of the things I write about, there is reason to believe that they have achieved a degree of insulation between any local stupidity and the Internet at large.

But back to the ZDNet piece. The interesting news there is that Netgear are apparently coming around to support open source via the MyOpenRouter web site and at least one wireless router appliance with firmware source code available. It will be kind of interesting to see if they've actually made their code and specifications open enough that we have a reasonable chance of seeing non-Linux open source systems such as OpenBSD run on the platform.

If you read the OpenBSD source-changes list you probably know this already, but even if you don't, OpenBSD just turned -current into 4.4-beta (see Theo's commit message). I take that to mean that the various significant changes such as the overhaul of the PF code will be thoroughly tested in time for the 4.4 release. That change hasn't made it into snapshots just yet (but likely will within the next few hours), but you can take a peek at the OpenBSD change log for a preview of the goodies that will be officially released on November 1st. I for one am looking forward to that date.

Wednesday, June 25, 2008

Yes, we can! Make a difference, that is

Good netizenship sometimes comes with a green tinge.

Taking in my daily Linuxtoday dose this morning, there was one item that grabbed my attention, with the headline "Botnets and You: Save the World--Install Linux", and the Linuxtoday entry in turn points to Ross Brunson's blog post with the same title. Do click the link to Ross' blog; it's well worth reading.

What I particularly like about the piece is that he makes the point that you can actually make a difference. More specifically, if you run Linux (being a Novellian, he naturally recommends SLES or SLED) and eliminate Microsoft from your system, you are not only gaining for yourself a safer and more reliable platform, you are also helping everybody else by making the probability of your machine ever joining a botnet a lot smaller.

As regular readers here will recognize, I rate being a good netizen (aka net citizen) as extremely important. Let others get on with their business while we tend to our own tasks, not interfering unless we really have to. If you opt to run your day to day business on the same software your machine most likely came with, the likelihood that somebody else will be taking control of your machine and using it for less than desirable purposes is in fact anything but negligible. I could have used stronger words ("reckless endangerment" comes to mind), but then Redmondians would have just shut off all remnants of rationality. I have argued earlier (article in Norwegian only, sorry) that a computer owner's responsibility should be roughly on par with a dog owner's, but it's possible I should return to that in a future column. And besides, any Linux I've touched for the last ten years is easier to install and operate than the Microsoft offering.

If you followed the Linuxtoday link earlier, you know that I could not resist making the suggestion there that it is in fact possible to be an even better netizen. As outlined in an earlier column (and its followups), if you do your greytrapping properly, you can keep the bad guys occupied and have fun at the same time, consuming next to no resources locally. How's that for green computing.

For example, the crew who started sending messages with headers like

From: "Mrs Maria Jose" <cordinator@euros-ukolympics.com>
Subject: Immediate Response Required.(Euro Award)

to various spamtrap addresses on May 15th are still patiently trying to deliver. A very superficial log analysis shows that there were originally four hosts sending those messages from the 217.70.178.0/24 network. There appears to be only one left now, but collectively these machines have so far made 476,787 attempts at delivery to my data collection points. Judging from a sample of some 21,000 connections from one of the hosts, the average connection time was 389.68 seconds, which in turn means that we've had those spam senders waste approximately 185,792,441 seconds, or time equal to 5.89 years.
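
For those who like to check the arithmetic, the final conversion is straightforward:

$ echo 'scale=2; 185792441 / (60 * 60 * 24 * 365)' | bc
5.89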

Not bad in a little more than a month. On the downside, the predictions that spambots would sooner or later learn to do things in parallel have been proved true. My logs indicate that the current crop is able to handle at least sixty simultaneous delivery attempts. Even bogged down by a suboptimal operating system at the sending end, modern desktop computers are in fact powerful beasts. In my book it's just good netizenry to set up a machine to keep the garbage they send off your own network, and by extension off others' networks, since the time they spend here is time they don't spend delivering elsewhere. By the way, that list is now almost 15,000 addresses long, all non-deliverable garbage. You could be excused for thinking it a twisted art project.

Tuesday, June 17, 2008

BSD Unix? That's purely historical

If you've ever bought something that ended up disappointing you to the point where you wanted to yell at somebody, you will recognize the frame of mind I was in after reading the book I ended up reviewing. With perfect hindsight I should of course have smelled the rat - anything written in this century about "BSD UNIX" is either a retrospective or ill-informed. If you don't fancy a book review, tune in next time for something completely different instead.


Book Review: BSD Unix Toolbox

I came across this title while browsing an online bookstore for possible supporting literature for a course I'm planning. The teaser text boasts "1000+ Commands for FreeBSD, OpenBSD and NetBSD", and with a 2008 publication date I thought this one was definitely worth checking out.

At roughly 300 pages, covering 3-4 commands per page usefully would be a tall order for anyone, so I was a little surprised to find that the book sets aside two chapters to preliminaries, first "Starting with BSD systems" with a brief and not very complete overview of BSDish systems and some pointers to online resources, before moving on to an entire chapter on installing FreeBSD.

In a book that's supposedly about more than a thousand useful commands on FreeBSD, OpenBSD and NetBSD, setting aside an entire chapter to a rather superficial description of how to install one of the systems seems to me a very odd choice. Odder still, how to install either OpenBSD or NetBSD is not covered at all. Now installing any of those systems is not in fact too difficult, and I for one do not think the world needs yet another walkthrough of FreeBSD's or the other systems' install process. In my view, it would have been better if the authors had concentrated on getting around to describing those 1000+ commands, the sooner the better.

In the installation chapter, which also covers the ports and packages system, the authors seem to be unaware that each system has its own variety, and that on NetBSD the system goes under a different name. As the chapter title implies, this chapter is clearly FreeBSD specific, and would have been a lot more useful if the authors had noted at least some of the significant differences that exist between the systems the book sets out to cover.

Probably the most useful chapters in the book are chapters 3 through 6, where essentially all information is likely to be portable to all covered systems. However, Chapter 3, "Using the shell", is not without its oddities: The authors seem to assume that the user has installed Gnome as the preferred desktop environment and covers mainly using Bash as the shell. That is a slightly odd choice since to my knowledge Bash is not the default shell on any of the BSDs, but available as an optional extra through the package system.

Chapter 4, "Working with Files", walks through the basics of file types, file permissions, file system operations and commands such as cp, file, mount and a few others. After reading the chapter, you will be aware that these commands exist, if not much else.

Chapter 5, "Manipulating Text", offers the briefest treatment I've ever seen of regular expressions, mentions in passing vi and emacs as possible tools and then moves on to describing what appear to be the authors' favorite text editors (joe, nano and pico, none of which are in the base system of any of the three) and a brief mention of some graphical tools. The chapter then offers samples of using cat, head, tail, more, less, pr, grep, wc, sort, strings, sed, tr, diff, sdiff, awk, cut, od and finally unix2dos. Again, after reading the chapter you will be aware that the commands exist, but you will be looking elsewhere for detailed information.

Large chunks of Chapter 6, "Playing with Multimedia" would be useful on most unixlike systems (oggenc and convert likely perform much the same anywhere), but once again what little is offered as tips for getting sound or other functionality to work on your system is strictly FreeBSD-specific.

In Chapter 7, "Administering File Systems" the perspective is again distinctively FreeBSD-centric with little or no note of even potential differences across systems. For a FreeBSD user it may offer a useful if very brief and shallow walkthrough, though.

Chapter 8, "Backups and Removable Media", shows some examples of tar, gzip and rsync use, sometimes in combination, supplemented with brief mention of some common and less common file compression tools. The removable media section covers CD burning with cdrtools in more detail than most other software mentioned in this book, but fails to mention useful tidbits such as how to use OpenBSD's cdio command (which is in base) for similar tasks. In a book that claims to be up to date as of 2008, I find that a very curious omission.

Chapter 9, "Checking and Managing Running Processes", does little more than mention the names of some process management commands such as nice, renice, fg, bg, kill and killall in passing before delving into a surprisingly detailed walkthrough of ps. It proceeds to describe top, pgrep and fuser (which the user is instructed to install via pkg_add), then returns to nice and renice, offers a couple of examples of fg and bg use before really picking up speed with kill and killall (fortunately with a list of signals) and background processes started either via nohup or by appending an & character, and spends a couple of sentences each on at, batch, atq, atrm and crontab.

Chapter 10, "Managing the System", weighs in at a little more than 20 pages, and characteristically slips back into FreeBSD-centric mode wherever it offers much detail at all. For some reason this is where the instructions for setting up your system to boot several operating systems turn up, along with a description of using GRUB as your boot loader.

Chapter 11, "Managing Network Connections", is again quite FreeBSD-centric even if it does rattle off the names of the others at apparently random intervals. The information is, as far as I can tell, mostly correct for FreeBSD, and some, if not all, commands will work elsewhere, but it is superficial enough that a user will have to turn elsewhere for help in resolving any problems that turn up.

Chapter 12, "Accessing Network Resources" covers anything from browsing the web (the authors state confidently that lynx 'has been supplanted [...] by the links browser, which was later replaced by elinks', apparently unaware that in OpenBSD at least, lynx is in fact part of the base system), fetching files with wget, curl, lftp (curiously recommending lftp even though in at least OpenBSD the base system's ftp client offers essentially all the required functionality), before spending four pages doing some handwaving about how to set up samba. IRC and mail clients are also mentioned, but I was a bit surprised that 'managing mail' apparently does not even touch on running a mail service.

Chapter 13, "Doing Remote System Administration", covers ssh, screen, tsclient (Gnome's windows remote desktop client), xhost, vnc and vino (another Gnome applet, this one for sharing your Gnome desktop) in that order, none of them in any great detail, bringing to mind the mantra 'now at least you know the commands exist'.

Chapter 14, "Locking Down Security", sprints through the basics of user and groups administration on FreeBSD, moves on to some tips about running services in general and via inetd, and also mentions firewalls.

The section "Configuring the Built-In Firewall" has me really baffled. The authors claim not only that ipfw has been ported to NetBSD and OpenBSD (where PF, mentioned here only as PacketFilter, has been the only packet filter since 2001), but the online reference they give (http://www.phildev.net/ipf) actually points to information about Darren Reed's IPFilter, also known as IPF.

It is unclear which firewall the authors think they have configured, but the actual rule sets they offer are ipfw scripts. It is likely that a user trying to run with the rc.conf snippet supplied and the rule set would in fact end up with both ipf and ipfw enabled, but likely with no working packet filtering (and depending on how the kernel with ipf and ipfw was compiled, the configuration would be either completely open or completely shut, another point apparently unknown or considered irrelevant by the authors). After a few examples of ipfw operations, the chapter then moves on to mention that yes, you can actually input your own information into the system logs, before recommending that you set up a centralized syslog server and instructing you to look into the third-party tools tripwire and chkrootkit.

The three appendixes have reference-style information (finally!) about using vi or vim, shell special characters and variables, and 'personal configuration files', aka dotfiles. All very brief, of course.

Unfortunately, this is a book I can not recommend. Large chunks of it, or something very similar, are available elsewhere, some of it for free and reasonably well written. If you're a FreeBSD user you will find yourself looking up the topics in the Handbook anyway; if you are a NetBSD or OpenBSD user, the relatively platform-independent parts are addressed equally well in several 'Linux' books and other online resources.

There may in fact be more than a thousand monospace type 'command' examples in the book (I never counted), but other than that this title fails to live up to the expectations set up by the cover and other marketing. The book goes to some length to give the impression that it's current, with all dates I could see in examples set somewhere in the second half of 2008, but the authors appear to have been working with FreeBSD 6.3 as their current version.

The relatively frequent mention of 'BSD distributions' - a term that never entered the BSD vocabulary, mainly because the BSDs are maintained as separate systems - and various odd details such as the plainly wrong firewall examples make me suspect strongly that much of the material is in fact warmed over from a Linux book, slightly edited where the authors thought it necessary. Unfortunately, they missed more than a few spots, and for whatever reason the fact checking and testing did not get all the attention they should have. The result is a book that is very superficial where it's right and has enough spots that are anything from misleading to downright wrong that it gets rather irritating to an experienced reader. I can only imagine how frustrating it would be to use this as a resource for learning.

For a long-time *BSD user (yes, all three), it seemed like a nice surprise that Wiley had discovered that there is such a thing as a BSD market. After a few hours looking at this title, I hope they take the task a little more seriously the next time they offer to sell us BSD literature.

Book info:

Title: BSD UNIX Toolbox: 1000+ Commands for FreeBSD, OpenBSD and NetBSD
Authors: Christopher Negus and Francois Caen
Publisher: Wiley Publishing, Inc. (Indianapolis)
Copyright: 2008
ISBN: 978-0-470-37603-4