The distributed but clearly coordinated bruteforcers are still at it. How long until they reach the end of the alphabet? And why are they staying away from my OpenBSD machines? Are we seeing the contours of a controlling intelligence?
As large parts of the Western world prepares for the holidays, the swarm of little robots that started trying to pry open the doors to my machines some weeks back are still at it. As far as we can tell, the coordinated attempts started some time in early November or perhaps late October (we don't keep logs around for long enough to be sure), with an alphabetic progression that has now progressed to somewere into the os. The complete listing from the time I started noticing up to the time I started writing this column can be found here.
I've written about this before, and in fact one of those columns was slashdotted, a pleasant surprise to me and a cause of some excitement among my colleagues at FreeCode.
After writing that article, I did some further research and found out that a precursor to what we are seeing now was observed as early as May 2008, as described in an Ars Technica article published at the time. That article also reveals, via Linuxtoday, that yours truly was among the many who failed to understand the problem, at least for a while. Then again, maybe actual log excerpts would have helped.
The problem, such as it is, seems to be that a somebody who herds a botnet has decided that the laws of big numbers favors those who keep trying for long enough. User names and passwords are generally far enough from random that if you are allowed to go on for long enough, you will sooner or later manage to guess a correct combination of username and password and get access to a machine somewhere.
Sysadmins have been seeing bruteforce attacks for years. The traditional brute force attack would be a rapid succession of login attempts from one host, and usable countermeasures were devised in short order. My favorite of course involves PF, and the description of how to thwart traditional bruteforcers is one of the more popular pages in my PF tutorial.
The distributed, slow bruteforcers are different. For one, the login attempts from each host out in the cloud are spaced far enough apart in time that intrusion attmpt detectors will not trigger. Next, it takes a keen eye to spot the common thread in the attempts spaced up to a number of minutes apart: a monotonously alphabetic progression of user names, with attempts coming in from different hosts. Some number of attemtps at a specific user name, before the cloud moves on the next one, in alphabetic order.
During the period we have been observing the slow brute activity, a total of 695 hosts have been involved. A total of 665 hosts made unsuccessful attempts at authenticating at the hosts we are observing during November, while the number for December so far is 346. The typical number of attempts per user name has decreased, too, from a typical ten do fifteen during the early days down to between one and four during the last couple of weeks.
I thought at first that the decrease in activity was just an indicator that compromised hosts were getting cleaned up, but my colleague Egil Möller was the first to suggest that since we know the attempts are coordinated, it is not too far fetched to assume that the controlling system measures the rates of success for each of the chosen targets and allocates resources accordingly.
If Egil's assumption is right, we are seeing the bad guys adapting. My systems do not run any services they do not need to, and apparently all attempts at gaining access have been futile so far. So, the controlling system shifts resources to elsewhere, even if the access attempts do not stop entirely. Come to think of it, I'm not seeing any attempts at all on my OpenBSD systems, so it is possible to speculate that whoever is behind this phenomenon has decided that OpenBSD systems are hardened enough to begin with and usually run by compentent paranoids as to be useless as targets. That would be a comforting thought at the end of a long and sometimes trying year.
Speaking of the new year, look for exciting announcements coming from FreeCode. We're working on some cool things. And with a bit of luck, I might run into you at one conference or the other during the coming year.
Happy holidays to everyone.
Note: A Better Data Source Is Available
Update 2013-06-09: For a faster and more convenient way to download the data referenced here, please see my BSDCan 2013 presentation The Hail Mary Cloud And The Lessons Learned which summarizes this series of articles and provides links to all the data. The links in the presentation point to a copy stored at NUUG's server, which connects to the world through a significantly fatter pipe than BSDly.net has.
Sunday, December 21, 2008
Saturday, December 6, 2008
A Small Update About The Slow Brutes
Slow and steady might actually do it, eventually.
The reactions to my December 2nd column hit me with a bit of surprise. The column was taken on by slashdot and Linux Today both, producing a largish number of page views, but only two clicks on my featured ads. But while my clickthrough rate is not particularly interesting to others, the comments to the columns sometimes are.
If you look at the comments at slashdot and elsewhere, most of the commenters most likely did not actually read the column in full or did not take the time to digest what it actually said, with some notable exceptions. And yes, there were others, some also wrote in via email with informed comment - thanks!
For the benefit of those who did not get the point the first time around, I'll try once more to explain what the observations are and what they may in fact mean.
A number of commenters offered well meant advice to use packages like fail2ban, denyhosts or a few others.
The common denominator for all of them is that they track single hosts that make a larger than usual number of connections or are the source of a number of failed logins higher than a certain threshold value over a set time period. I appreciate your concerns, but the subject of the column did not fit well with the way those the packages work.
In fact, a similar scheme was already in place at the site that provided the data. The machines that provided the ssh logs are FreeBSD ones (as the sharper ones have observed already, and the reasons may possibly be revealed over beer sometime), but any gateway under my control will run OpenBSD, and by extension, PF (and yes, there is a book you might want to order from one (North America) or the other (Europe and elsewhere) of the OpenBSD project's sites). For a quick fix of background, the online PF tutorial may be worth a look.
Anyway, the /etc/pf.conf at that site's gateway contains the lines
table <abusive_hosts> persist
block log quick from <abusive_hosts>
and
pass log (all) quick proto { tcp, udp } from any to any port ssh flags S/SA keep state \
(max-src-conn 15, max-src-conn-rate 7/3, overload <abusive_hosts> flush global)
Those lines provide a variation on the logic that those posters recommended. Essentially, any host that tries 15 or more simultaneous ssh connections, or come in at a rate of more than seven over the span of three seconds, will be added to the table <abusive_hosts>, and the block quick rule blocks any further access from those hosts. Yes, flush global means what you think it does.
This works at the network level. For a gateway with a potentially large number of hosts on either side, the success or otherwise of eventual authentication may not be relevant and may be better dealt with elsewhere. Anyway, at the time I started working on this column, the table <abusive_hosts> on the gateway contained only two hosts:
:~$ sudo pfctl -t abusive_hosts -v show
194.204.37.93
201.57.187.114
I keep offenders in that table for 24 hours only, I do not believe in the permanent bans that some commenters advocate. After all, there is such a thing as DHCP, and entire netblocks are reallocated with amazingly short intervals.
Anyway, looking at the authentication log on the gateway reveals how those hosts got added to the the table in the first place:
:~$ grep 194.204.37.93 /var/log/authlog
Dec 5 22:50:30 delilah sshd[15266]: Did not receive identification string from 194.204.37.93
Dec 5 22:50:37 delilah sshd[6106]: Did not receive identification string from 194.204.37.93
Dec 6 01:29:58 delilah sshd[30359]: Failed password for root from 194.204.37.93 port 47071 ssh2
Dec 6 01:29:58 delilah sshd[7293]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:29:59 delilah sshd[4395]: Failed password for root from 194.204.37.93 port 47296 ssh2
Dec 6 01:29:59 delilah sshd[27615]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:00 delilah sshd[12248]: Failed password for root from 194.204.37.93 port 47330 ssh2
Dec 6 01:30:00 delilah sshd[24579]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:01 delilah sshd[3434]: Failed password for root from 194.204.37.93 port 47380 ssh2
Dec 6 01:30:01 delilah sshd[32737]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:02 delilah sshd[11984]: Failed password for root from 194.204.37.93 port 47425 ssh2
Dec 6 01:30:02 delilah sshd[27059]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:03 delilah sshd[13345]: Failed password for root from 194.204.37.93 port 47459 ssh2
Dec 6 01:30:03 delilah sshd[1858]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:04 delilah sshd[12739]: Failed password for root from 194.204.37.93 port 47516 ssh2
Dec 6 01:30:04 delilah sshd[16843]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:05 delilah sshd[13796]: Failed password for root from 194.204.37.93 port 47564 ssh2
Dec 6 01:30:05 delilah sshd[16789]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:06 delilah sshd[628]: Failed password for root from 194.204.37.93 port 47602 ssh2
Dec 6 01:30:06 delilah sshd[6162]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:07 delilah sshd[2579]: Failed password for root from 194.204.37.93 port 47646 ssh2
Dec 6 01:30:07 delilah sshd[12461]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:08 delilah sshd[12725]: Failed password for root from 194.204.37.93 port 47685 ssh2
Dec 6 01:30:08 delilah sshd[29909]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:09 delilah sshd[16560]: Failed password for root from 194.204.37.93 port 47724 ssh2
Dec 6 01:30:09 delilah sshd[1690]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:09 delilah sshd[1600]: Failed password for root from 194.204.37.93 port 47771 ssh2
Dec 6 01:30:09 delilah sshd[28882]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:10 delilah sshd[29953]: Failed password for root from 194.204.37.93 port 47807 ssh2
Dec 6 01:30:10 delilah sshd[15349]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:11 delilah sshd[2962]: Failed password for root from 194.204.37.93 port 47845 ssh2
Dec 6 01:30:11 delilah sshd[27557]: Received disconnect from 194.204.37.93: 11: Bye Bye
and
:~$ grep 201.57.187.114 /var/log/authlog
Dec 5 23:55:30 delilah sshd[24338]: Did not receive identification string from 201.57.187.114
Dec 5 23:55:35 delilah sshd[23570]: Did not receive identification string from 201.57.187.114
Dec 5 23:59:58 delilah sshd[10216]: Invalid user raimundo from 201.57.187.114
Dec 5 23:59:58 delilah sshd[10216]: Failed password for invalid user raimundo from 201.57.187.114 port 35776 ssh2
Dec 5 23:59:58 delilah sshd[18515]: Received disconnect from 201.57.187.114: 11: Bye Bye
Dec 6 00:00:01 delilah sshd[17353]: Invalid user joan from 201.57.187.114
Dec 6 00:00:01 delilah sshd[17353]: Failed password for invalid user joan from 201.57.187.114 port 37570 ssh2
Dec 6 00:00:02 delilah sshd[30314]: Received disconnect from 201.57.187.114: 11: Bye Bye
which again shows that these were the old-fashioned, rapid-fire kind of bots. Looking a bit closer even reveals that they kept trying after they were put in the doghouse:
:~$ sudo pfctl -t abusive_hosts -vT show
194.204.37.93
Cleared: Sat Dec 6 01:30:11 2008
In/Block: [ Packets: 18985 Bytes: 835336 ]
In/Pass: [ Packets: 0 Bytes: 0 ]
Out/Block: [ Packets: 0 Bytes: 0 ]
Out/Pass: [ Packets: 0 Bytes: 0 ]
201.57.187.114
Cleared: Sat Dec 6 00:00:02 2008
In/Block: [ Packets: 800 Bytes: 48268 ]
In/Pass: [ Packets: 0 Bytes: 0 ]
Out/Block: [ Packets: 0 Bytes: 0 ]
Out/Pass: [ Packets: 0 Bytes: 0 ]
And of course there were no traces of those IP addresses or corresponding host names in the authentication logs in the machines where I have collected the data about slow bots. My best guess at why is that the gateway's IP address is at the low end of the routable range for that site. Note to bruteforcers: try working the Internet in reverse next time. (Not that it would help much here, but that's another story.)
The reason why I don't see much activity for other services is simply that those machines do not run all that many services, and only the services they actually run for the world's benefit are in fact available to the outside.
My log data shows a definite pattern, and the alphabetic progression points to a degree of coordination. The slow bots are, I theorize, operated by a botnet herder who has a large pool of compromised hosts available and who also believes that given enough time, sooner or later you will find a correct combination of user names and passwords for a given host. Statisticians tell me that assumption is in fact valid, at least to some extent.
By setting up the necessarily large number of attempts to come from a sizable number of hosts in either round-robin or pseudo-random order and intervals for each individual host's attempts in the several minutes to hours range, there is a very real possibility that the slow but determined campaign for control of any single system will drown in the noise.
It is useful to keep in mind that malware moved out of the hands of pranksters and vandals years ago. Mass destruction of systems or data might still make the occasional headline, but staying out of the limelight is likely to be a lot more profitable. Modern malware masters want their creations and charges to stay undetected. What we may be seeing right at this moment is that they have realized the herd may only be sustainable if they grow it slowly.
If you are interested in researching the phenomena I've blogged about, you're welcome to contact me directly for more information or raw data.
Note: A Better Data Source Is Available
Update 2013-06-09: For a faster and more convenient way to download the data referenced here, please see my BSDCan 2013 presentation The Hail Mary Cloud And The Lessons Learned which summarizes this series of articles and provides links to all the data. The links in the presentation point to a copy stored at NUUG's server, which connects to the world through a significantly fatter pipe than BSDly.net has.
The reactions to my December 2nd column hit me with a bit of surprise. The column was taken on by slashdot and Linux Today both, producing a largish number of page views, but only two clicks on my featured ads. But while my clickthrough rate is not particularly interesting to others, the comments to the columns sometimes are.
If you look at the comments at slashdot and elsewhere, most of the commenters most likely did not actually read the column in full or did not take the time to digest what it actually said, with some notable exceptions. And yes, there were others, some also wrote in via email with informed comment - thanks!
For the benefit of those who did not get the point the first time around, I'll try once more to explain what the observations are and what they may in fact mean.
A number of commenters offered well meant advice to use packages like fail2ban, denyhosts or a few others.
The common denominator for all of them is that they track single hosts that make a larger than usual number of connections or are the source of a number of failed logins higher than a certain threshold value over a set time period. I appreciate your concerns, but the subject of the column did not fit well with the way those the packages work.
In fact, a similar scheme was already in place at the site that provided the data. The machines that provided the ssh logs are FreeBSD ones (as the sharper ones have observed already, and the reasons may possibly be revealed over beer sometime), but any gateway under my control will run OpenBSD, and by extension, PF (and yes, there is a book you might want to order from one (North America) or the other (Europe and elsewhere) of the OpenBSD project's sites). For a quick fix of background, the online PF tutorial may be worth a look.
Anyway, the /etc/pf.conf at that site's gateway contains the lines
table <abusive_hosts> persist
block log quick from <abusive_hosts>
and
pass log (all) quick proto { tcp, udp } from any to any port ssh flags S/SA keep state \
(max-src-conn 15, max-src-conn-rate 7/3, overload <abusive_hosts> flush global)
Those lines provide a variation on the logic that those posters recommended. Essentially, any host that tries 15 or more simultaneous ssh connections, or come in at a rate of more than seven over the span of three seconds, will be added to the table <abusive_hosts>, and the block quick rule blocks any further access from those hosts. Yes, flush global means what you think it does.
This works at the network level. For a gateway with a potentially large number of hosts on either side, the success or otherwise of eventual authentication may not be relevant and may be better dealt with elsewhere. Anyway, at the time I started working on this column, the table <abusive_hosts> on the gateway contained only two hosts:
:~$ sudo pfctl -t abusive_hosts -v show
194.204.37.93
201.57.187.114
I keep offenders in that table for 24 hours only, I do not believe in the permanent bans that some commenters advocate. After all, there is such a thing as DHCP, and entire netblocks are reallocated with amazingly short intervals.
Anyway, looking at the authentication log on the gateway reveals how those hosts got added to the the table in the first place:
:~$ grep 194.204.37.93 /var/log/authlog
Dec 5 22:50:30 delilah sshd[15266]: Did not receive identification string from 194.204.37.93
Dec 5 22:50:37 delilah sshd[6106]: Did not receive identification string from 194.204.37.93
Dec 6 01:29:58 delilah sshd[30359]: Failed password for root from 194.204.37.93 port 47071 ssh2
Dec 6 01:29:58 delilah sshd[7293]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:29:59 delilah sshd[4395]: Failed password for root from 194.204.37.93 port 47296 ssh2
Dec 6 01:29:59 delilah sshd[27615]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:00 delilah sshd[12248]: Failed password for root from 194.204.37.93 port 47330 ssh2
Dec 6 01:30:00 delilah sshd[24579]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:01 delilah sshd[3434]: Failed password for root from 194.204.37.93 port 47380 ssh2
Dec 6 01:30:01 delilah sshd[32737]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:02 delilah sshd[11984]: Failed password for root from 194.204.37.93 port 47425 ssh2
Dec 6 01:30:02 delilah sshd[27059]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:03 delilah sshd[13345]: Failed password for root from 194.204.37.93 port 47459 ssh2
Dec 6 01:30:03 delilah sshd[1858]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:04 delilah sshd[12739]: Failed password for root from 194.204.37.93 port 47516 ssh2
Dec 6 01:30:04 delilah sshd[16843]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:05 delilah sshd[13796]: Failed password for root from 194.204.37.93 port 47564 ssh2
Dec 6 01:30:05 delilah sshd[16789]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:06 delilah sshd[628]: Failed password for root from 194.204.37.93 port 47602 ssh2
Dec 6 01:30:06 delilah sshd[6162]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:07 delilah sshd[2579]: Failed password for root from 194.204.37.93 port 47646 ssh2
Dec 6 01:30:07 delilah sshd[12461]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:08 delilah sshd[12725]: Failed password for root from 194.204.37.93 port 47685 ssh2
Dec 6 01:30:08 delilah sshd[29909]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:09 delilah sshd[16560]: Failed password for root from 194.204.37.93 port 47724 ssh2
Dec 6 01:30:09 delilah sshd[1690]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:09 delilah sshd[1600]: Failed password for root from 194.204.37.93 port 47771 ssh2
Dec 6 01:30:09 delilah sshd[28882]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:10 delilah sshd[29953]: Failed password for root from 194.204.37.93 port 47807 ssh2
Dec 6 01:30:10 delilah sshd[15349]: Received disconnect from 194.204.37.93: 11: Bye Bye
Dec 6 01:30:11 delilah sshd[2962]: Failed password for root from 194.204.37.93 port 47845 ssh2
Dec 6 01:30:11 delilah sshd[27557]: Received disconnect from 194.204.37.93: 11: Bye Bye
and
:~$ grep 201.57.187.114 /var/log/authlog
Dec 5 23:55:30 delilah sshd[24338]: Did not receive identification string from 201.57.187.114
Dec 5 23:55:35 delilah sshd[23570]: Did not receive identification string from 201.57.187.114
Dec 5 23:59:58 delilah sshd[10216]: Invalid user raimundo from 201.57.187.114
Dec 5 23:59:58 delilah sshd[10216]: Failed password for invalid user raimundo from 201.57.187.114 port 35776 ssh2
Dec 5 23:59:58 delilah sshd[18515]: Received disconnect from 201.57.187.114: 11: Bye Bye
Dec 6 00:00:01 delilah sshd[17353]: Invalid user joan from 201.57.187.114
Dec 6 00:00:01 delilah sshd[17353]: Failed password for invalid user joan from 201.57.187.114 port 37570 ssh2
Dec 6 00:00:02 delilah sshd[30314]: Received disconnect from 201.57.187.114: 11: Bye Bye
which again shows that these were the old-fashioned, rapid-fire kind of bots. Looking a bit closer even reveals that they kept trying after they were put in the doghouse:
:~$ sudo pfctl -t abusive_hosts -vT show
194.204.37.93
Cleared: Sat Dec 6 01:30:11 2008
In/Block: [ Packets: 18985 Bytes: 835336 ]
In/Pass: [ Packets: 0 Bytes: 0 ]
Out/Block: [ Packets: 0 Bytes: 0 ]
Out/Pass: [ Packets: 0 Bytes: 0 ]
201.57.187.114
Cleared: Sat Dec 6 00:00:02 2008
In/Block: [ Packets: 800 Bytes: 48268 ]
In/Pass: [ Packets: 0 Bytes: 0 ]
Out/Block: [ Packets: 0 Bytes: 0 ]
Out/Pass: [ Packets: 0 Bytes: 0 ]
And of course there were no traces of those IP addresses or corresponding host names in the authentication logs in the machines where I have collected the data about slow bots. My best guess at why is that the gateway's IP address is at the low end of the routable range for that site. Note to bruteforcers: try working the Internet in reverse next time. (Not that it would help much here, but that's another story.)
The reason why I don't see much activity for other services is simply that those machines do not run all that many services, and only the services they actually run for the world's benefit are in fact available to the outside.
My log data shows a definite pattern, and the alphabetic progression points to a degree of coordination. The slow bots are, I theorize, operated by a botnet herder who has a large pool of compromised hosts available and who also believes that given enough time, sooner or later you will find a correct combination of user names and passwords for a given host. Statisticians tell me that assumption is in fact valid, at least to some extent.
By setting up the necessarily large number of attempts to come from a sizable number of hosts in either round-robin or pseudo-random order and intervals for each individual host's attempts in the several minutes to hours range, there is a very real possibility that the slow but determined campaign for control of any single system will drown in the noise.
It is useful to keep in mind that malware moved out of the hands of pranksters and vandals years ago. Mass destruction of systems or data might still make the occasional headline, but staying out of the limelight is likely to be a lot more profitable. Modern malware masters want their creations and charges to stay undetected. What we may be seeing right at this moment is that they have realized the herd may only be sustainable if they grow it slowly.
If you are interested in researching the phenomena I've blogged about, you're welcome to contact me directly for more information or raw data.
Note: A Better Data Source Is Available
Update 2013-06-09: For a faster and more convenient way to download the data referenced here, please see my BSDCan 2013 presentation The Hail Mary Cloud And The Lessons Learned which summarizes this series of articles and provides links to all the data. The links in the presentation point to a copy stored at NUUG's server, which connects to the world through a significantly fatter pipe than BSDly.net has.
Tuesday, December 2, 2008
A low intensity, distributed bruteforce attempt
We have seen the future of botnets, and it is a distributed, low-key affair. Are sites running free software finally becoming malware targets?
Note: This piece describes illegal activity I detected in 2008, targeting SSH servers. Later pieces in this series would hint at the existence of a specific piece of Linux malware, which I had not identified at the time this piece was written.
During the last few weeks, I noticed an anomaly in the authentication logs on one of my listening posts. There were a larger than usual number of ssh login attempts overall, a higher than usual number of attempts for non-existent user names as well as some failures for a few that actually exist as well.
Looking at the log directly a typical progression would look like this:
Nov 19 15:04:22 rosalita sshd[40232]: error: PAM: authentication error for illegal user alias from s514.nxs.nl
Nov 19 15:07:32 rosalita sshd[40239]: error: PAM: authentication error for illegal user alias from c90678d3.static.spo.virtua.com.br
Nov 19 15:10:20 rosalita sshd[40247]: error: PAM: authentication error for illegal user alias from 207-47-162-126.prna.static.sasknet.sk.ca
Nov 19 15:13:46 rosalita sshd[40268]: error: PAM: authentication error for illegal user alias from 125-236-218-109.adsl.xtra.co.nz
Nov 19 15:16:29 rosalita sshd[40275]: error: PAM: authentication error for illegal user alias from 200.93.147.114
Nov 19 15:19:12 rosalita sshd[40279]: error: PAM: authentication error for illegal user alias from 62.225.15.82
Nov 19 15:22:29 rosalita sshd[40298]: error: PAM: authentication error for illegal user alias from 121.33.199.39
Nov 19 15:25:14 rosalita sshd[40305]: error: PAM: authentication error for illegal user alias from 130.red-80-37-213.staticip.rima-tde.net
Nov 19 15:28:23 rosalita sshd[40309]: error: PAM: authentication error for illegal user alias from 70-46-140-187.orl.fdn.com
Nov 19 15:31:17 rosalita sshd[40316]: error: PAM: authentication error for illegal user alias from gate-dialog-simet.jgora.dialog.net.pl
Nov 19 15:34:18 rosalita sshd[40334]: error: PAM: authentication error for illegal user alias from 80.51.31.84
Nov 19 15:37:23 rosalita sshd[40342]: error: PAM: authentication error for illegal user alias from 82.207.104.34
Nov 19 15:40:20 rosalita sshd[40350]: error: PAM: authentication error for illegal user alias from 70-46-140-187.orl.fdn.com
Nov 19 15:43:39 rosalita sshd[40354]: error: PAM: authentication error for illegal user alias from 200.20.187.222
Nov 19 15:46:41 rosalita sshd[40374]: error: PAM: authentication error for illegal user amanda from 58.196.4.2
Nov 19 15:49:31 rosalita sshd[40378]: error: PAM: authentication error for illegal user amanda from host116-164.dissent.birch.net
Nov 19 15:55:47 rosalita sshd[40408]: error: PAM: authentication error for illegal user amanda from robert71.lnk.telstra.net
Nov 19 15:59:08 rosalita sshd[40412]: error: PAM: authentication error for illegal user amanda from static-71-166-159-177.washdc.east.verizon.net
Nov 19 16:02:06 rosalita sshd[40455]: error: PAM: authentication error for illegal user amanda from host87-163-static.30-87-b.business.telecomitalia.it
Nov 19 16:05:08 rosalita sshd[40459]: error: PAM: authentication error for illegal user amanda from 213.150.184.70
Nov 19 16:08:16 rosalita sshd[40465]: error: PAM: authentication error for illegal user amanda from mail.pddsl.de
Nov 19 16:11:24 rosalita sshd[40486]: error: PAM: authentication error for illegal user amanda from abu66.internetdsl.tpnet.pl
Nov 19 16:15:00 rosalita sshd[40491]: error: PAM: authentication error for illegal user amanda from 125.77.106.246
Nov 19 16:17:43 rosalita sshd[40497]: error: PAM: authentication error for illegal user amanda from 217.76.34.230
Nov 19 16:20:54 rosalita sshd[40506]: error: PAM: authentication error for illegal user amanda from robert71.lnk.telstra.net
Nov 19 16:24:09 rosalita sshd[40529]: error: PAM: authentication error for illegal user amanda from p578b4f0b.dip0.t-ipconnect.de
Nov 19 16:28:11 rosalita sshd[40538]: error: PAM: authentication error for illegal user amanda from mail.carena-ci.com
Nov 19 16:30:15 rosalita sshd[40551]: error: PAM: authentication error for illegal user amavis from 87.229.3.89
Nov 19 16:34:31 rosalita sshd[40567]: error: PAM: authentication error for illegal user amavis from 218.248.79.251
Nov 19 16:36:40 rosalita sshd[40574]: error: PAM: authentication error for illegal user amavis from 83-103-70-170.ip.fastwebnet.it
Nov 19 16:40:05 rosalita sshd[40596]: error: PAM: authentication error for illegal user amavis from 75-49-251-71.lightspeed.snjsca.sbcglobal.net
- and so on, with a striking regularity. See for example the attempts to log on as the alias user, 14 attempts are made from 13 different hosts, with only 70-46-140-187.orl.fdn.com trying more than once. Then thirteen attempts are made for the amanda user, from 13 other hosts. The pattern repeats again for users amavis, apache, at, and goes on with others, apparently trying users in an alphabetic sequence.
Phase 2: Not your run of the mill screwup, the data say
Repeated login attempts for non-existing users are nothing new (in fact the bruteforce avoidance section is one of the more popular parts of the PF tutorial), but I was a bit surprised to see the attempts actually reaching this machine, which is on a local network behind a PF gateway with a configuration that is in fact closely related to the one in the tutorial (and the book for that matter). Then looking at the log entries, I noticed a few more things: The attempts are never less than a minute apart, and the attempts from a single host are separated by much longer intervals. The full data set I extracted from the point I started noticing those anomalies sum up to these figures can be found here, in case you want to look at it and draw you own conclusions
Some one-liners give us illustrative numbers:
peter@thingy:~$ wc -l slowbrutes.txt
16727 slowbrutes.txt
That is, over this period there were 16727 failed ssh login attempts at this host. A large number for this particular machine, but not enough to raise eyebrows by itself at larger or busier sites.
More than sixteen thousand attempts, but for how many invalid user names?
peter@thingy:~$ grep illegal slowbrutes.txt | awk '{print $13}' | sort -u | wc -l
2962
peter@thingy:~$ grep illegal slowbrutes.txt | awk '{print $15}' | sort -u | wc -l
671
That is, approaching three thousand unlucky guesses, coming from 671 different hosts.
How many valid user names did they stumble upon?
peter@thingy:~$ grep -v illegal slowbrutes.txt | awk '{print $11}' | sort -u | wc -l
2
A grand total of two, one of them the rather obvious root, for a total of
peter@thingy:~$ grep -vc illegal slowbrutes.txt
1698
1698 attempts, coming from
peter@thingy:~$ grep -v illegal slowbrutes.txt | awk '{print $13}' | sort -u | wc -l
566
566 different hosts.
The patterns that emerge from the data, with the alphabetical ordering and apparent coordination, point to a botnet herder trying out new methods. Intrusion detection systems and adaptive firewalls are generally tuned to detecting things like large numbers of simultaneous connecions or a high rate of new connections from a host. Distributing the task of bruteforcing passwords to several hosts could seem like an inspired way to come in under the radar wherever relatively smart systems are in place. Setting the herd to attempt at a low frequency would likely mean that those failed attempts simply drown in the noise at higher volume sites, and will not be noticed.
Phase 3: Are you one of their guinea pigs, too?
There are indications that the method has not been quite perfected yet. At the start of this run, the bots would make at least ten attempts before moving on down the alphabet. Now it seems enough bots have been taken out of circulation that the typical number of attempts per user name is closer to three, with some tried only once:
Dec 2 11:45:59 rosalita sshd[55775]: error: PAM: authentication error for illegal user heaven from cpe001217e403b3-cm000f9fa6157c.cpe.net.cable.rogers.com
Dec 2 11:48:16 rosalita sshd[55778]: error: PAM: authentication error for illegal user heaven from 90.190.96.46
Dec 2 11:50:39 rosalita sshd[55791]: error: PAM: authentication error for illegal user heaven from static-71-117-126-102.snloca.dsl-w.verizon.net
Dec 2 11:55:26 rosalita sshd[55811]: error: PAM: authentication error for illegal user heavynne from dsl-217-155-184-54.zen.co.uk
Dec 2 11:57:57 rosalita sshd[55814]: error: PAM: authentication error for illegal user heavynne from pd907ee1e.dip0.t-ipconnect.de
Dec 2 12:00:20 rosalita sshd[55836]: error: PAM: authentication error for illegal user heba from 201-26-172-213.dial-up.telesp.net.br
Dec 2 12:07:37 rosalita sshd[55879]: error: PAM: authentication error for illegal user hector from 75.145.16.83
Dec 2 12:09:58 rosalita sshd[55882]: error: PAM: authentication error for illegal user hector from ppp-69-217-30-214.dsl.applwi.ameritech.net
Dec 2 12:12:33 rosalita sshd[55901]: error: PAM: authentication error for illegal user hector from 75-49-251-71.lightspeed.snjsca.sbcglobal.net
Dec 2 12:14:51 rosalita sshd[55905]: error: PAM: authentication error for illegal user hedda from 201.218.231.142
Dec 2 12:17:21 rosalita sshd[55911]: error: PAM: authentication error for illegal user hedda from 75.147.27.85
Dec 2 12:19:48 rosalita sshd[55914]: error: PAM: authentication error for illegal user hedda from 203.70.179.113
From where I'm sitting it's hard to tell whether the lower number of attempts means that the machines have cleaned up by their legal owners or whether they have simply taken out of rotation by their herders. Even with the initial 14 attempts per user name the chance of actually finding a valid combination of user names and passwords would be slim but not non-existent, but decreasing the number of attempts per time unit will necessarily make the chance of eventually finding a valid pair even smaller.
Apparently I'm not the only one seeing the slow brutes, as this post to openbsd-misc indicates. The sensible countermeasure could be to disallow shh password logins and allow only key logins, probably easier to set up and enforce than network-level measures. With the slow rate of attempts and the relatively large number of hosts involved, the undesirable traffic here is relatively hard to distinguish automatically from innocent errors unless you make have any attempt to log in with an invalid user name a sufficent reason for blocking traffic from that host.
Phase N: The shape of things to come
In the longer term view, this may very well be the shape of botnets to come. With a large enough pool of compromised hosts under their control, future botnet herders can afford to organize their activity so any one host only participates in undesirable activity at intervals long enough that malware detectors do not trigger (and thinking further ahead, if the world ever does go IPv6 wholesale and you can expect any one network interface to have dozens of IP addresses, think again how much more interesting detecting botnets becomes).
Antiware vendors will likely put their spin on this too when their marketing departments start noticing columns (Hey! It's Linux they're targeting!), but then as regular readers know, the more productive approach is always to reduce malware masters' target area by using systems that are less vulnerable because they have been extensively audited and whose makers are unafraid to make source code available for public inspection and experimentation.
As people who ran into me at the recent PF tutorial in London or at OpenCON will already know, have I joined FreeCode, the Norwegian free software consultancy. We expect to keep doing fun and useful things with free software for customers and friends, and some of it may be interesting enough to be the topics of future columns.
Note: A Better Data Source Is Available
Update 2013-06-09: For a faster and more convenient way to download the data referenced here, please see my BSDCan 2013 presentation The Hail Mary Cloud And The Lessons Learned which summarizes this series of articles and provides links to all the data. The links in the presentation point to a copy stored at NUUG's server, which connects to the world through a significantly fatter pipe than BSDly.net has.
NOTE: A version without trackers but “classical” formatting is available here.
Phase 1: “That's odd …”During the last few weeks, I noticed an anomaly in the authentication logs on one of my listening posts. There were a larger than usual number of ssh login attempts overall, a higher than usual number of attempts for non-existent user names as well as some failures for a few that actually exist as well.
Looking at the log directly a typical progression would look like this:
Nov 19 15:04:22 rosalita sshd[40232]: error: PAM: authentication error for illegal user alias from s514.nxs.nl
Nov 19 15:07:32 rosalita sshd[40239]: error: PAM: authentication error for illegal user alias from c90678d3.static.spo.virtua.com.br
Nov 19 15:10:20 rosalita sshd[40247]: error: PAM: authentication error for illegal user alias from 207-47-162-126.prna.static.sasknet.sk.ca
Nov 19 15:13:46 rosalita sshd[40268]: error: PAM: authentication error for illegal user alias from 125-236-218-109.adsl.xtra.co.nz
Nov 19 15:16:29 rosalita sshd[40275]: error: PAM: authentication error for illegal user alias from 200.93.147.114
Nov 19 15:19:12 rosalita sshd[40279]: error: PAM: authentication error for illegal user alias from 62.225.15.82
Nov 19 15:22:29 rosalita sshd[40298]: error: PAM: authentication error for illegal user alias from 121.33.199.39
Nov 19 15:25:14 rosalita sshd[40305]: error: PAM: authentication error for illegal user alias from 130.red-80-37-213.staticip.rima-tde.net
Nov 19 15:28:23 rosalita sshd[40309]: error: PAM: authentication error for illegal user alias from 70-46-140-187.orl.fdn.com
Nov 19 15:31:17 rosalita sshd[40316]: error: PAM: authentication error for illegal user alias from gate-dialog-simet.jgora.dialog.net.pl
Nov 19 15:34:18 rosalita sshd[40334]: error: PAM: authentication error for illegal user alias from 80.51.31.84
Nov 19 15:37:23 rosalita sshd[40342]: error: PAM: authentication error for illegal user alias from 82.207.104.34
Nov 19 15:40:20 rosalita sshd[40350]: error: PAM: authentication error for illegal user alias from 70-46-140-187.orl.fdn.com
Nov 19 15:43:39 rosalita sshd[40354]: error: PAM: authentication error for illegal user alias from 200.20.187.222
Nov 19 15:46:41 rosalita sshd[40374]: error: PAM: authentication error for illegal user amanda from 58.196.4.2
Nov 19 15:49:31 rosalita sshd[40378]: error: PAM: authentication error for illegal user amanda from host116-164.dissent.birch.net
Nov 19 15:55:47 rosalita sshd[40408]: error: PAM: authentication error for illegal user amanda from robert71.lnk.telstra.net
Nov 19 15:59:08 rosalita sshd[40412]: error: PAM: authentication error for illegal user amanda from static-71-166-159-177.washdc.east.verizon.net
Nov 19 16:02:06 rosalita sshd[40455]: error: PAM: authentication error for illegal user amanda from host87-163-static.30-87-b.business.telecomitalia.it
Nov 19 16:05:08 rosalita sshd[40459]: error: PAM: authentication error for illegal user amanda from 213.150.184.70
Nov 19 16:08:16 rosalita sshd[40465]: error: PAM: authentication error for illegal user amanda from mail.pddsl.de
Nov 19 16:11:24 rosalita sshd[40486]: error: PAM: authentication error for illegal user amanda from abu66.internetdsl.tpnet.pl
Nov 19 16:15:00 rosalita sshd[40491]: error: PAM: authentication error for illegal user amanda from 125.77.106.246
Nov 19 16:17:43 rosalita sshd[40497]: error: PAM: authentication error for illegal user amanda from 217.76.34.230
Nov 19 16:20:54 rosalita sshd[40506]: error: PAM: authentication error for illegal user amanda from robert71.lnk.telstra.net
Nov 19 16:24:09 rosalita sshd[40529]: error: PAM: authentication error for illegal user amanda from p578b4f0b.dip0.t-ipconnect.de
Nov 19 16:28:11 rosalita sshd[40538]: error: PAM: authentication error for illegal user amanda from mail.carena-ci.com
Nov 19 16:30:15 rosalita sshd[40551]: error: PAM: authentication error for illegal user amavis from 87.229.3.89
Nov 19 16:34:31 rosalita sshd[40567]: error: PAM: authentication error for illegal user amavis from 218.248.79.251
Nov 19 16:36:40 rosalita sshd[40574]: error: PAM: authentication error for illegal user amavis from 83-103-70-170.ip.fastwebnet.it
Nov 19 16:40:05 rosalita sshd[40596]: error: PAM: authentication error for illegal user amavis from 75-49-251-71.lightspeed.snjsca.sbcglobal.net
- and so on, with a striking regularity. See for example the attempts to log on as the alias user, 14 attempts are made from 13 different hosts, with only 70-46-140-187.orl.fdn.com trying more than once. Then thirteen attempts are made for the amanda user, from 13 other hosts. The pattern repeats again for users amavis, apache, at, and goes on with others, apparently trying users in an alphabetic sequence.
Phase 2: Not your run of the mill screwup, the data say
Repeated login attempts for non-existing users are nothing new (in fact the bruteforce avoidance section is one of the more popular parts of the PF tutorial), but I was a bit surprised to see the attempts actually reaching this machine, which is on a local network behind a PF gateway with a configuration that is in fact closely related to the one in the tutorial (and the book for that matter). Then looking at the log entries, I noticed a few more things: The attempts are never less than a minute apart, and the attempts from a single host are separated by much longer intervals. The full data set I extracted from the point I started noticing those anomalies sum up to these figures can be found here, in case you want to look at it and draw you own conclusions
Some one-liners give us illustrative numbers:
peter@thingy:~$ wc -l slowbrutes.txt
16727 slowbrutes.txt
That is, over this period there were 16727 failed ssh login attempts at this host. A large number for this particular machine, but not enough to raise eyebrows by itself at larger or busier sites.
More than sixteen thousand attempts, but for how many invalid user names?
peter@thingy:~$ grep illegal slowbrutes.txt | awk '{print $13}' | sort -u | wc -l
2962
peter@thingy:~$ grep illegal slowbrutes.txt | awk '{print $15}' | sort -u | wc -l
671
That is, approaching three thousand unlucky guesses, coming from 671 different hosts.
How many valid user names did they stumble upon?
peter@thingy:~$ grep -v illegal slowbrutes.txt | awk '{print $11}' | sort -u | wc -l
2
A grand total of two, one of them the rather obvious root, for a total of
peter@thingy:~$ grep -vc illegal slowbrutes.txt
1698
1698 attempts, coming from
peter@thingy:~$ grep -v illegal slowbrutes.txt | awk '{print $13}' | sort -u | wc -l
566
566 different hosts.
The patterns that emerge from the data, with the alphabetical ordering and apparent coordination, point to a botnet herder trying out new methods. Intrusion detection systems and adaptive firewalls are generally tuned to detecting things like large numbers of simultaneous connecions or a high rate of new connections from a host. Distributing the task of bruteforcing passwords to several hosts could seem like an inspired way to come in under the radar wherever relatively smart systems are in place. Setting the herd to attempt at a low frequency would likely mean that those failed attempts simply drown in the noise at higher volume sites, and will not be noticed.
Phase 3: Are you one of their guinea pigs, too?
There are indications that the method has not been quite perfected yet. At the start of this run, the bots would make at least ten attempts before moving on down the alphabet. Now it seems enough bots have been taken out of circulation that the typical number of attempts per user name is closer to three, with some tried only once:
Dec 2 11:45:59 rosalita sshd[55775]: error: PAM: authentication error for illegal user heaven from cpe001217e403b3-cm000f9fa6157c.cpe.net.cable.rogers.com
Dec 2 11:48:16 rosalita sshd[55778]: error: PAM: authentication error for illegal user heaven from 90.190.96.46
Dec 2 11:50:39 rosalita sshd[55791]: error: PAM: authentication error for illegal user heaven from static-71-117-126-102.snloca.dsl-w.verizon.net
Dec 2 11:55:26 rosalita sshd[55811]: error: PAM: authentication error for illegal user heavynne from dsl-217-155-184-54.zen.co.uk
Dec 2 11:57:57 rosalita sshd[55814]: error: PAM: authentication error for illegal user heavynne from pd907ee1e.dip0.t-ipconnect.de
Dec 2 12:00:20 rosalita sshd[55836]: error: PAM: authentication error for illegal user heba from 201-26-172-213.dial-up.telesp.net.br
Dec 2 12:07:37 rosalita sshd[55879]: error: PAM: authentication error for illegal user hector from 75.145.16.83
Dec 2 12:09:58 rosalita sshd[55882]: error: PAM: authentication error for illegal user hector from ppp-69-217-30-214.dsl.applwi.ameritech.net
Dec 2 12:12:33 rosalita sshd[55901]: error: PAM: authentication error for illegal user hector from 75-49-251-71.lightspeed.snjsca.sbcglobal.net
Dec 2 12:14:51 rosalita sshd[55905]: error: PAM: authentication error for illegal user hedda from 201.218.231.142
Dec 2 12:17:21 rosalita sshd[55911]: error: PAM: authentication error for illegal user hedda from 75.147.27.85
Dec 2 12:19:48 rosalita sshd[55914]: error: PAM: authentication error for illegal user hedda from 203.70.179.113
From where I'm sitting it's hard to tell whether the lower number of attempts means that the machines have cleaned up by their legal owners or whether they have simply taken out of rotation by their herders. Even with the initial 14 attempts per user name the chance of actually finding a valid combination of user names and passwords would be slim but not non-existent, but decreasing the number of attempts per time unit will necessarily make the chance of eventually finding a valid pair even smaller.
Apparently I'm not the only one seeing the slow brutes, as this post to openbsd-misc indicates. The sensible countermeasure could be to disallow shh password logins and allow only key logins, probably easier to set up and enforce than network-level measures. With the slow rate of attempts and the relatively large number of hosts involved, the undesirable traffic here is relatively hard to distinguish automatically from innocent errors unless you make have any attempt to log in with an invalid user name a sufficent reason for blocking traffic from that host.
Phase N: The shape of things to come
In the longer term view, this may very well be the shape of botnets to come. With a large enough pool of compromised hosts under their control, future botnet herders can afford to organize their activity so any one host only participates in undesirable activity at intervals long enough that malware detectors do not trigger (and thinking further ahead, if the world ever does go IPv6 wholesale and you can expect any one network interface to have dozens of IP addresses, think again how much more interesting detecting botnets becomes).
Antiware vendors will likely put their spin on this too when their marketing departments start noticing columns (Hey! It's Linux they're targeting!), but then as regular readers know, the more productive approach is always to reduce malware masters' target area by using systems that are less vulnerable because they have been extensively audited and whose makers are unafraid to make source code available for public inspection and experimentation.
As people who ran into me at the recent PF tutorial in London or at OpenCON will already know, have I joined FreeCode, the Norwegian free software consultancy. We expect to keep doing fun and useful things with free software for customers and friends, and some of it may be interesting enough to be the topics of future columns.
Note: A Better Data Source Is Available
Update 2013-06-09: For a faster and more convenient way to download the data referenced here, please see my BSDCan 2013 presentation The Hail Mary Cloud And The Lessons Learned which summarizes this series of articles and provides links to all the data. The links in the presentation point to a copy stored at NUUG's server, which connects to the world through a significantly fatter pipe than BSDly.net has.
Monday, October 20, 2008
IETF failed to account for greylisting
The potential for conflict between greylisting and sites with large pools of outgoing SMTP senders is well known and in need of resolution. Why does the SMTP RFC moving along the standards track fail to address this?
Standardization efforts rarely grab headlines. Except in rather exceptional circumstances (think Microsoft's recent ISO buyout), standards are formulated in response to specific technical needs, or as frequently happens, standards documents are written to codify existing and well known best practices.
Note: This piece is also available without trackers but classic formatting only here.
As regular readers of this column will be aware, I tend to argue that in the email domain, greylisting is one such 'best practice' that would deserve to be included in a standards or best practice document. In a way it already is, but it was never actually referred to in any standard or standards draft, and its main claim to standards compliance relied on extrapolating some crucial points in RFC2821.
The technical alibi for claiming that greylisting is a valid, RFC-compliant technique comes from reading section 4.5.4.1, Sending Strategy. It's really quite straightforward: An SMTP sender that receives a temporary error (451) when it tries to deliver is required to try again later, after a reasonable time. The RFC gives a few more details such as recommended retry intervals and time to give up trying, but does not explicitly say where those repeated attempts should come from. At the time RFC2821 was written in early 2001, it was almost certainly implicit in the formulation used that the retries would come from the same hosts, and specifying that as an explicit requirement most likely seemed more than a little redundant.
The MUST requirement in RFC2821 is what cleared the way for greylisting (see greylisting.org and the relevant parts of my PF tutorial (or the book)), and the earliest implementations started appearing from 2003 onwards.
Fast-forward a few years, and you have a situation where several large operators have set up their networks with large pools of outgoing SMTP servers and no guarantee which machine in the pool will be the next to handle a message queued for a delivery retry. Some greylisting implementations use a modified algorithm that stores or at least acts upon the subnet a greylistes message came from instead of the specific IP address, while others such as OpenBSD's spamd operate strictly on individual IP addresses.
It does not exactly take a rocket scientist to figure out that here is a potential problem, at least when it comes to the stricter implementations. Sites that do not retry from the same IP address can claim to be RFC2821 compliant, since the RFC does not contain a specific requirement that the delivery retries have to be from the same IP address. Again quoting my tutorial, the solution so far has been to whitelist those sites, extracting their SPF info for whitelisting purposes if necessary.
The large number of outgoing SMTP hosts per site problem has been widely discussed, and anybody even marginally interested in mail and spam avoidance should be aware if this. Then here's a surprise for you: Apparently the writers of RFCs are actually unaware of this, or chose not to care. On October 1, 2008, I found in my IETF mailbox RFC5321, which obsoletes RFC2821. The new RFC contains a number of things, but section 4.5.4.1, "Sending Strategy" (yes, they even kept the numbering) is unchanged.
That means that the working group were either unaware of the problem or chose not to resolve the conflict at this time. It would have been a very sensible thing to explicitly state that retries MUST come from the same IP address. The only halfway sane reason not to resolve the conflict that way would be that this would have made greylisting powerful enough to possibly go a long way towards eliminating the need for SPF and other more convoluted schemes.
I can not bring myself to believe that the working group does not have at least one member who is aware of what greylisting is and knows about that sole remaining problem that needs to be addressed. RFC5321 has almost everything else covered, so I hope the working group will listen to reason and move resolve this problem before the standard is finalized.
Update 2021-04-25: It was recently pointed out to me that per RFC6647, issued in 2012, the IETF does acknowledge the usefulness of greylisting. The documents lists a number of useful implementation recommendations.
In other news, this year's EuroBSDCon in Strasbourg went smoothly with a lot of good content if a somewhat smaller number of participants than the previous one. The next chance to catch my PF tutorial live will be in London on November 26th, 2008. Contact the UKUUG for details and booking.
Standardization efforts rarely grab headlines. Except in rather exceptional circumstances (think Microsoft's recent ISO buyout), standards are formulated in response to specific technical needs, or as frequently happens, standards documents are written to codify existing and well known best practices.
Note: This piece is also available without trackers but classic formatting only here.
As regular readers of this column will be aware, I tend to argue that in the email domain, greylisting is one such 'best practice' that would deserve to be included in a standards or best practice document. In a way it already is, but it was never actually referred to in any standard or standards draft, and its main claim to standards compliance relied on extrapolating some crucial points in RFC2821.
The technical alibi for claiming that greylisting is a valid, RFC-compliant technique comes from reading section 4.5.4.1, Sending Strategy. It's really quite straightforward: An SMTP sender that receives a temporary error (451) when it tries to deliver is required to try again later, after a reasonable time. The RFC gives a few more details such as recommended retry intervals and time to give up trying, but does not explicitly say where those repeated attempts should come from. At the time RFC2821 was written in early 2001, it was almost certainly implicit in the formulation used that the retries would come from the same hosts, and specifying that as an explicit requirement most likely seemed more than a little redundant.
The MUST requirement in RFC2821 is what cleared the way for greylisting (see greylisting.org and the relevant parts of my PF tutorial (or the book)), and the earliest implementations started appearing from 2003 onwards.
Fast-forward a few years, and you have a situation where several large operators have set up their networks with large pools of outgoing SMTP servers and no guarantee which machine in the pool will be the next to handle a message queued for a delivery retry. Some greylisting implementations use a modified algorithm that stores or at least acts upon the subnet a greylistes message came from instead of the specific IP address, while others such as OpenBSD's spamd operate strictly on individual IP addresses.
It does not exactly take a rocket scientist to figure out that here is a potential problem, at least when it comes to the stricter implementations. Sites that do not retry from the same IP address can claim to be RFC2821 compliant, since the RFC does not contain a specific requirement that the delivery retries have to be from the same IP address. Again quoting my tutorial, the solution so far has been to whitelist those sites, extracting their SPF info for whitelisting purposes if necessary.
The large number of outgoing SMTP hosts per site problem has been widely discussed, and anybody even marginally interested in mail and spam avoidance should be aware if this. Then here's a surprise for you: Apparently the writers of RFCs are actually unaware of this, or chose not to care. On October 1, 2008, I found in my IETF mailbox RFC5321, which obsoletes RFC2821. The new RFC contains a number of things, but section 4.5.4.1, "Sending Strategy" (yes, they even kept the numbering) is unchanged.
That means that the working group were either unaware of the problem or chose not to resolve the conflict at this time. It would have been a very sensible thing to explicitly state that retries MUST come from the same IP address. The only halfway sane reason not to resolve the conflict that way would be that this would have made greylisting powerful enough to possibly go a long way towards eliminating the need for SPF and other more convoluted schemes.
I can not bring myself to believe that the working group does not have at least one member who is aware of what greylisting is and knows about that sole remaining problem that needs to be addressed. RFC5321 has almost everything else covered, so I hope the working group will listen to reason and move resolve this problem before the standard is finalized.
Update 2021-04-25: It was recently pointed out to me that per RFC6647, issued in 2012, the IETF does acknowledge the usefulness of greylisting. The documents lists a number of useful implementation recommendations.
In other news, this year's EuroBSDCon in Strasbourg went smoothly with a lot of good content if a somewhat smaller number of participants than the previous one. The next chance to catch my PF tutorial live will be in London on November 26th, 2008. Contact the UKUUG for details and booking.
Monday, September 22, 2008
“Name and Shame”, or socially responsible use of your log data
Your logs contain an ever-growing mass of data on spammers. How about making an effort to make that data useful to others?
Those of us who run email services know, from sometimes painful experience, what it takes to ensure that the minimum possible amount of unwanted advertising and scams that may turn out to be security hazards reaches our users' inboxes.
Email: This should have been very simple
Handling email should really be quite simple: The server is configured to know what domains it receives mail for and what users actually exist in those domains. When a machine makes contact and indicates that it intends to deliver email, the server check if the recipient is a valid user. If the recipient is valid, the message is received and put in the relevant user's mailbox. Otherwise, a message about a failed delivery and optionally the reason for the failure is sent to the user specified as the sender.
If they were all honest people
In each part of the process, the underlying premise is that the communicating parners offer each other correct information. Frequently that is the case, and we have legitimate communications between partners with a valid reason for contacting each other. Unfortunately there are other cases where the implicit trust is abused, such as when email messages are sent with a sender address other than the real one, quite likely a made-up one in a domain that belongs to other people. Some of us occasionally receive delivery failure messages for messages we verfiably did not send[1]. If we take the time to study the contents of those messages, in almost all cases we will find that the messages are spam, sometimes the scamming kind and perhaps part of an attempt to take control of the recipient's computer or steal sensitive data.
What do the ones in charge do, then?
If you ask a typical system administrator what measures are in effect to thwart attempts at delivering unwanted or malicious messages to their users, you will most likely get a description that says, essentially, the messages are filtered through systems that inspect message contents. If the message does not contain anything known to be bad (known spam or malware) or something sufficiently similar to a known bad, the message is delivered to the user's mailbox. If the system determines that the message contents indicates it should not be delivered, the messages is thrown away undelivered, and some system administrators will tell you that the system also sends a message about the decision not to deliver the message to the stated sender address.
Large parts or this is likely part of moderately educated users' passive knowledge, and most of us are likely to accept that content filtering is all we can do to keep dubious or downright criminal elements out of our working environment. For the individual end user, only minor adjustments to this are likely to be possible.
Measures based on observed behavior
But those of us who actually run the service also have the opportunity to study the automatically generated log data from our systems and use spammers' (that is, senders of all types of unwanted mail, including malware) behavior patterns to remove most of the unwanted traffic before actual message content is known. In order to do that, it is necessary to go to a more basic level of network traffic and study sender behavior on the network level.
One of the simpler forms of behavior based measures emerged in the form of a technique called greylisting in 2003. The technique is based on a slightly pedantic and rather creative interpretation of established standards. The Internet protocol for email transfer, SMTP (the Simple Mail Transfer Protocol) allows servers that experience temporary problems that make it impossible to receive mail to report a specific 'temporary local problem' status code to correspondents trying to deliver mail. Correctly configured senders will interpret and act on the status code and delay delivery for a short time. In most circumstances, the delivery will succeed within a short time. It is worth noting that this part of the standard was formulated to help the mail service's reliability. At most times, the retries happen without alerting the person who wrote and sent the message. The messages generally reach their destination eventually.
Lists of grey and black, little white lies
Greylisting works like this: the server reports a temporary local problem to all attempts at delivery from machines the server has not exchanged mail with earlier. Experience shows that the pre-experiment hypothesis was mainly correct: Essentially all machines that try to deliver valid email are configured to check return codes and act on them, while almost all spam senders dump as many messages as possible, and never check any return codes. This means that somewhere in the eighty to high nineties percentage of all spam volume is discarded at the first delivery attempt (before any content filtering), while legitimate email reaches its intended recipients, occasionally with delayed delivery of the initial message from a new correspondent.
One other behavior based technique that predates greylisting is the use of 'blacklists' - lists of machines that have been classified as spam senders - and rejecting mail from machines on such lists. Some groups eventually started experimenting with 'tarpits', a technique that essentially means your end of the communication moves along very slowly. A much cited example is the spamd program, released as a part of the free operating system OpenBSD in May of 2003. The program's main purpose at the time was to answer email traffic from blacklisted hosts one byte per second, never leaving a blacklisted host any real chance of delivering messages.
The combination of blacklists and greylisting proved to work very well, but the quest for even more effective measures continued. Yet again, the next logical step grew out of observing spammer behavior. We saw earlier that spammers do not bother to check whether individual messages are in fact delivered.
Laying traps and bait
By early 2005, these observations lead to a theory that was soon proved useful: If we have one or more addresses in our own domains the are certain to never receive any valid mail, we can be almost a hundred percent certain that any mail addressed to those addresses is spam. The addresses are spamtraps. Any machines that try to deliver spam to those addresses are placed in a local blacklist, and we keep them busy by answering their traffic at a rate of one byte per second. The machines stay on the blacklist for 24 hours unless otherwise specified.
The new technique, dubbed greytrapping was launched as part of the improved spamd in OpenBSD 3.8, released May 2005. In early 2006, Bob Beck, one of the main spamd developers announced that his greytrapping hosts at the University of Alberta generates a downloadable blacklist based on the greyptrap data, updated once per hour, ready for inclusion in spamd setups elsewhere. This is obviously useful. Machines that try to deliver mail to addresses that were never deliverable most likely do not have any valid mail to deliver, and it we are doing society at large a favor by delaying their deliveries and wasting their time to the maximum extent possible.
It is worth mentioning that during the period we have used the University of Alberta blacklist at our site, it has contained a minimum of twenty-some thousand IP addresses, and during some busy periods have reached almost two hundred thousand.
You can help, too
Fortunately you do not need to be a core developer to be able to contribute. The exact same tools Bob Beck uses to generate his blacklist is available to everybody else as part of OpenBSD, and they are actually not very hard to use productively.
Here at BSDdly.net and associated domains we saw during the (Northern hemisphere) summer of 2007 a marked increase in email sent to addresses that have never actually existed in our domains. This was clearly a case of somebody, one or more groups, making up or generating sender addresses to avoid seeing any reactions to the spam they were sending. This in turn lead to us starting an experiment that is still ongoing. We record invalid addresses in our own domains as they turn up in our logs. From these addresses we pick the really improbable ones, put them in our local spamtrap list and publish the list on a specific web page on our server[2].
Experience shows that it it takes a very short time for the addresses we put on the web page to turn up as target addresses for spam. This means that we have succeeded in feeding the spammers data that makes it easier for us to stop their attempts, and frequently we make spam senders use significant amounts of time communicating with our machines with no chance of actually achieving anything. The number of spamtrap addresses has reached fifteen thousand, and we have at times observed groups of machines that spend weeks working through the whole list, with average time spent per unsuccessful delivery attempt clocked at roughly seven minutes.
As a byproduct of the active spammer trapping we started exporting our own list of machines that had been trapped via the spamtrap addresses during the last 24 hours and making the list available for download. This list's existence has only been announced via the spamtrap addresses web page and a few blog posts, but we see that it's retrieved, most likely automatically, at intervals and is apparently used by other sites in their systems.
At this point we have established that it is possible to create a system that makes it very unlikely that spam actually makes it through to users, while at the same time it is quite unlikely that legitimate mail is adversely affected. In other words, we have the cyberspace equivalent of good fences around our property, but spammers are still out there and may create serious probles for those who are without adequate protection.
Collecting evidence, or at least seek clarity
We would have loved to see law enforcement take the spammer problem seriously. This is not just because the spam that reaches its targets is irritating, but rather because almost all spam is sent via equipment that spammers use without the legal owners' consent. We would have liked to see resources allocated in proportion to the criminal activity the spam represents. We would have liked to help, but it might seem that we would not have usable evidence available due to the fact that we do not actually receive the messages the spammers try to deliver. On the other hand, we have at all times a list of machines that have tried to deliver spam, identified with an almost hundred percent certainty based on the spammer trapping addresses. In addition, our systems routinely produce logs of all activity, with the level of detail we set ourselves. This means that it is possible to search our logs for the IP addresses that have tried to deliver spam to our systems during the last 24 hours, and get a summary of what those machines have done.
A search of this kind typically yields a result like this:
Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <kristie@iland.net> -> <asasaskosmicki@bsdly.net>
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 03:41:42 skapet spamd[13548]: 190.20.132.16: connected (14/13), lists: spamd-greytrap
Aug 10 03:42:23 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap
Aug 10 06:30:35 skapet spamd[13548]: 190.20.132.16: connected (23/22), lists: spamd-greytrap becks
Aug 10 06:31:16 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap becks
The first line here states that 190.20.132.16 contacts our system at 02:34:29 AM on August tenth, as the fourth active SMTP connection, three blacklisted. A few seconds later it appears that this is an attempt at delivering a message to the address asasaskosmicki@bsdly.net. That address was already one of our spamtraps, most likely one that was harvested from our logs and was originally made up somewhere else. After 12 seconds, the machine disconnects. The attempted delivery to a spamtrap address means that the machine is added to our local spamd-greytrap blacklist, as indicated in the entry for the next attempt about one hour later. This second attempt lasts for 41 seconds. The third try in our log material happens just after 06:30, and the addition of the list name becks indicates that in the meantime has tried to deliver to one of Bob Beck's spammer trap addresses and has entered that blacklist, too.
Unfortunately, it is unlikely that logs of this kind are sufficient as evidence for criminal prosecution purposes, but the data may be of some use to those who have an interest in keeping machines in their care from sending spam.
“Name And Shame“, or just being neighborly?
After some discussions with colleagues I decided in early August 2008 to generate daily reports of the activities of machines that had made it into the local blacklist on bsdly.net and publish the results. If all we have is the fact that a machine has entered a blacklist as an IP address (such as 24.165.4.190), and there is no supporting material, it is fairly easy for whoever is in charge of that address range to just ignore the entry as an unsupported allegation. We hope that when whoever is responsible for the network containing 24.165.4.190 sees a sequence like this,
Host 24.165.4.190:
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
Aug 10 02:57:54 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:57:55 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:57:56 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:16 skapet spamd[13548]: 24.165.4.190: connected (8/6)
Aug 10 02:58:30 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:58:31 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:58:32 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:39 skapet spamd[13548]: 24.165.4.190: connected (7/6), lists: spamd-greytrap
Aug 10 03:02:24 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 03:03:17 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberliereffett@ehtrib.org>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: From: "Preston Amos" <aarnq@abtinc.com>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: To: kimberlee.ledet@ehtrib.org
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: Subject: Wonderful enhancing effect on your manhood.
Aug 10 03:06:04 skapet spamd[13548]: 24.165.4.190: disconnected after 445 seconds. lists: spamd-greytrap
they will find that to be a sufficient for action of some kind. The material we generate is available via the “The Name And Shame Robot” web page. The latest complete report of log excerpts is available via links at that page. Previous versions are archived offline, but will be made available on request to parties with valid reasons to request the data.
“The Name And Shame Robot”" is rather new, and it is too early to say what effect, if any, the publication has had. We hope that others will do similar things based on their local log data or even synchronize their data with ours. If you are interested in participating, please make contact.
Regardless of other factors, we hope that the data can be useful as indicators of potential for improvement in the networks that appear regularly in the reports as well as material for studies that will produce even better techniques for spam avoidance.
A shorter version of this article in Norwegian was published in Computerworld's Norwegian edition on August 22, 2008; the longer Norwegian version is available as an earlier blog post.
[1] A collection of such failure messages collected earlier this year is available at http://www.bsdly.net/~peter/joejob-archive.2008-07-28.txt.
[2] See http://www.bsdly.net/~peter/traplist.shtml, references at that page lead to my blog, which consists of public field notes, as well as other relevant material.
About the author
Peter N. M. Hansteen (peter@bsdly.net) is a consultant, system administrator and writer, based in Bergen, Norway. In October 2008, he joined FreeCode, the Norwegian free software consultancy. He has written various articles as well as "The Book of PF", published by No Starch Press in 2007, and lectures on Unix- and network-related topics. He is a main organizer of BLUG (Bergen (BSD and) Linux User Group), vice president of NUUG (Norwegian Unix User Group) and an occasional activist for EFF's Norwegian sister organization EFN (Elektronisk Forpost Norge).
Those of us who run email services know, from sometimes painful experience, what it takes to ensure that the minimum possible amount of unwanted advertising and scams that may turn out to be security hazards reaches our users' inboxes.
Email: This should have been very simple
Handling email should really be quite simple: The server is configured to know what domains it receives mail for and what users actually exist in those domains. When a machine makes contact and indicates that it intends to deliver email, the server check if the recipient is a valid user. If the recipient is valid, the message is received and put in the relevant user's mailbox. Otherwise, a message about a failed delivery and optionally the reason for the failure is sent to the user specified as the sender.
If they were all honest people
In each part of the process, the underlying premise is that the communicating parners offer each other correct information. Frequently that is the case, and we have legitimate communications between partners with a valid reason for contacting each other. Unfortunately there are other cases where the implicit trust is abused, such as when email messages are sent with a sender address other than the real one, quite likely a made-up one in a domain that belongs to other people. Some of us occasionally receive delivery failure messages for messages we verfiably did not send[1]. If we take the time to study the contents of those messages, in almost all cases we will find that the messages are spam, sometimes the scamming kind and perhaps part of an attempt to take control of the recipient's computer or steal sensitive data.
What do the ones in charge do, then?
If you ask a typical system administrator what measures are in effect to thwart attempts at delivering unwanted or malicious messages to their users, you will most likely get a description that says, essentially, the messages are filtered through systems that inspect message contents. If the message does not contain anything known to be bad (known spam or malware) or something sufficiently similar to a known bad, the message is delivered to the user's mailbox. If the system determines that the message contents indicates it should not be delivered, the messages is thrown away undelivered, and some system administrators will tell you that the system also sends a message about the decision not to deliver the message to the stated sender address.
Large parts or this is likely part of moderately educated users' passive knowledge, and most of us are likely to accept that content filtering is all we can do to keep dubious or downright criminal elements out of our working environment. For the individual end user, only minor adjustments to this are likely to be possible.
Measures based on observed behavior
But those of us who actually run the service also have the opportunity to study the automatically generated log data from our systems and use spammers' (that is, senders of all types of unwanted mail, including malware) behavior patterns to remove most of the unwanted traffic before actual message content is known. In order to do that, it is necessary to go to a more basic level of network traffic and study sender behavior on the network level.
One of the simpler forms of behavior based measures emerged in the form of a technique called greylisting in 2003. The technique is based on a slightly pedantic and rather creative interpretation of established standards. The Internet protocol for email transfer, SMTP (the Simple Mail Transfer Protocol) allows servers that experience temporary problems that make it impossible to receive mail to report a specific 'temporary local problem' status code to correspondents trying to deliver mail. Correctly configured senders will interpret and act on the status code and delay delivery for a short time. In most circumstances, the delivery will succeed within a short time. It is worth noting that this part of the standard was formulated to help the mail service's reliability. At most times, the retries happen without alerting the person who wrote and sent the message. The messages generally reach their destination eventually.
Lists of grey and black, little white lies
Greylisting works like this: the server reports a temporary local problem to all attempts at delivery from machines the server has not exchanged mail with earlier. Experience shows that the pre-experiment hypothesis was mainly correct: Essentially all machines that try to deliver valid email are configured to check return codes and act on them, while almost all spam senders dump as many messages as possible, and never check any return codes. This means that somewhere in the eighty to high nineties percentage of all spam volume is discarded at the first delivery attempt (before any content filtering), while legitimate email reaches its intended recipients, occasionally with delayed delivery of the initial message from a new correspondent.
One other behavior based technique that predates greylisting is the use of 'blacklists' - lists of machines that have been classified as spam senders - and rejecting mail from machines on such lists. Some groups eventually started experimenting with 'tarpits', a technique that essentially means your end of the communication moves along very slowly. A much cited example is the spamd program, released as a part of the free operating system OpenBSD in May of 2003. The program's main purpose at the time was to answer email traffic from blacklisted hosts one byte per second, never leaving a blacklisted host any real chance of delivering messages.
The combination of blacklists and greylisting proved to work very well, but the quest for even more effective measures continued. Yet again, the next logical step grew out of observing spammer behavior. We saw earlier that spammers do not bother to check whether individual messages are in fact delivered.
Laying traps and bait
By early 2005, these observations lead to a theory that was soon proved useful: If we have one or more addresses in our own domains the are certain to never receive any valid mail, we can be almost a hundred percent certain that any mail addressed to those addresses is spam. The addresses are spamtraps. Any machines that try to deliver spam to those addresses are placed in a local blacklist, and we keep them busy by answering their traffic at a rate of one byte per second. The machines stay on the blacklist for 24 hours unless otherwise specified.
The new technique, dubbed greytrapping was launched as part of the improved spamd in OpenBSD 3.8, released May 2005. In early 2006, Bob Beck, one of the main spamd developers announced that his greytrapping hosts at the University of Alberta generates a downloadable blacklist based on the greyptrap data, updated once per hour, ready for inclusion in spamd setups elsewhere. This is obviously useful. Machines that try to deliver mail to addresses that were never deliverable most likely do not have any valid mail to deliver, and it we are doing society at large a favor by delaying their deliveries and wasting their time to the maximum extent possible.
It is worth mentioning that during the period we have used the University of Alberta blacklist at our site, it has contained a minimum of twenty-some thousand IP addresses, and during some busy periods have reached almost two hundred thousand.
You can help, too
Fortunately you do not need to be a core developer to be able to contribute. The exact same tools Bob Beck uses to generate his blacklist is available to everybody else as part of OpenBSD, and they are actually not very hard to use productively.
Here at BSDdly.net and associated domains we saw during the (Northern hemisphere) summer of 2007 a marked increase in email sent to addresses that have never actually existed in our domains. This was clearly a case of somebody, one or more groups, making up or generating sender addresses to avoid seeing any reactions to the spam they were sending. This in turn lead to us starting an experiment that is still ongoing. We record invalid addresses in our own domains as they turn up in our logs. From these addresses we pick the really improbable ones, put them in our local spamtrap list and publish the list on a specific web page on our server[2].
Experience shows that it it takes a very short time for the addresses we put on the web page to turn up as target addresses for spam. This means that we have succeeded in feeding the spammers data that makes it easier for us to stop their attempts, and frequently we make spam senders use significant amounts of time communicating with our machines with no chance of actually achieving anything. The number of spamtrap addresses has reached fifteen thousand, and we have at times observed groups of machines that spend weeks working through the whole list, with average time spent per unsuccessful delivery attempt clocked at roughly seven minutes.
As a byproduct of the active spammer trapping we started exporting our own list of machines that had been trapped via the spamtrap addresses during the last 24 hours and making the list available for download. This list's existence has only been announced via the spamtrap addresses web page and a few blog posts, but we see that it's retrieved, most likely automatically, at intervals and is apparently used by other sites in their systems.
At this point we have established that it is possible to create a system that makes it very unlikely that spam actually makes it through to users, while at the same time it is quite unlikely that legitimate mail is adversely affected. In other words, we have the cyberspace equivalent of good fences around our property, but spammers are still out there and may create serious probles for those who are without adequate protection.
Collecting evidence, or at least seek clarity
We would have loved to see law enforcement take the spammer problem seriously. This is not just because the spam that reaches its targets is irritating, but rather because almost all spam is sent via equipment that spammers use without the legal owners' consent. We would have liked to see resources allocated in proportion to the criminal activity the spam represents. We would have liked to help, but it might seem that we would not have usable evidence available due to the fact that we do not actually receive the messages the spammers try to deliver. On the other hand, we have at all times a list of machines that have tried to deliver spam, identified with an almost hundred percent certainty based on the spammer trapping addresses. In addition, our systems routinely produce logs of all activity, with the level of detail we set ourselves. This means that it is possible to search our logs for the IP addresses that have tried to deliver spam to our systems during the last 24 hours, and get a summary of what those machines have done.
A search of this kind typically yields a result like this:
Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <kristie@iland.net> -> <asasaskosmicki@bsdly.net>
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 03:41:42 skapet spamd[13548]: 190.20.132.16: connected (14/13), lists: spamd-greytrap
Aug 10 03:42:23 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap
Aug 10 06:30:35 skapet spamd[13548]: 190.20.132.16: connected (23/22), lists: spamd-greytrap becks
Aug 10 06:31:16 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap becks
The first line here states that 190.20.132.16 contacts our system at 02:34:29 AM on August tenth, as the fourth active SMTP connection, three blacklisted. A few seconds later it appears that this is an attempt at delivering a message to the address asasaskosmicki@bsdly.net. That address was already one of our spamtraps, most likely one that was harvested from our logs and was originally made up somewhere else. After 12 seconds, the machine disconnects. The attempted delivery to a spamtrap address means that the machine is added to our local spamd-greytrap blacklist, as indicated in the entry for the next attempt about one hour later. This second attempt lasts for 41 seconds. The third try in our log material happens just after 06:30, and the addition of the list name becks indicates that in the meantime has tried to deliver to one of Bob Beck's spammer trap addresses and has entered that blacklist, too.
Unfortunately, it is unlikely that logs of this kind are sufficient as evidence for criminal prosecution purposes, but the data may be of some use to those who have an interest in keeping machines in their care from sending spam.
“Name And Shame“, or just being neighborly?
After some discussions with colleagues I decided in early August 2008 to generate daily reports of the activities of machines that had made it into the local blacklist on bsdly.net and publish the results. If all we have is the fact that a machine has entered a blacklist as an IP address (such as 24.165.4.190), and there is no supporting material, it is fairly easy for whoever is in charge of that address range to just ignore the entry as an unsupported allegation. We hope that when whoever is responsible for the network containing 24.165.4.190 sees a sequence like this,
Host 24.165.4.190:
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
Aug 10 02:57:54 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:57:55 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:57:56 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:16 skapet spamd[13548]: 24.165.4.190: connected (8/6)
Aug 10 02:58:30 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:58:31 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:58:32 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:39 skapet spamd[13548]: 24.165.4.190: connected (7/6), lists: spamd-greytrap
Aug 10 03:02:24 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 03:03:17 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberliereffett@ehtrib.org>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: From: "Preston Amos" <aarnq@abtinc.com>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: To: kimberlee.ledet@ehtrib.org
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: Subject: Wonderful enhancing effect on your manhood.
Aug 10 03:06:04 skapet spamd[13548]: 24.165.4.190: disconnected after 445 seconds. lists: spamd-greytrap
they will find that to be a sufficient for action of some kind. The material we generate is available via the “The Name And Shame Robot” web page. The latest complete report of log excerpts is available via links at that page. Previous versions are archived offline, but will be made available on request to parties with valid reasons to request the data.
“The Name And Shame Robot”" is rather new, and it is too early to say what effect, if any, the publication has had. We hope that others will do similar things based on their local log data or even synchronize their data with ours. If you are interested in participating, please make contact.
Regardless of other factors, we hope that the data can be useful as indicators of potential for improvement in the networks that appear regularly in the reports as well as material for studies that will produce even better techniques for spam avoidance.
A shorter version of this article in Norwegian was published in Computerworld's Norwegian edition on August 22, 2008; the longer Norwegian version is available as an earlier blog post.
[1] A collection of such failure messages collected earlier this year is available at http://www.bsdly.net/~peter/joejob-archive.2008-07-28.txt.
[2] See http://www.bsdly.net/~peter/traplist.shtml, references at that page lead to my blog, which consists of public field notes, as well as other relevant material.
About the author
Peter N. M. Hansteen (peter@bsdly.net) is a consultant, system administrator and writer, based in Bergen, Norway. In October 2008, he joined FreeCode, the Norwegian free software consultancy. He has written various articles as well as "The Book of PF", published by No Starch Press in 2007, and lectures on Unix- and network-related topics. He is a main organizer of BLUG (Bergen (BSD and) Linux User Group), vice president of NUUG (Norwegian Unix User Group) and an occasional activist for EFF's Norwegian sister organization EFN (Elektronisk Forpost Norge).
Sunday, August 31, 2008
[.NO] “Name and Shame” eller samfunnsnyttig bruk av loggdata om spammere
Today's post is in Norwegian - I'll be back in English later
Vi sitter med stadig voksende mengder med data om spammere. Kan vi bruke dette på en måte som er nyttig for andre?
Vi som selv står for driften av eposttjenester vet av tidvis smertelig erfaring hva som skal til for at minimalt med uønsket reklame og lureri som kan være direkte trusler mot sikkerheten faktisk havner i innboksene til brukerne våre.
Epost: Dette burde vært enkelt
I utgangspunktet burde det være en enkel sak å håndtere epost: Serveren er satt opp slik at den vet hvilke domener den skal ta imot post for, og hvilke brukere som eksisterer i domenene. Når en maskin tar kontakt og signaliserer at den ønsker å levere epost, sjekker serveren om meldingen er adressert til en gyldig bruker. Er det snakk om en gyldig bruker, blir meldingen mottatt og lagt i innboksen til den aktuelle brukeren, i motsatt fall får den som er oppgitt som avsender beskjed om at meldingen ikke lot seg levere, og hvorfor.
Hvis bare alle var ærlige
I hvert enkelt ledd av denne prosessen er det en underliggende forutsetning at kommunikasjonspartnerne oppgir korrekt informasjon. I mange tilfeller er det slik, og det dreier seg om legitim kommunikasjon mellom parter som har grunn til å ønske kontakt. Dessverre finnes det også tilfeller der denne grunnleggende tilliten blir brutt eller misbrukt, for eksempel når epostmeldinger blir sendt med annen avsenderadresse enn den reelle, gjerne noe oppdiktet i et domene som tilhører andre. En del av oss har også opplevd å få returmeldinger om at levering ikke var mulig på grunnlag av meldinger som vi vitterlig ikke har sendt[1]. Når vi ser nøyere på innholdet i disse meldingene, vil vi i nesten alle tilfeller se at dette er søppelpost, tidvis med innslag av svindel og kanskje ledd i førsøk på å overta mottakerens maskin eller stjele sensitive data.
Hva gjør de ansvarlige?
Hvis du spør en typisk systemadministrator om hvilke tiltak som er satt i verk for å hindre at uønskede eller skadelige meldinger faktisk kommer frem til brukerne, vil du antakelig få høre en beskrivelse som stort sett kan kokes ned til at posten blir filtrert gjennom systemer som studerer innholdet i meldingene. Hvis meldingen ikke inneholder noe som er kjent som uønsket (kjent spam eller skadelig programvare) eller noe som likner på noe som kunne være det, blir meldingen levert til brukeren den er adressert til. Hvis systemet avgjør at meldingen har innhold som gjør at den ikke skal leveres til mottaker, blir meldingen i mange tilfeller kastet uten å bli levert, og noen systemadministratorer vil nok også fortelle deg at systemet sender melding om avgjørelsen tilbake til adressen som er oppgitt som avsender.
Mye av dette hører antakelig til den passive kunnskapen hos mange, og de fleste slår seg til ro med at innholdsfiltrering er det eneste vi kan gjøre for å holde tvilsomme eller direkte kriminelle elementer unna arbeidsmiljøet. For en enkelt sluttbruker er det sannsynligvis bare mindre justeringer ut fra dette som er mulig.
Tiltak basert på observert adferd
Men for oss som håndterer driften av selve tjenesten er det mulig å studere dataene som registreres automatisk i loggene våre og bruke spammernes (her brukt om avsendere av alle typer uønsket epost, inkludert skadevare) adferd til å fjerne mesteparten av den uønskede trafikken før innholdet i meldingene er kjent. Det er nødvendig å gå ned på et noe mer grunnleggende nivå i nettverkstrafikken og studere adferden på nettverksnivå.
En av de enkleste formene av slike adferdsbaserte tiltak dukket opp i form av en teknikk som fikk navnet grålisting (greylisting) i 2003. Teknikken bygger på en litt pedantisk og ganske kreativ tolkning av allerede vedtatte standarder. Protokollen som brukes for epost-overføring på Internett, SMTP (Simple Mail Transfer Protocol) inneholder en mulighet for at en server som har antatt forbigående problemer med å ta imot epost kan rapportere om tilstanden ved å svare med en spesiell feilkode når andre maskiner prøver å levere. Hvis avsendermaskinen er korrekt konfigurert, vil den vente en viss tid før den gjør et nytt forsøk på levering, og etter all sannsynlighet vil den lykkes etter kort tid. Det er verd å merke seg at dette er en del av standarden som først og fremst skulle sørge for at eposttjenesten skulle være så pålitelig som mulig, og i de fleste tilfeller skjer alt dette uten at personen som skrev og sendte meldingen merker noe til det. Meldingene kommer frem, og alle er fornøyde.
Grå og svarte lister, små hvite løgner
Grålisting går ut på at serveren gir melding om midlertidig feil til alle forsøk på epostlevering fra maskiner den ikke har hatt kontakt med før. Erfaringene viste at antakelsene fra før eksperimentet i hovedsak var korrekte: De aller fleste maskiner som sender legitim epost er satt opp til å sjekke returkoder og handle etter dem, mens de aller fleste avsendere av spam bare pøser på så mange meldinger som mulig, uten å sjekke returkoder. Resultatet er at et sted mellom åtti og noenognitti prosent av spam-mengden blir stoppet på første forsøk (før eventuell innholdsfiltrering), mens legitim post kommer frem, i noen tilfeller med en viss forsinkelse ved første kontakt.
En annen adferdsbasert teknikk, som faktisk var i bruk før grålisting ble utbredt, er å bruke såkalte svartlister - lister over maskiner som er blitt klassifisert som spam-avsendere - og avvise post fra maskiner som var med i listen. Noen grupper begynte etterhvert å eksperimentere med såkalte tjærehull (tarpit), der teknikken går ut på å forsinke trafikk fra maskiner som er med i en svartliste ved å la sin del av kommunikasjonen gå svært sakte. Et eksempel som ofte nevnes er programmet spamd, som det frie operativsystemet OpenBSD lanserte i mai 2003. Programmet hadde da som sin hovedoppgave å svare på eposttrafikk fra svartlistede maskiner med ett tegn i sekundet, uten at avsendere som var med i en svartliste hadde noen reell mulighet til å få levert meldingene.
Kombinasjonen av svartlister og grålisting har vist seg å fungere utmerket. Likevel fortsetter praktikere og utviklere forsøkene på å få til enda mer effektive teknikker. Enda en gang kom neste logiske steg som resultat av observert adferd. Vi så tidligere at spammere ikke bryr seg om å sjekke om hver enkelt melding faktisk kommer frem.
Legge ut snarer og agn
Tidlig i 2005 førte dette til at det ble det formulert en teori som det viste seg å være hold i: Hvis vi så lager en eller flere adresser i vårt eget domene som vi vet aldri vil ha noen grunn til å få legitim post, så kan vi med nesten hundre prosent sikkerhet vite at post som blir forsøkt levert til disse adressene er spam. Adressene er spammerfeller. Maskiner som prøver å levere spam til disse adressene, kan vi legge i en lokal svartliste som vi så oppholder med ett tegn i sekundet. Maskinene blir liggende i svartlisten i 24 timer dersom ikke annet tilsier det.
Den nye teknikken fikk navnet greytrapping, og ble lansert som del av en forbedret spamd i OpenBSD 3.8 i mai 2005. Bob Beck, som var sentral deltaker i utviklingen av spamd, annonserte tidlig i 2006 at han brukte greytrapping på sentralt plasserte maskiner ved University of Alberta, og gjorde svartlistene som er resultat av fangsten, med oppdateringer hver time, tilgjengelig for nedlasting slik at andre kan bruke listene i sine oppsett. Dette er åpenbart nyttig, maskiner som sender til adresser som aldri har vært leverbare, har etter all sannsynlighet ikke legitim epost å levere, og vi gjør samfunnet en tjeneste ved å hindre dem i leveringen og å få dem til å kaste bort så mye tid som mulig.
Det er kanskje verd å nevne at svartlisten fra University of Alberta så lenge undertegnede har fulgt den har inneholdt minst noenogtyvetusen IP-adresser, og har i travle perioder vært oppe i nesten tohundretusen maskiner.
Også du kan bidra
Likevel er det ikke nødvendig å være sentralt plassert programvareutvikler for å kunne bidra positivt. De samme verktøyene som Beck bruker til å generere sin svartliste er tilgjengelige for alle som del av OpenBSD, og er ikke så vanskelig å bruke som man kunne frykte.
Her på bsdly.net og beslektede domener observerte vi sommeren 2007 en markert økning i meldinger om ikke leverbar epost til epostadresser som aldri har eksistert i noen av våre domener. Her var det helt klart snakk om at noen, en eller flere grupper, genererte eller fant på avsenderadresser for å unngå å få reaksjoner på spammen sin tilbake til seg. Dette førte i sin tur til et eksperiment som vi fortsatt har gående. Vi registrerer ugyldige adresser i våre egne domener som dukker opp i loggene våre. Av disse adressene velger vi ut de helt usannsynlige, legger dem inn i vår lokale spammerfelle-liste og legger ut listen på en egen side på webserveren vår[2].
Erfaringene viser at det tar svært kort tid før adressene vi fører opp på denne siden dukker opp som mottakeradresser. Kort og godt har vi klart å fore spammere med data som gjør det enklere for oss å stoppe dem, og i mange tilfeller får vi i tillegg spamsenderne til å bruke betydelige mengder tid på å kommunisere med våre maskiner uten å oppnå noe som helst. Antallet spammerfelle-adresser i vår liste er nå oppe i rundt femtentusen, og vi har tidvis observert grupper av maskiner som bruker noen uker på å arbeide seg gjennom hele listen, med gjennomsnittlig noe under syv minutter brukt per mislykkede leveringsforsøk.
Som et biprodukt av denne aktive spammerfangingen begynte vi å eksportere vår egen liste over maskiner som har blitt fanget via spammerfelle-adressene de siste 24 timer og legge den ut tilgjengelig for nedlasting. At denne listen finnes, er kun annonsert via siden med spammerfelle-adressene og noen bloggposter, men vi ser at den blir hentet regelmessig og antakelig automatisk av andre som bruker den i sine systemer.
Så langt har vi etablert at det er mulig å lage et system som gjør at sannsynligheten for at spam kommer gjennom til brukere er svært liten, samtidig som det er nesten helt usannsynlig at legitim post blir merkbart hindret. Dermed har vi det som tilsvarer gode gjerder rundt egen eiendom, men spammerne finnes fortsatt der ute og er et potensielt alvorlig problem for de som ikke har tilstrekkelig beskyttelse mot dem.
Samle bevis, eller i det minste skape klarhet
Aller helst hadde vi ønsket at politi og påtalemyndigheter hadde tatt spammerproblemet alvorlig. Dette ikke bare fordi den spammen som kommer frem er irriterende å se, men fordi nesten all spam sendes via utstyr som spammerne bruker uten eiernes samtykke. Kort og godt ønsker vi at det blir satt inn ressurser som står i forhold til den kriminelle virksomheten spammen representerer. Vi ville gjerne hjelpe til, men i utgangspunktet kan det virke som vi kan ha problem med å skaffe til veie brukbart bevismateriale siden vi ikke mottar meldingene som spammerne forsøker å levere. På den annen side har vi til enhver tid en liste over maskiner som har prøvd å levere spam, noe nær hundre prosent sikkert identifisert på grunnlag av spammerfelle-adressene. I tillegg produserer systemene våre rutinemessig logger over all aktivitet, med det detaljnivået vi selv velger. Dermed går det an å søke i loggene etter IP-adressene som har forsøkt å levere spam til oss siste 24 timer, og få oversikt over hva maskinene har foretatt seg.
Resultatet av et typisk søk av denne typen ser slik ut:
Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <kristie@iland.net> -> <asasaskosmicki@bsdly.net>
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 03:41:42 skapet spamd[13548]: 190.20.132.16: connected (14/13), lists: spamd-greytrap
Aug 10 03:42:23 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap
Aug 10 06:30:35 skapet spamd[13548]: 190.20.132.16: connected (23/22), lists: spamd-greytrap becks
Aug 10 06:31:16 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap becks
Den første linjen angir at 190.20.132.16 tar kontakt med vårt system klokken 02.34.29 om morgenen tiende august, som fjerde aktive forbindelse, derav tre svartlistede. Noen sekunder senere blir det klart at dette er et forsøk på å levere en melding til adressen asasaskosmicki@bsdly.net, som er blant de vi har som spammerfelle, sannsynligvis høstet fra logger og diktet opp et annet sted i verden. Etter 12 sekunder kobler denne maskinen fra. Forsøket på å levere til en spammerfelle gjør at maskinen blir oppført i vår lokale svartliste, spamd-greytrap, noe som vises klart når maskinen prøver igjen litt mer enn en time senere. På dette forsøket blir den oppholdt i 41 sekunder. Det tredje forsøket i vårt loggmateriale skjer like etter 06.30, og at listenavnet becks har kommet til, viser at maskinen i mellomtiden har forsøkt å levere til en av Bob Becks spammerfelle-adresser og nå er med i også den svartlisten.
Det er dessverre lite sannsynlig at slike logger er tilstrekkelige som bevismateriale i straffesaker, men for de som har interesse av enten å sørge for at maskinene de administrerer i så liten grad som mulig brukes til spamutsendelse eller de som har interesse av spammeradferd er dette nyttige data.
“Name And Shame”, eller kanskje bare godt naboskap?
Etter noen diskusjoner med kolleger bestemte jeg meg tidlig i august 2008 for å generere daglige oversikter over aktivitetene til maskiner som har kommet inn i vår lokale svartliste på bsdly.net og legge dem ut offentlig. Når en maskin er oppført i en svartliste med bare IP-adresse (for eksempel 24.165.4.190), uten annet materiale om oppføringen, er oppføringen mest en påstand som de ansvarlige godt kan velge å ikke tro på. Vårt håp er at om den som er ansvarlig for nettverket der 24.165.4.190 hører hjemme ser en sekvens som denne,
Host 24.165.4.190:
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
Aug 10 02:57:54 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:57:55 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:57:56 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:16 skapet spamd[13548]: 24.165.4.190: connected (8/6)
Aug 10 02:58:30 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:58:31 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:58:32 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:39 skapet spamd[13548]: 24.165.4.190: connected (7/6), lists: spamd-greytrap
Aug 10 03:02:24 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 03:03:17 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberliereffett@ehtrib.org>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: From: "Preston Amos" <aarnq@abtinc.com>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: To: kimberlee.ledet@ehtrib.org
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: Subject: Wonderful enhancing effect on your manhood.
Aug 10 03:06:04 skapet spamd[13548]: 24.165.4.190: disconnected after 445 seconds. lists: spamd-greytrap
så er det tilstrekkelig grunnlag for å gjøre noe aktivt. Materialet er tilgjengelig via The Name And Shame Robot-siden på http://www.bsdly.net/~peter/nameandshame.html. Siste genererte loggoversikt er tilgjengelig via referanser på den siden, tidligere utgaver blir arkivert, men vil være tilgjengelige ved godt begrunnet forespørsel.
The Name and Shame Robot er såpass ny at vi ikke kan si spesielt mye om effekten av offentliggjøringen. Det er lov å håpe på at andre vil gjøre noe tilsvarende ut fra sine lokale loggdata eller kanskje til og med synkronisere sine data med våre. Ta gjerne kontakt hvis du er interessert i dette arbeidet.
Uavhengig av alt annet håper vi at dataene kan være nyttige, både som påpeking av forbedringspotensiale for de nettverkene som opptrer jevnlig i oversiktene og som materiale for studier som kan gi oss enda bedre spambekjempelse.
Noter
[1] En samling av slike returmeldinger fra tidligere i år kan beskues på http://www.bsdly.net/~peter/joejob-archive.2008-07-28.txt
[2] http://www.bsdly.net/~peter/traplist.shtml, referanser på den siden fører videre til bloggen min som jeg bruker til offentlige notater, og annet relevant materiale.
En forkortet utgave av denne artikkelen ble trykt i Computerworld Norge 22. august 2008.
Vi sitter med stadig voksende mengder med data om spammere. Kan vi bruke dette på en måte som er nyttig for andre?
Vi som selv står for driften av eposttjenester vet av tidvis smertelig erfaring hva som skal til for at minimalt med uønsket reklame og lureri som kan være direkte trusler mot sikkerheten faktisk havner i innboksene til brukerne våre.
Epost: Dette burde vært enkelt
I utgangspunktet burde det være en enkel sak å håndtere epost: Serveren er satt opp slik at den vet hvilke domener den skal ta imot post for, og hvilke brukere som eksisterer i domenene. Når en maskin tar kontakt og signaliserer at den ønsker å levere epost, sjekker serveren om meldingen er adressert til en gyldig bruker. Er det snakk om en gyldig bruker, blir meldingen mottatt og lagt i innboksen til den aktuelle brukeren, i motsatt fall får den som er oppgitt som avsender beskjed om at meldingen ikke lot seg levere, og hvorfor.
Hvis bare alle var ærlige
I hvert enkelt ledd av denne prosessen er det en underliggende forutsetning at kommunikasjonspartnerne oppgir korrekt informasjon. I mange tilfeller er det slik, og det dreier seg om legitim kommunikasjon mellom parter som har grunn til å ønske kontakt. Dessverre finnes det også tilfeller der denne grunnleggende tilliten blir brutt eller misbrukt, for eksempel når epostmeldinger blir sendt med annen avsenderadresse enn den reelle, gjerne noe oppdiktet i et domene som tilhører andre. En del av oss har også opplevd å få returmeldinger om at levering ikke var mulig på grunnlag av meldinger som vi vitterlig ikke har sendt[1]. Når vi ser nøyere på innholdet i disse meldingene, vil vi i nesten alle tilfeller se at dette er søppelpost, tidvis med innslag av svindel og kanskje ledd i førsøk på å overta mottakerens maskin eller stjele sensitive data.
Hva gjør de ansvarlige?
Hvis du spør en typisk systemadministrator om hvilke tiltak som er satt i verk for å hindre at uønskede eller skadelige meldinger faktisk kommer frem til brukerne, vil du antakelig få høre en beskrivelse som stort sett kan kokes ned til at posten blir filtrert gjennom systemer som studerer innholdet i meldingene. Hvis meldingen ikke inneholder noe som er kjent som uønsket (kjent spam eller skadelig programvare) eller noe som likner på noe som kunne være det, blir meldingen levert til brukeren den er adressert til. Hvis systemet avgjør at meldingen har innhold som gjør at den ikke skal leveres til mottaker, blir meldingen i mange tilfeller kastet uten å bli levert, og noen systemadministratorer vil nok også fortelle deg at systemet sender melding om avgjørelsen tilbake til adressen som er oppgitt som avsender.
Mye av dette hører antakelig til den passive kunnskapen hos mange, og de fleste slår seg til ro med at innholdsfiltrering er det eneste vi kan gjøre for å holde tvilsomme eller direkte kriminelle elementer unna arbeidsmiljøet. For en enkelt sluttbruker er det sannsynligvis bare mindre justeringer ut fra dette som er mulig.
Tiltak basert på observert adferd
Men for oss som håndterer driften av selve tjenesten er det mulig å studere dataene som registreres automatisk i loggene våre og bruke spammernes (her brukt om avsendere av alle typer uønsket epost, inkludert skadevare) adferd til å fjerne mesteparten av den uønskede trafikken før innholdet i meldingene er kjent. Det er nødvendig å gå ned på et noe mer grunnleggende nivå i nettverkstrafikken og studere adferden på nettverksnivå.
En av de enkleste formene av slike adferdsbaserte tiltak dukket opp i form av en teknikk som fikk navnet grålisting (greylisting) i 2003. Teknikken bygger på en litt pedantisk og ganske kreativ tolkning av allerede vedtatte standarder. Protokollen som brukes for epost-overføring på Internett, SMTP (Simple Mail Transfer Protocol) inneholder en mulighet for at en server som har antatt forbigående problemer med å ta imot epost kan rapportere om tilstanden ved å svare med en spesiell feilkode når andre maskiner prøver å levere. Hvis avsendermaskinen er korrekt konfigurert, vil den vente en viss tid før den gjør et nytt forsøk på levering, og etter all sannsynlighet vil den lykkes etter kort tid. Det er verd å merke seg at dette er en del av standarden som først og fremst skulle sørge for at eposttjenesten skulle være så pålitelig som mulig, og i de fleste tilfeller skjer alt dette uten at personen som skrev og sendte meldingen merker noe til det. Meldingene kommer frem, og alle er fornøyde.
Grå og svarte lister, små hvite løgner
Grålisting går ut på at serveren gir melding om midlertidig feil til alle forsøk på epostlevering fra maskiner den ikke har hatt kontakt med før. Erfaringene viste at antakelsene fra før eksperimentet i hovedsak var korrekte: De aller fleste maskiner som sender legitim epost er satt opp til å sjekke returkoder og handle etter dem, mens de aller fleste avsendere av spam bare pøser på så mange meldinger som mulig, uten å sjekke returkoder. Resultatet er at et sted mellom åtti og noenognitti prosent av spam-mengden blir stoppet på første forsøk (før eventuell innholdsfiltrering), mens legitim post kommer frem, i noen tilfeller med en viss forsinkelse ved første kontakt.
En annen adferdsbasert teknikk, som faktisk var i bruk før grålisting ble utbredt, er å bruke såkalte svartlister - lister over maskiner som er blitt klassifisert som spam-avsendere - og avvise post fra maskiner som var med i listen. Noen grupper begynte etterhvert å eksperimentere med såkalte tjærehull (tarpit), der teknikken går ut på å forsinke trafikk fra maskiner som er med i en svartliste ved å la sin del av kommunikasjonen gå svært sakte. Et eksempel som ofte nevnes er programmet spamd, som det frie operativsystemet OpenBSD lanserte i mai 2003. Programmet hadde da som sin hovedoppgave å svare på eposttrafikk fra svartlistede maskiner med ett tegn i sekundet, uten at avsendere som var med i en svartliste hadde noen reell mulighet til å få levert meldingene.
Kombinasjonen av svartlister og grålisting har vist seg å fungere utmerket. Likevel fortsetter praktikere og utviklere forsøkene på å få til enda mer effektive teknikker. Enda en gang kom neste logiske steg som resultat av observert adferd. Vi så tidligere at spammere ikke bryr seg om å sjekke om hver enkelt melding faktisk kommer frem.
Legge ut snarer og agn
Tidlig i 2005 førte dette til at det ble det formulert en teori som det viste seg å være hold i: Hvis vi så lager en eller flere adresser i vårt eget domene som vi vet aldri vil ha noen grunn til å få legitim post, så kan vi med nesten hundre prosent sikkerhet vite at post som blir forsøkt levert til disse adressene er spam. Adressene er spammerfeller. Maskiner som prøver å levere spam til disse adressene, kan vi legge i en lokal svartliste som vi så oppholder med ett tegn i sekundet. Maskinene blir liggende i svartlisten i 24 timer dersom ikke annet tilsier det.
Den nye teknikken fikk navnet greytrapping, og ble lansert som del av en forbedret spamd i OpenBSD 3.8 i mai 2005. Bob Beck, som var sentral deltaker i utviklingen av spamd, annonserte tidlig i 2006 at han brukte greytrapping på sentralt plasserte maskiner ved University of Alberta, og gjorde svartlistene som er resultat av fangsten, med oppdateringer hver time, tilgjengelig for nedlasting slik at andre kan bruke listene i sine oppsett. Dette er åpenbart nyttig, maskiner som sender til adresser som aldri har vært leverbare, har etter all sannsynlighet ikke legitim epost å levere, og vi gjør samfunnet en tjeneste ved å hindre dem i leveringen og å få dem til å kaste bort så mye tid som mulig.
Det er kanskje verd å nevne at svartlisten fra University of Alberta så lenge undertegnede har fulgt den har inneholdt minst noenogtyvetusen IP-adresser, og har i travle perioder vært oppe i nesten tohundretusen maskiner.
Også du kan bidra
Likevel er det ikke nødvendig å være sentralt plassert programvareutvikler for å kunne bidra positivt. De samme verktøyene som Beck bruker til å generere sin svartliste er tilgjengelige for alle som del av OpenBSD, og er ikke så vanskelig å bruke som man kunne frykte.
Her på bsdly.net og beslektede domener observerte vi sommeren 2007 en markert økning i meldinger om ikke leverbar epost til epostadresser som aldri har eksistert i noen av våre domener. Her var det helt klart snakk om at noen, en eller flere grupper, genererte eller fant på avsenderadresser for å unngå å få reaksjoner på spammen sin tilbake til seg. Dette førte i sin tur til et eksperiment som vi fortsatt har gående. Vi registrerer ugyldige adresser i våre egne domener som dukker opp i loggene våre. Av disse adressene velger vi ut de helt usannsynlige, legger dem inn i vår lokale spammerfelle-liste og legger ut listen på en egen side på webserveren vår[2].
Erfaringene viser at det tar svært kort tid før adressene vi fører opp på denne siden dukker opp som mottakeradresser. Kort og godt har vi klart å fore spammere med data som gjør det enklere for oss å stoppe dem, og i mange tilfeller får vi i tillegg spamsenderne til å bruke betydelige mengder tid på å kommunisere med våre maskiner uten å oppnå noe som helst. Antallet spammerfelle-adresser i vår liste er nå oppe i rundt femtentusen, og vi har tidvis observert grupper av maskiner som bruker noen uker på å arbeide seg gjennom hele listen, med gjennomsnittlig noe under syv minutter brukt per mislykkede leveringsforsøk.
Som et biprodukt av denne aktive spammerfangingen begynte vi å eksportere vår egen liste over maskiner som har blitt fanget via spammerfelle-adressene de siste 24 timer og legge den ut tilgjengelig for nedlasting. At denne listen finnes, er kun annonsert via siden med spammerfelle-adressene og noen bloggposter, men vi ser at den blir hentet regelmessig og antakelig automatisk av andre som bruker den i sine systemer.
Så langt har vi etablert at det er mulig å lage et system som gjør at sannsynligheten for at spam kommer gjennom til brukere er svært liten, samtidig som det er nesten helt usannsynlig at legitim post blir merkbart hindret. Dermed har vi det som tilsvarer gode gjerder rundt egen eiendom, men spammerne finnes fortsatt der ute og er et potensielt alvorlig problem for de som ikke har tilstrekkelig beskyttelse mot dem.
Samle bevis, eller i det minste skape klarhet
Aller helst hadde vi ønsket at politi og påtalemyndigheter hadde tatt spammerproblemet alvorlig. Dette ikke bare fordi den spammen som kommer frem er irriterende å se, men fordi nesten all spam sendes via utstyr som spammerne bruker uten eiernes samtykke. Kort og godt ønsker vi at det blir satt inn ressurser som står i forhold til den kriminelle virksomheten spammen representerer. Vi ville gjerne hjelpe til, men i utgangspunktet kan det virke som vi kan ha problem med å skaffe til veie brukbart bevismateriale siden vi ikke mottar meldingene som spammerne forsøker å levere. På den annen side har vi til enhver tid en liste over maskiner som har prøvd å levere spam, noe nær hundre prosent sikkert identifisert på grunnlag av spammerfelle-adressene. I tillegg produserer systemene våre rutinemessig logger over all aktivitet, med det detaljnivået vi selv velger. Dermed går det an å søke i loggene etter IP-adressene som har forsøkt å levere spam til oss siste 24 timer, og få oversikt over hva maskinene har foretatt seg.
Resultatet av et typisk søk av denne typen ser slik ut:
Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <kristie@iland.net> -> <asasaskosmicki@bsdly.net>
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 03:41:42 skapet spamd[13548]: 190.20.132.16: connected (14/13), lists: spamd-greytrap
Aug 10 03:42:23 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap
Aug 10 06:30:35 skapet spamd[13548]: 190.20.132.16: connected (23/22), lists: spamd-greytrap becks
Aug 10 06:31:16 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap becks
Den første linjen angir at 190.20.132.16 tar kontakt med vårt system klokken 02.34.29 om morgenen tiende august, som fjerde aktive forbindelse, derav tre svartlistede. Noen sekunder senere blir det klart at dette er et forsøk på å levere en melding til adressen asasaskosmicki@bsdly.net, som er blant de vi har som spammerfelle, sannsynligvis høstet fra logger og diktet opp et annet sted i verden. Etter 12 sekunder kobler denne maskinen fra. Forsøket på å levere til en spammerfelle gjør at maskinen blir oppført i vår lokale svartliste, spamd-greytrap, noe som vises klart når maskinen prøver igjen litt mer enn en time senere. På dette forsøket blir den oppholdt i 41 sekunder. Det tredje forsøket i vårt loggmateriale skjer like etter 06.30, og at listenavnet becks har kommet til, viser at maskinen i mellomtiden har forsøkt å levere til en av Bob Becks spammerfelle-adresser og nå er med i også den svartlisten.
Det er dessverre lite sannsynlig at slike logger er tilstrekkelige som bevismateriale i straffesaker, men for de som har interesse av enten å sørge for at maskinene de administrerer i så liten grad som mulig brukes til spamutsendelse eller de som har interesse av spammeradferd er dette nyttige data.
“Name And Shame”, eller kanskje bare godt naboskap?
Etter noen diskusjoner med kolleger bestemte jeg meg tidlig i august 2008 for å generere daglige oversikter over aktivitetene til maskiner som har kommet inn i vår lokale svartliste på bsdly.net og legge dem ut offentlig. Når en maskin er oppført i en svartliste med bare IP-adresse (for eksempel 24.165.4.190), uten annet materiale om oppføringen, er oppføringen mest en påstand som de ansvarlige godt kan velge å ikke tro på. Vårt håp er at om den som er ansvarlig for nettverket der 24.165.4.190 hører hjemme ser en sekvens som denne,
Host 24.165.4.190:
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
Aug 10 02:57:54 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:57:55 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:57:56 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:16 skapet spamd[13548]: 24.165.4.190: connected (8/6)
Aug 10 02:58:30 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:58:31 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:58:32 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:39 skapet spamd[13548]: 24.165.4.190: connected (7/6), lists: spamd-greytrap
Aug 10 03:02:24 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 03:03:17 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberliereffett@ehtrib.org>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: From: "Preston Amos" <aarnq@abtinc.com>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: To: kimberlee.ledet@ehtrib.org
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: Subject: Wonderful enhancing effect on your manhood.
Aug 10 03:06:04 skapet spamd[13548]: 24.165.4.190: disconnected after 445 seconds. lists: spamd-greytrap
så er det tilstrekkelig grunnlag for å gjøre noe aktivt. Materialet er tilgjengelig via The Name And Shame Robot-siden på http://www.bsdly.net/~peter/nameandshame.html. Siste genererte loggoversikt er tilgjengelig via referanser på den siden, tidligere utgaver blir arkivert, men vil være tilgjengelige ved godt begrunnet forespørsel.
The Name and Shame Robot er såpass ny at vi ikke kan si spesielt mye om effekten av offentliggjøringen. Det er lov å håpe på at andre vil gjøre noe tilsvarende ut fra sine lokale loggdata eller kanskje til og med synkronisere sine data med våre. Ta gjerne kontakt hvis du er interessert i dette arbeidet.
Uavhengig av alt annet håper vi at dataene kan være nyttige, både som påpeking av forbedringspotensiale for de nettverkene som opptrer jevnlig i oversiktene og som materiale for studier som kan gi oss enda bedre spambekjempelse.
Noter
[1] En samling av slike returmeldinger fra tidligere i år kan beskues på http://www.bsdly.net/~peter/joejob-archive.2008-07-28.txt
[2] http://www.bsdly.net/~peter/traplist.shtml, referanser på den siden fører videre til bloggen min som jeg bruker til offentlige notater, og annet relevant materiale.
En forkortet utgave av denne artikkelen ble trykt i Computerworld Norge 22. august 2008.
Wednesday, August 27, 2008
Logfiles in the buff
Search engine optimization, deflowered.
Logs are important. Depending on the specific kind of log, the data may shape lives and generate fortunes (how many times were those ads displayed, your clickthrough rate), reveal suspicious behavior and trigger actions (such as shutting the door to that bruteforcer) or provide sysadmins such as yours truly a general idea of what works and not or anything inbetween.
If you're a sysadmin, log data or log data derivatives such as a monitoring tool's graphical status display is more likely than not an important underlying factor to determine how you spend your day.
Then of course most of the material for these columns comes from log files, too. Depending on the specific log file, I tend to either just peek at the data my monitoring scripts offer me or do some manual greping for any patterns that interest me.
One such pattern matches the filename for my resume. I put that online for job hunting purposes, and now that I'm basically a gun for hire, it's slightly interesting to see any activity involving that file.
So at semi-random intervals, I check the apache log for references to my resume. Today, the grepery turned up this nugget
92.48.107.33 - - [27/Aug/2008:04:41:12 +0200] "GET /%7Epeter/PNMH-cv.html HTTP/1.0" 200 12318 "http://afmfokuv.fcpages.com/hot-anime-lesbians.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
and I count myself lucky that I had thoroughly swallowed my last mouthful of coffee before reading that.
In the Era of PageRank, the Age Search Engine Optimization Consultant, Season of the Clickthrough rage, I suppose we should not be entirely surprised to see such things. Just what the two documents have in common, perhaps other than targeting a very specific market, is left as an excercise for the terminally curious. I would advise some caution in choice of browser and operating system if your research takes you to the referring URL. One of the lessons of the day is, it doesn't always take a spamd log to crack you up.
PF tutorial in London, November 26
In other news, the UKUUG are hosting a full day PF tutorial featuring yours truly in London on November 26th, 2008. See the UKUUG web site for details. OpenCON is the following weekend in Venice, and I hope to make it there too.
The Name and Shame Robot
Last week the Norwegian edition of Computerworld published an article about the Name and Shame Robot, unfortunately in the paper edition only (yes, I've got an English article in process too). The article did spur some nameandshame.html traffic from unexpected places, but no offers of cooperation or spamd synchronization so far. In the meantime, I'm running into odd cron behavior differences when trying to run the generator script I wrote on my OpenBSD machines on a few FreeBSD hosts. More than likely there is a lesson to be learned there too.
Logs are important. Depending on the specific kind of log, the data may shape lives and generate fortunes (how many times were those ads displayed, your clickthrough rate), reveal suspicious behavior and trigger actions (such as shutting the door to that bruteforcer) or provide sysadmins such as yours truly a general idea of what works and not or anything inbetween.
If you're a sysadmin, log data or log data derivatives such as a monitoring tool's graphical status display is more likely than not an important underlying factor to determine how you spend your day.
Then of course most of the material for these columns comes from log files, too. Depending on the specific log file, I tend to either just peek at the data my monitoring scripts offer me or do some manual greping for any patterns that interest me.
One such pattern matches the filename for my resume. I put that online for job hunting purposes, and now that I'm basically a gun for hire, it's slightly interesting to see any activity involving that file.
So at semi-random intervals, I check the apache log for references to my resume. Today, the grepery turned up this nugget
92.48.107.33 - - [27/Aug/2008:04:41:12 +0200] "GET /%7Epeter/PNMH-cv.html HTTP/1.0" 200 12318 "http://afmfokuv.fcpages.com/hot-anime-lesbians.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
and I count myself lucky that I had thoroughly swallowed my last mouthful of coffee before reading that.
In the Era of PageRank, the Age Search Engine Optimization Consultant, Season of the Clickthrough rage, I suppose we should not be entirely surprised to see such things. Just what the two documents have in common, perhaps other than targeting a very specific market, is left as an excercise for the terminally curious. I would advise some caution in choice of browser and operating system if your research takes you to the referring URL. One of the lessons of the day is, it doesn't always take a spamd log to crack you up.
PF tutorial in London, November 26
In other news, the UKUUG are hosting a full day PF tutorial featuring yours truly in London on November 26th, 2008. See the UKUUG web site for details. OpenCON is the following weekend in Venice, and I hope to make it there too.
The Name and Shame Robot
Last week the Norwegian edition of Computerworld published an article about the Name and Shame Robot, unfortunately in the paper edition only (yes, I've got an English article in process too). The article did spur some nameandshame.html traffic from unexpected places, but no offers of cooperation or spamd synchronization so far. In the meantime, I'm running into odd cron behavior differences when trying to run the generator script I wrote on my OpenBSD machines on a few FreeBSD hosts. More than likely there is a lesson to be learned there too.
Subscribe to:
Posts (Atom)