Thursday, December 27, 2007

A year ends; what to do next?

It's the end of a year already. The end of the year is, among other things, the traditional time for tallying up totals to see what the year brought and for looking forward to the fresh year ahead. Now at this point the inner geek in me, and probably in you too, rebels with a "Why should any arbitrarily chosen point in time be assigned that much significance, huh?", but let's face it, it's one convention we will just have to live with.

The past year included a number of events, some entirely expected, like the formal beginning of the end of the corporation once known as Caldera (yes, I know, it's not quite over yet, and I've written about that earlier), and some rather surprising, like the recent EU-brokered patents and specifications deal, which apparently means that the Samba team and other interested parties will not only be given access to usable protocol specifications, they will even be furnished with a list of what Microsoft believes to be their relevant patents. That at least puts a serious dent in that corporation's patent FUD capability.

Any pundit in the Microsoft/Linux/FOSS "watcher" crowd who left that one out of their year end summary pieces should consider themselves cautioned: You were not paying attention to what could be this year's most important single piece of news in our field.

The general picture of the IT field is rather one of vast crowds of users who simply want to get on with their lives. The typical user is weary of the seventeen and a half times a week ritual Microsoft malware scare, and doesn't really see any benefit in getting a new computer with Vista to slow it down, now that they've finally weeded out or gotten used to all the annoyances of Windows XP and the background noise of unwanted popups and spam.

Rather more depressingly for us in the FOSS field, the typical user wants to just get on with his or her life and is weary, too, of the constantly overhyped "solutions" IT types are peddling. Faced with <insert your favorite product selling point here>, the stock answer now is, "Gimme a break, that's what the last one said too."

This goes for almost any selling point, including vastly improved security along any measurable axis, efficient spam killing (including avoidance techniques like greylisting), and the lightweight yet usable desktop; during the past year Microsoft even made a credible attempt at taking "open standards" prisoner. There is clearly a lot of work to be done, and we need to find ways to do that work better and to present it in ways that actually add to FOSS people's credibility.

That includes, in my view, finding better ways to handle the periodic squabbles over licenses such as the GPL vs BSD shouting matches. It is likely that I will return to that topic in a future column, if and when I find the time to write it properly.

In my own little corner of the world, the publication of The Book of PF, marked here by the arrival of the author copies, concluded a long process that consumed rather more time and resources than I had anticipated. Before those copies arrived I had some copies made for OpenCON, which were auctioned off for amazing sums that were subsequently donated to the OpenBSD project (see undeadly.org for details). Even though Amazon.com now lists the book as due for release on January 11th, I have confirmation that No Starch shipped all preorders before they closed for the holidays, and I know of at least one correspondent who got a message from the UK arm of Amazon that his copy was on its way. I'm interested in hearing from you about the book, of course, even reports that it has arrived safely in your mailbox.

Now other opportunities beckon, and I promise that in the coming year I will be writing about developments, confidentiality agreements allowing. If there is anything specific you want me to write about, please let me know.

I give you all my best wishes for the new year.

PS: I almost neglected to mention that on December 24th, the PF tutorial (the forerunner of The Book of PF) saw its 27,000th unique visitor (counted as unique IP addresses or host names) for the period we have log data for.

Update 2015-04-02: The Book of PF is now in its third edition, and the link in this article has been changed to point to the more recent edition.

Sunday, November 25, 2007

I Must Be Living in a Parallel Universe, Then

It's Sunday morning, and I'm having my morning coffee while getting ready for a long session of editing my OpenCON presentation. Working on adapting the presentation from the tutorial material has had me rediscovering just how much work went into making the book, so a long Sunday session is needed, if not more.

Then, courtesy of Groklaw's news picks, comes the USA Today piece called Despite filters, tidal wave of spam bears down on e-mailers.

A tidal wave of spam, no less. Well, we're seeing a lot of attempts at sending, like the sequence here (text link, formatting it would take too long) that I captured from the xterm running a tail -f on my spamd log a little while back. That sequence tells me, for one thing, that the naive spambot thinks my spamd looks like an open relay.

The other interesting thing about the sequence there is the pattern you can see in the From: addresses. It may have dawned on some of the spammers that generating random addresses in other people's domains might end up poisoning their own well, so they started introducing patterns to be able to weed out their own made up addresses from their lists. I take that as a confirmation that our harvesting and republishing efforts here and elsewhere have been working rather well.

Here the method seems to be that they take the victim domain name, prepend "dw" and append "m" to make up the local part and then append the domain, so starting from sia.com we get dwsiam@sia.com.

There is one other common variation on that theme, where the prepend string is "lin" and the append string is "met", producing addresses like linhrimet@hri.de, used just a few minutes ago to try to spam malseeinvmk@bsdly.net from the apparently Polish address 89.228.40.80. This is of course very interesting, as is the fact that right now about two and a half thousand machines are in my spamd-greytrap list. That's where they end up, making no waves at all.
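
For what it's worth, both recipes are trivial to reproduce at the command line; here is a minimal sketch of the patterns described above, using nothing but the shell (the domains are just the examples already mentioned):

$ domain=sia.com; label=${domain%%.*}
$ echo "dw${label}m@${domain}"
dwsiam@sia.com
$ domain=hri.de; label=${domain%%.*}
$ echo "lin${label}met@${domain}"
linhrimet@hri.de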

On the subject of patterns, earlier this month the address capitalgain02@gmail.com started appearing frequently enough that it caught my attention in my greylist dumps and log files.

The earliest contact as far as I can see was at Nov 10 14:30:57, trying to spam wkzp0jq0n6.fsf@datadok.no from 193.252.22.241 (apparently a France Telecom customer). The last attempt seems to have been ten days later, at Nov 20 15:20:31, from the Swedish machine 217.10.96.36.

My logs show me that during that period 6531 attempts had been made to deliver mail from capitalgain02@gmail.com via bsdly.net, from 35 different IP addresses, to 131 different recipients in our domains. Those recipients included three deliverable addresses, mine or aliases I receive mail for. None of those attempts actually succeeded, of course. With a little more time on my hands I'm sure I could have made a good regular expression to calculate to the second how much time those spam senders wasted here, too.
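
If you want to reproduce that kind of tally, something along these lines will do, assuming your spamd log lines look like the (GREY) excerpts quoted elsewhere in these posts and the relevant period has not been rotated out of the log:

# delivery attempts
$ grep 'capitalgain02@gmail.com' /var/log/spamd | wc -l
# distinct sending IP addresses
$ grep 'capitalgain02@gmail.com' /var/log/spamd | sed -n 's/.*(GREY) \([0-9.]*\):.*/\1/p' | sort -u | wc -l
# distinct recipient addresses
$ grep 'capitalgain02@gmail.com' /var/log/spamd | sed -n 's/.* -> <\(.*\)>.*/\1/p' | sort -u | wc -l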

So where's the tidal wave? Back when PDF spam was the new horror, it actually took three weeks for one to reach me, and then only via an alias on a machine I really don't have much control over anymore. The number of spam sending machines does seem to be increasing, though.

Bob Beck's uatraps list is a good indicator, and the tendency is clear from the graph in my malware paper. The number did dip just below 100,000 addresses earlier this month, and it now seems to have stabilized in the 110,000 to 120,000 range.

From my perspective, it looks like a reasonably configured spamd is really all we need to observe the tidal wave at a safe distance and have fun all the while.

It's almost like living in a parallel universe.

Sunday, October 28, 2007

Of Course, It Had To Be A Webshield

In an earlier blog post, I mentioned that I would buy a round of drinks the first time I saw an attempt to deliver a message with both the From: and To: addresses already on my spammer baiting list.

In fact it happened very soon afterwards, and as luck, misfortune or just plain old incompetence would have it, that message apparently came from a WebShield appliance not too far from here:

Oct 17 23:03:52 skapet spamd[20795]: 194.54.96.18: connected (6/4)
Oct 17 23:04:03 skapet spamd[20795]: (GREY) 194.54.96.18:
<capitulations7@datadok.no> -> <capitulations7@datadok.no>
Oct 17 23:04:03 skapet spamd[20795]: 194.54.96.18: disconnected
after 11 seconds.
Oct 17 23:19:21 skapet spamd[20795]: 194.54.96.18: connected (4/3)
Oct 17 23:19:32 skapet spamd[20795]: (GREY) 194.54.96.18:
<capitulations7@datadok.no> -> <capitulations7@datadok.no>
Oct 17 23:19:32 skapet spamd[20795]: 194.54.96.18: disconnected
after 11 seconds.
Oct 17 23:30:30 skapet spamd[20795]: 194.54.96.18: connected (4/4),
lists: spamd-greytrap
Oct 17 23:34:14 skapet spamd[20795]: (BLACK) 194.54.96.18:
<capitulations7@datadok.no> -> <capitulations7@datadok.no>
Oct 17 23:35:58 skapet spamd[20795]: 194.54.96.18: From:
Webshield.SMTP.V4.5.MR1a.Mail.Service@vs4.bgnett.no
Oct 17 23:35:58 skapet spamd[20795]: 194.54.96.18:
To: <capitulations7@datadok.no>
Oct 17 23:35:58 skapet spamd[20795]: 194.54.96.18:
Subject: Returned Mail: Error During Delivery
Oct 17 23:37:00 skapet spamd[20795]: 194.54.96.18:
disconnected after 390 seconds. lists: spamd-greytrap
Oct 17 23:57:18 skapet spamd[20795]: 194.54.96.18:
connected (6/6), lists: spamd-greytrap


I sent the operators at that site a polite message right away, pointing out the misconfiguration. Two weeks later I have seen no response other than the automatic acknowledgement, but it looks like the machine has managed to get itself automatically whitelisted in the meantime. So perhaps they found the button that actually does something.

Since my last blog post I have completed the book, and I expect the last bit of proofing to be done during the coming week. Then a few other necessary processes, and physical copies should be available by mid-December if all goes well. With the cover in place, it looks like the book has become attractive and popular over at amazon.com in its various categories. The BSD category there looks pretty No Starch dominated at the moment.

That cannot be a bad thing. It's been a real pleasure working with the people at No Starch Press. If you think you want to write a tech book, they should be on your list of publishers to contact with your proposal.

While all this was happening, the spammer baiting operation seems to have reached a critical mass of sorts. With roughly 7,200 addresses in the spamtrap list there are several hundred bait addresses for each real one in those domains taken together, so it's extremely unlikely that the spammers will ever get a chance to try delivery to a real address before they hit the tar pit. Over the last couple of weeks, my gateways have had anywhere between 2,500 and 4,000 hosts in the local spamd-greytrap, and anywhere from 0 to about 300 spambots pushing bytes into the tar pits at any time. It's fun to watch (some of the bots labor through the bait list from top to bottom), and the net effect is, well, we're not seeing much spam.

I think I've mentioned it before, but it bears repeating: To naive spammers and the tools they use, spamd looks like an open relay. Spamd never actually delivers any messages, but this


GREY|201.250.57.147|sofia|<vdaegkoxgk@bonana.com>|
<brad.james.anderson@jhg.com.au>|1193105605|1193127205|1193127205|1|0


says that whoever operates 201.250.57.147 (according to whois, likely located in or near Buenos Aires, Argentina) is unable to tell the difference between an open relay and spamd's 451 and subsequent "this is going to hurt you more than it hurts me" messages.

Another variation on that theme is what I think is some sort of amateurish relay testing, which typically produces anywhere from five hundred to a thousand greylist entries of the type


GREY|59.35.4.51|UATIM-F7E7949C7|<adgjnq@194.54.103.104>|
<ariel5268@yahoo.com.tw>|1193084672|1193113472|1193113472|2|0
GREY|59.35.4.51|UATIM-F7E7949C7|<xaehkn@rosalita.datadok.no>|
<ariel5268@yahoo.com.tw>|1193084675|1193113475|1193113475|2|0
GREY|59.35.4.51|UATIM-F7E7949C7|<qswyd@brutha.datadok.no>|
<ariel5268@yahoo.com.tw>|1193084691|1193113491|1193113491|2|0
GREY|59.35.4.51|UATIM-F7E7949C7|<nqtw@monalisa.datadok.no>|
<ariel5268@yahoo.com.tw>|1193084733|1193113533|1193113533|2|0


where the From parts are made up of host names and IP addresses in our local net, including, in this case, the host name of one of our laser printers. Those floods have tended to swell the bait list a bit, even if I strip out the invalid @<IP address> ones.
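
The weeding itself is a one-liner; a sketch, with the file names made up for illustration:

# drop any harvested entry whose domain part is a bare IP address
$ grep -Ev '@[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' harvested-addresses.txt > harvested-addresses.clean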

Spamd makes the naive relay testers think we have a whole network of open relays, and we harvest the noise they generate to lead the spambots to the tarpit. That's pretty close to a hands-off spammer repellent for us, and a serious auto-LART for the spammers.

OpenCON is sneaking up on us in a month's time, and we're heading for Venice with a refreshed tutorial session. See you there!

PS - [non-IT PS coming up] Bergen's football (soccer) team SK Brann has just won the national league for the first time in 44 years. With one game to go before the end of the season, they are so far ahead on points that there is no way any other team will be able to catch up. The town is predictably going gaga over the event, and we joined the thousands at the central Festplassen square for the city-sponsored celebration tonight. I'm surprised how many songs have been written about that team and how everybody around me seemed to know every last word of the lyrics. Good fun, ending with fireworks.

Saturday, September 29, 2007

Always a pleasure to be wasting your time, guv

This week has been a little unusual around the BSDly household. So far I've generally been doing my regular job in the daytime (with longish office hours), only working on the book evenings and weekends. That the arrangement would lead to "Exhaustion is my middle name" status was obvious to everyone except me, but I finally saw where it could be going. So for a little more than the past week I've been working on the book full time.

The state of perpetual exhaustion has had some not too happy consequences. Of course the general progress on the book suffered, but it also led to me missing the monthly BLUG meeting in August. I had spent much of that particular day persuading somebody not too bright that it really was a reconfiguration at their end, one they said had never happened, that ended up breaking things at our end, and I was simply too tired and missed what I assume was a well-executed lecture on networking basics by Vegard Engen (of RFC1149 implementation fame).

This week, with only one job to tackle, I was there for an enjoyable one and a half hours of Bacula, well presented by Bård Aase (aka elzapp). Off to Henrik (the regular BLUG pub) for a few beers afterwards, and with Johan Riise volunteering to put together a 'Unix and time' lecture for next month, the BLUG calendar seems to be in order after all, with Jill Walker doing the end-of-semester talk in November, on whatever interesting stuff she has been up to lately. Unfortunately it looks like the last Thursday of November is close enough to OpenCON that I'll likely miss Jill's session.

In the meantime, there are signs that the greytrapping and my bait list is working. Looking over the spamd logs today I found quite a few entries like these:

Sep 29 15:29:23 skapet spamd[20795]: (BLACK) 84.76.177.159: 
<royaleuromillion2007@yahoo.es> -> <211hgsreliart7@datadok.no>
Sep 29 15:29:32 skapet spamd[20795]: (BLACK) 84.76.177.159: 
<royaleuromillion2007@yahoo.es> -> <00b27f18@datadok.no>

which looks strikingly like the Spanish lottery scam spammers patiently and methodically working their way through my list of bait addresses, all the way from top to bottom. At roughly 3,000 addresses, it's going to be a while. All I can say is, we are extremely pleased to be wasting your time, señor.

Also while the girls were off to the Raptus comics festival (an annual event, and one of the big things here in Bergen), I found enough trash backscatter to non-existent bsdly.net addresses that it's likely that the same weekend spambot operators who spewed their spam with @ehtrib.org and @skapet.datadok.no addresses earlier (both times at weekends) have now discovered bsdly.net and are doing their damnedest.

Why they prefer to generate a few hundred fake addresses and use them all in one go is beyond me. The other groups seem to generate only a handful of new addresses each every day, and for good measure at least one of them sort of reuses the generated addresses by using both a forward and a reversed variant (in this morning's preserved greylist dumps, for example, there was a potterv76@datadok.no as well as the reversed 67VRETTOP3@datadok.no). This lot just dumps all they have in one go, mainly contributing to swelling that file in my home directory with the totally unprintable file name which is the temporary storage before they go into the traplist and on to the bait page.
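
The forward-and-reverse trick is easy enough to verify from the command line; the trailing digits appear to be a separate addition:

$ echo 'potterv76' | rev | tr '[:lower:]' '[:upper:]'
67VRETTOP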

Distractions of that kind from my main task are never entirely welcome, but with a larger influx of new addresses to be added to the bait list I made some small changes to make the maintenance of that page a bit more sane, rediscovering server-side includes and redirects along the way. And the data I keep collecting may become the basis for other projects later.

Anyway, it is increasingly clear that the spammers are including the generated fake addresses in their "known good" lists. Consider the spambot at 210.111.190.216 (apparently in Korea), which insists on delivering to an address somebody generated in early July:

peter@skapet:~/www_sider$ grep  210.111.190.216 /var/log/spamd
Sep 29 15:58:07 skapet spamd[20795]: 210.111.190.216: 
connected (5/4)
Sep 29 15:58:21 skapet spamd[20795]: (GREY) 210.111.190.216: 
<jim.vance@presentsmadeeasy.com> -> 
<careersogt2083@datadok.no>
Sep 29 15:58:22 skapet spamd[20795]: 210.111.190.216: 
disconnected after 15 seconds.
Sep 29 15:58:35 skapet spamd[20795]: 210.111.190.216: 
connected (4/3)
Sep 29 15:58:49 skapet spamd[20795]: (GREY) 210.111.190.216: 
<tbaker@groupecdb.com> -> 
<careersogt2083@datadok.no>
Sep 29 15:58:50 skapet spamd[20795]: 210.111.190.216: 
disconnected after 15 seconds.
Sep 29 15:59:03 skapet spamd[20795]: 210.111.190.216: 
connected (5/3)
Sep 29 15:59:17 skapet spamd[20795]: (GREY) 210.111.190.216: 
<wotan@4vsi.com> -> <careersogt2083@datadok.no>
Sep 29 15:59:18 skapet spamd[20795]: 210.111.190.216: 
disconnected after 15 seconds.
Sep 29 15:59:30 skapet spamd[20795]: 210.111.190.216: 
connected (6/5), lists: spamd-greytrap
Sep 29 16:03:14 skapet spamd[20795]: (BLACK) 210.111.190.216: 
<sylviacastleman@alltypecalligraphy.com> -> 
<careersogt2083@datadok.no>
Sep 29 16:04:59 skapet spamd[20795]: 210.111.190.216: 
From: "Marguerite Casey" <sylviacastleman@alltypecalligraphy.com>
Sep 29 16:04:59 skapet spamd[20795]: 210.111.190.216: 
To: <careersogt2083@datadok.no>
Sep 29 16:04:59 skapet spamd[20795]: 210.111.190.216: 
Subject: 100mg x 60 pills US $ 129.95 buy now
Sep 29 16:06:04 skapet spamd[20795]: 210.111.190.216: 
disconnected after 394 seconds. lists: spamd-greytrap

I have no real opinion on the validity of the From: addresses, but the address they are trying their best to deliver spam to here never actually existed, of course. The first record of it at datadok.no was this bounce from a Russian site:

Jul 12 23:38:52 delilah spamd[29851]: (GREY) 81.177.34.190: 
<> -> <careersogt2083@datadok.no>

Dumping their trash back at them is good for a laugh, and I am quite amazed at how shortsighted the spambot operators appear to be. They get yelled at for spamming, so to avoid detection, they start using fake addresses. This in turn means they have no feedback whatsoever on the quality of their address lists, and with well pissers like me in action, they are getting less effective each day, reducing themselves to background noise in the network.

Now with this blog post done I will go back and finish the edits on the logs chapter. With the early parts of the book about to enter the layout phase while the last bits get written over the next few days, there is a chance that there will be physical copies of the book to pass around at OpenCON. Not quite there yet, but the full-time push is certainly helping. The preface with a list of thanks is part of what is entering layout; I think a few people who did not expect to be in there will soon have a pleasant surprise.

Also this week, on Thursday morning (September 27th), the PF tutorial saw its 19,000th unique visitor since EuroBSDCon 2006. We certainly hope at least some of them will come back for the book.

Friday, September 21, 2007

The Great SCO Swindle Winding Down, But Will They All Get Away With It?

Poor Dan Lyons. He thought like a bookmaker and wrote what he thought was right.
You see, a few years back, when Caldera was still Caldera, that company had sued a large corporation and won. Then Caldera changed its name to SCO and sued another huge corporation. Dan the bookie thought it was a sure bet, and started cheering them on. Four years on, the sure bet went south on a technicality: they did not actually own the code they had accused others of stealing. At least that's the way I read his Snowed by SCO article over at Forbes.

My take on this is: Dan, you only had to look at the facts. Knowing a bit of IT history is also a plus. When the SCOX matter came up, I, like most people, thought that you can never rule out the possibility that some code might have been copied. After all, Unix source code was never particularly hard to get your hands on and was widely used for classroom examples all over the world.

Then if that code was just identified, it would be ripped out and replaced. It's happened before. In the free software world, whole subsystems get replaced when there's a good reason to, and if the reason is copyright violation it gets somewhat urgent. The problem is, in the SCO matter, no code was ever identified.

Some journalists went through an elaborate procedure involving non-disclosure agreements and were, we are told, shown code from Linux and from somewhere else which bore a remarkable similarity. When Darl McBride used the SCOForum 2003 conference to show something he passed off as ripped-off code, it took only hours to identify the exact chunks through the obfuscation (yep, formatting comments in the Symbol font), and the code proved to be irrelevant.

None of these events helped convince techies of their claims, but for me the tipping point was when they claimed to have a reason to sue the BSDs as well. Anyone who had been paying any attention at all to Unix history knew that the AT&T vs BSD lawsuit was finally settled in 1994, with most of the terms sealed, but one of the few things made public was that the parties had forfeited any right to sue each other over the Unix code base. To me and quite a few others, this was proof positive that they were 'misguided or dishonest', as a commentator put it at the time.

One of my favorite summaries of the facts of the case was written by Greg Lehey (of The Complete FreeBSD fame), who looked at the various announcements from the technical side. He stopped maintaining it after a while, but it's still there at his website, with, as far as I tested, all links intact.

Most people seem to be relieved that the matter seems to be over. I beg to differ.

For one thing, the main characteristic of this matter has been the amazing ability of the SCO crowd to drag out the proceedings over irrelevant, mainly procedural matters. They will have more tricks up their sleeves, for certain.

The other thing is, with Dan's friends out on the technicality that they did in fact not have the legal standing to sue, we will never get that detailed walkthrough of the code where Darl and his covert experts are supposed to point out the infringing code. I, for one, would have looked forward to that. Then we would have had a chance of getting to know their real motivation too, and possibly some solid leads on the planning and funding. Now that will just not happen.

Then of course there are the stockholder lawsuits and possibly the FTC. If you were one of those chumps who bought SCOX stock at roughly twenty dollars a share based on Dan Lyons' recommendations, wouldn't you feel a little sore now that your investment is worth about a cent on the original dollar? That is, if you can unload it before SCOX is finally kicked out of NASDAQ for good?

So poor Dan Lyons for not seeing this coming. And damn the technicalities for cancelling the main event.

For those of you eager for news of the book, we're working hard to get it out there.

Update 2007-09-25: Another non-apology, this one from Rob Enderle.

According to linuxtoday, Rob Enderle claims he was tricked by (wait for it) both SCOX and those ever-bullying Linux people.

Actually, there's not much to see there. You can read it as just another non-apologizing apology, with some tall tales about death threats and DOS attacks thrown in (yes, really).

As I've said a few times earlier, enough facts were on the table right from the start of this timewasting story to show that more than likely the SCOX crowd did in fact not have a case.

Now I wonder what, if anything, we will be hearing from John Parkinson, who wrote in CIO:

"a lot of the intellectual property in Linux is actually owned by companies that never officially agreed to make it available under an open-source license."

Interestingly enough, that came without any qualification at all.

That irritated me enough at the time that I wrote to them (pasted into some inane feedback form):

Alleged intellectual property theft

In the article called "The End of Idealism" (http://www.cio.com/archive/070103/et_pundit.html), John Parkinson writes, "a lot of the intellectual property in Linux is actually owned by companies that never officially agreed to make it available under an open-source license."

Please take a moment to consider the seriousness of this allegation. What Parkinson actually says here is, "large parts of Linux consist of stolen property".

Reading such allegations in an article written by a senior executive of Cap Gemini Ernst & Young is quite shocking in itself.

It is only reasonable that Mr Parkinson or Cap Gemini Ernst & Young specify which parts of the Linux kernel they consider to consist of stolen property.

All versions of the Linux kernel, along with detailed change logs and archives of the developer mailing lists are available to the public. Using these resources, all parts of the code base can be traced to the individual who submitted them for inclusion.

In other words, it is quite easy to pinpoint who did what, and Mr Parkinson and Cap Gemini Ernst & Young would be doing the public a great disservice by refusing to help point out code which was illegally included in the open source operating system.

Quite a few articles, well informed and otherwise, have been written about the SCO vs IBM lawsuit and SCO's allegations. I suggest interested readers browse FreeBSD Core member Greg Lehey's overview at http://www.lemis.com/grog/SCO/index.html while we wait for more details from Mr Parkinson or Cap Gemini Ernst & Young.






Monday, September 17, 2007

EuroBSDCon was great, disks dying and some scary Windows stuff

This Monday finds me safely back from EuroBSDCon and trying to do useful things while the file server gets restored.

Of course it had to be that way. With me off to EuroBSDCon to do the tutorial and other refreshing geekiness, the first batch of mail I retrieved after arriving in Copenhagen included a log summary from the machine which holds pretty much everything Datadok is working on at any time, with these nuggets:

> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=410884031
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=410912703
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=410884575
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=410905887
> > ad6: FAILURE - READ_DMA status=51 error=40 LBA=410857151
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=446104667
> > ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=446104667
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=522840603

ouch.

This is what disks say when they've run out of space to map bad sectors into. The disk wasn't quite dead yet, but definitely time to plan a replacement. Not much to be done about that right away except alert the colleagues that there would be file server downtime on the Monday afternoon. Disks will die, and sysadmins end up with the task of replacing them.

My brief summary of EuroBSDCon is that it was an excellent conference: lots of good talks, interesting people to meet, and a good, clean location with network connectivity which worked, most of the time. Update: my EuroBSDCon pictures are finally on flickr.

For my own part the PF tutorial went reasonably well, with 24 people signed up and I think one or two sit-ins. People were paying attention, and a few good questions made the session more interesting, with a little more improv than the last few times I did this tutorial. Answers were found, though, and I believe the people who had signed up for the session had a good time and took away some useful info. Not too many hours after we were done, the number of unique visitors (that is, host names or addresses) to the tutorial tree since the last EuroBSDCon rolled past 18,000.

After lunch, Marco Zec's session about virtualizing the FreeBSD network stack was really interesting. Unfortunately none of the Thinkpads present were able to boot from the FreeBSD-current image Marco had prepared and supplied on USB thumb drives; they all produced pretty much the same crash (illustrated here). But it was a very interesting topic and session, and I'm glad I stuck around for it.

On the Wednesday I had the choice of sightseeing, sitting in on Kirk's session, or holing up in the hostel basement's hacker room to get some writing done, and I ended up going for the last option, getting significant parts of the logging chapter done. There is of course a limit to how long you can avoid interruption in a semi-public area, but that session was certainly useful.

The EuroBSDCon hacker area with both wired and wireless networks was available to conference attendees all conference and tutorial days. Naturally it took on a social function in addition to being a convenient way to surf and fetch your email.

For the conference itself, it was sometimes hard to choose which talks to go to. I still think Ike's jails talk (pix here, here, here) was my favorite (similar but not identical to the one he gave at AsiaBSDCon in Tokyo), but there were a lot of good ones. I ended up managing to miss Pierre-Yves Ritschard's Load Balancing talk since they'd switched the schedule around. I hope there's a chance to pick up the essentials at some later date.

Fortunately Wim and Machtelt turned up to organize the OpenBSD booth (convenient for restocking your clothes cupboard), and there was some news about OpenCON: there will be an OpenCon 2007, but there's still some organizing to do. I hope to be seeing you there, in Venice, November 30th through December 2nd.

From the Windows Is Scary department, one episode from a few weeks back which I suddenly remembered when I realized the guy quietly hacking to the left of me was FreeBSD USB guru Hans Petter Selasky:

When I saw 4GB USB thumb drives priced at just under NOK 300 (USD 55), I decided I needed one. The drive mounted with no trouble at all in OpenBSD (mount /dev/sd1i at the location of your choice), and I thought good, I'll just delete those .exe files to make room. A few days later I needed to retrieve some files which, it turned out, were most easily accessible from my Windows machine at work. So I plugged in the new 4GB thumb drive.

Windows machines always do strange things and take a while to recognize new hardware, but this time the machine claimed to have found a new CD drive. A few confusing minutes later, with various message boxes flashing across the screen, the machine begged for a reboot. I let it have that, slightly puzzled but not entirely surprised that Windows wanted the user to jump through a few extra hoops to make something work.

I was able to retrieve the files eventually, while trying to avoid yet another quirky Windows application which wanted to handle my files. As it turns out, the device actually emulates a CD drive as well as USB mass storage. Here's what it looks like in /var/log/messages on my OpenBSD laptop:

Sep 17 22:23:23 thingy /bsd: umass0: SanDisk Corporation U3
Cruzer Micro, rev 2.00/0.10, addr 2
Sep 17 22:23:23 thingy /bsd: umass0: using SCSI over Bulk-Only
Sep 17 22:23:23 thingy /bsd: scsibus2 at umass0: 2 targets
Sep 17 22:23:23 thingy /bsd: sd1 at scsibus2 targ 1 lun 0:
SCSI2 0/direct removable
Sep 17 22:23:23 thingy /bsd: sd1: 3913MB, 498 cyl, 255 head,
63 sec, 512 bytes/sec, 8015502 sec total
Sep 17 22:23:23 thingy /bsd: cd1 at scsibus2 targ 1 lun 1:
SCSI2 5/cdrom removable

The reason all the strange and scary things happened with the Windows machine is that the emulated CD contains Windows Autorun files, and Autorun, it seems, is enabled by default in that operating system with no easy way to turn it off. What I find slightly disturbing is that, as Hans Petter explained, this behavior is part of the device's firmware, and you can't get rid of the five or six megabytes of useless software in these devices. The best you can do is use a system which ignores such silliness.

Returning to the file server, the box is a few years old and has by now probably had most of the original components replaced. The last time we replaced the motherboard, we were still thinking that SCSI was the only way to go for storage, disks and tape both. Not too long after that, we decided that SATA was actually OK for that little office of ours, but when the time came to replace that disk, I discovered that the motherboard had only two SATA ports on it, one for the system disk and one for the dying data disk. So copying across from one SATA disk to another had to be done via Ethernet instead. Fortunately installing a useful operating system takes only about twenty minutes, and the few tens of gigabytes transferred while I was writing this article. Still far faster than restoring the same data via rsync from our offline backup, though.

Among the things announced in Copenhagen were that there will be an AsiaBSDCon in March 2008, NYCBSDCon will maybe be next year in the fall, and the next EuroBSDCon will be in Strasbourg. I hope to be at several of those, time and money allowing. But now on to finish that book.

Saturday, September 8, 2007

Wanna help science? Study your greylists' innards!

If somebody, say five years ago, had told me that I would be spending a little time, every day, studying data about what invalid addresses some unknown miscreants are making up in my domains, I would have thought them slightly off their rockers.

Yet here I am, actually maintaining a publicly available list of addresses which do not stand a chance of becoming valid, ever. It all started with a log data anomaly - I noticed an increase in the number of failed delivery messages to non-existent addresses in our domains. I had expected that the bounces to invalid addresses would appear for a short period only, but for one reason or the other it looks like it's here to stay, with some dips and peaks like the ehtrib.org flood.

The list is apparently working as intended too. These addresses are on my local greytrap list, and I have started seeing addresses I put in there as all uppercase turn up in my logs in all lowercase variants. Fun to watch, sort of.

Anyway, the supply of new bogus addresses proved to be larger than I had expected. So to get a handle on just what is happening I ended up doing periodic dumps of the live greylist data. This is really easy to do if you're using spamd as your greylister; your basic command is

$ sudo spamdb | grep GREY

and you redirect to a file, pipe to mail, or whatever you like.

Now if you're a bit like me, looking for patterns in the noise like this makes you feel a little weirder than usual, possibly leads you to think of a Clive Barker novel (specifically the bits about the dead letter file in The Great and Secret Show), and makes you wonder why this is worth doing at all. After all, there is precious little spam that actually reaches my users, so like I said earlier, for us spamd users it really looks like spam is a solved problem. I guess I'm just a bit fascinated by the pure irrationality of the spammers' behavior.

Still, there may be useful information lurking somewhere in the data I collect here in my tiny corner of the world and browse when time allows.

Typical entries show things like the host 202.152.33.43 trying to send with a From: address of jcejft@charter.com to dkqvujfn@datadok.no and sdenuuu@datadok.no. Using a few common networking commands we see that there is no reason why charter.com email should come from the IP range belonging to idola.net.id, and as the admin of datadok.no I know these two addresses have never been deliverable. Most likely the admin at charter.com can tell you if that From: address is deliverable, but I keep wondering how much of the spam out there is stuffed into the pipe with bogus From: and To: addresses both. Or in other words, purely useless noise, never to be delivered anywhere.
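
The "common networking commands" need not be anything more exotic than these; their output (not reproduced here) is what tells you the sending range belongs to idola.net.id and has nothing to do with charter.com's mail infrastructure:

# who owns the sending address range?
$ whois 202.152.33.43
# which hosts are charter.com's designated mail exchangers?
$ dig charter.com mx +short
# does charter.com publish SPF data naming its valid senders?
$ host -ttxt charter.com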

On a side note, with one or more of the spammer operations trying to sneak through using sender and recipient addresses in the target domain, I assume it is just a matter of time before I see a tuple with both sender and recipient addresses already in my spamtraps list. When that happens, I think I will feel inclined to let my friends have a round of refreshments on my tab.

It's obvious that there are a handful of spammer operations that have decided to use datadok.no (and to a lesser extent, dataped.no and ehtrib.org) From: addresses on the spam they send, apparently in an attempt to cover their tracks. I will probably never know why they decided to do that, but I wonder why they keep it up and for that matter how many other domains are seeing this, with bounces from strange places, directed at non-existent, fairly obviously generated bogus addresses.

So if you are seeing similar stupidity in your logs and if you are running a sensible greylister such as spamd, I would be interested in hearing from you so we can compare notes.

Out there in meatspace, EuroBSDCon 2007 is coming up. I'll be there with the PF tutorial on Wednesday. This Friday's deadline for an updated manuscript had totally slipped my mind (I blame the book and a few other, less rational, factors), but hopefully the 24 who signed up for the session will find it useful anyhow - there will be new bits and as much interesting stuff as I can manage. I'll be around for the rest of the conference too, but unfortunately I'll have to give the Legoland trip a miss.

Be seeing you in Copenhagen! The book is getting closer to finished, I promise!

Sunday, August 19, 2007

A Lady in Distress; or Then Again, Maybe Not

A two user domain gets bounces for seven hundred, grep and sed to the rescue, spamd saves the day

The past week moved along with only minor disturbances on the keep-systems-running front. The time-consuming frustrations were generated elsewhere, and (un?)fortunately I am not at liberty to discuss the details. Incompetence was involved; next week it's somebody else's problem.

All the while, the spammer trapping experiment has been moving along at a leisurely pace.

Generally keeping the lists (both the web version and the live one) updated would cost me a few minutes' browsing of greylist dumps two or three times a day or whenever I felt like it, with a typical catch of maybe fifteen new bogus addresses to feed to the trap list each day.

For the last three or four days the haul has been smaller, with essentially no new captures yesterday, for example. Now I've found out why. They have moved on, alphabetically.

Done with bsdly.net, the dominant group of spammers moved on to generating addresses in the D domains including datadok.no and dataped.no. I'm bound to have missed a few, since the grand total by this morning had yet to reach a full thousand. By now, they seem to have reached the Es. This morning I noticed the overnight greylist dumps were bigger than usual.

The reason: ehtrib.org, the domain we set up mainly for my wife's use (read: her email), appears to be the current home of made up From: addresses, with roughly seven hundred accumulated by the time I was done with morning routines of breakfast with coffee and browsing the overnight incoming mail.

That is by far the largest addition to the flypaper list ever.

Fortunately, with only two active addresses in the domain (I'm not telling what either of them is) it's fairly trivial to extract the bogus ones.

Up to now I've been integrating the noise into the traplist page manually; for now I've put this batch up at http://www.bsdly.net/~peter/ehtrib-1stbatch.

They're all in the active traplist at the gateways, of course. It's the editing of them into the page the spammers will slurp via unattended robots that I'm putting off for a little while longer while I do some other writing. [Not any more; it's all there now, but the original list is preserved too.]

Just why we are seeing this number of addresses over a short period this time, rather than a handful each day over several months, is an open question. One likely explanation is that one of the chickenboners fell asleep at the wheel and let the junk generator run longer than they actually intended. Time will show if this means they move on more quickly.

When I have more time, I will probably analyse the data I am accumulating at the moment and tell the tales of the silly lamer tricks the spammers try to pull.

In the meantime, following up on earlier posts, there are still a few people who Just Don't Get It:
Aug 19 13:28:03 delilah spamd[3712]: 217.159.231.230: connected 
(9/9), lists: spamd-greytrap
Aug 19 13:31:49 delilah spamd[3712]: (BLACK) 217.159.231.230:
<> -> <armrest10@datadok.no>
Aug 19 13:33:32 delilah spamd[3712]: 217.159.231.230: Subject:
Considered UNSOLICITED BULK EMAIL, apparently from you
Aug 19 13:33:32 delilah spamd[3712]: 217.159.231.230: From:
"Content-filter at linux.byroomaailm.ee" 
<postmaster@linux.byroomaailm.ee>
Aug 19 13:33:32 delilah spamd[3712]: 217.159.231.230: To: 
<armrest10@datadok.no>
Aug 19 13:34:38 delilah spamd[3712]: 217.159.231.230: disconnected 
after 395 seconds. lists: spamd-greytrap

And it looks like the published list is having the effect I was hoping for. I keep seeing quite a few of the addresses in ALLCAPS (with numbers tacked on) I put on the web page a few weeks back beginning to appear in lowercase but otherwise identical in my greylist dumps.
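
Spotting those reappearances takes very little effort; a sketch, assuming the published bait addresses live in a file of their own (the file name here is made up):

$ tr '[:upper:]' '[:lower:]' < baitlist.txt > /tmp/baitlist.lc
$ sudo spamdb | grep GREY | grep -F -f /tmp/baitlist.lc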

In other news, the PF tutorial session at EuroBSDCon is now a definite.

See you in Copenhagen, if not before!

Now for that other bit of writing. The Book of PF page now refers to the tutorial page at bsdly.net. Now let's get that baby done.

The lady is, in fact, not too distressed.

And in case you were wondering - Yes, you can use my auto-generated list of trapped hosts for your own blacklisting purposes if you like. Here it's just a supplement to Bob Beck's traplist, and most likely you're better off using the Beck/UofA list along with your own greytrapping, but if you really want to use mine, be my guest. It gets updated ten past every hour.

Friday, August 10, 2007

BSD is dying, security shot to hell, clamav wins and other tales of depravity and greed

It's been an interesting week, in several ways.

Yesterday's big item was the slashdotted report that BSD is dying, or rather, that some important security-related software in, among others, OpenBSD may, according to a paper by University of Cambridge researcher (and FreeBSD core member) Robert Watson, be vulnerable to a previously unresearched class of vulnerabilities. This time we're talking about a really hard problem which I think hits a lot more systems than the ones they picked for the tests. It's local privilege escalation only, so not the third remote hole in OpenBSD after all. The paper is well worth reading, and if you're a little short of time, the slides will give you the general drift and then some. The sky didn't fall this time either.

Actually totally unrelated, Jason Dixon's BSD is dying talk (see above) is worth a few chuckles. He gave it at BSDCan 2007 too.

Meanwhile, reports say that over at LinuxWorld in San Francisco, they put ten popular antivirus packages through their paces, and according to this story the free (as in GPL) ClamAV came out on top. Nice to see that the free stuff (which we've been using for years here) is found by independent testing to be as good as we thought it was.

Continuing the "the free stuff is quite good" thread, when I found that I actually needed a Windows machine to do some work from home, I tried getting that Windows laptop to talk to my wireless network at home. Windows didn't recognize the integrated 11b wireless adapter at all, so I dug out the Atheros based DWL-AG650 I'd used with the machine and various BSDs.

No go. Windows did register that a new PCCARD had been inserted, but did not have a usable driver available. The Control Panel showed a generous helping of question marks, with two 'Ethernet Class' devices among them, so it's quite possible that the integrated 11b unit was the other one.

I'm not one who gives up easily, so I went to the D-Link web site for a driver. They did not actually have one on tap (or at least not easily available), since the card is no longer in production, so via the well-known search engine starting with G I found something that claimed to be the correct Windows driver. It installed, but even after a reboot the card management software (why oh why a separate management app for each bit of hardware in your system?) still claimed that no compatible card was present.

A short string of unprintables and 22 minutes later, I had the machine running the-thing-that-needed-Windows via Rdesktop on Ubuntu, remote-controlling a machine at the office. The moral of the story: if you need Windows, you're better off with Linux and Rdesktop.

Certainly worth a read is the short paper by Sun's Jon Bosak on why Sun voted no to OOXML, at http://www.streamingweb.no/v1-ooxml.pdf. It is well researched and well written, and contains such nuggets as
"On the face of it, this astonishing provision would appear to indicate that the authors of the DIS did not understand the purpose of XML,"

and
"In practice, the effect of radical underspecification is to allow behavioral details to be determined on an ad hoc basis by the dominant software."

This somehow fails to surprise me; it's the story of RTF all over again. I've been meaning to write about Microsoft vs standards, but in the meantime Jon's paper is well worth reading.

Back to the inevitable spam update (yes, elzapp, I do sometimes blog about something besides spam), the local traplist keeps growing. I sometimes wonder if they've actually looked at what we do here - this morning's batch of fake From: addresses had proofreads49@datadok.no among them.

And underscoring one of the points I made in the malware paper, that we are making the spammers work ever harder only to generally fail to deliver their crap, Bob Beck's traplist keeps growing and has now hit a new all-time high of 125,808 entries.

That number could grow a bit more before they're all done. I do pity those who get billed by the unit of data transferred and still don't have a sensible setup in place.

And yes, the book is progressing.

UPDATE 2007-08-14: After a relatively quiet weekend spamwise (Bob Beck's list in the 65,000 to 85,000 range), activity seems to have reached another peak with a total of 141,892 entries trapped at 08:00 CEST this morning. I would have expected to see a corresponding surge in the number of new bogus addresses seen in our greylist, but they did not turn up. We can always hope that this is due to saner spam handling at sites which used to bounce spam back to the From: address.

Saturday, August 4, 2007

We see your every move, spammer

My logs tell me that the spamtrap topic is a favorite, and more likely than not somebody who read the announcement will also take a peek at the traplist itself. So while I'm slowly preparing a post about something else entirely (which I feel is actually a lot more interesting), it can't hurt to fill you in on what I've been doing to keep track of spammer behavior.

It's a quiet life, at least by surface appearances. In between the steady stream of mainly confidential tasks handled at Datadok and the odd request to bsdly.net for services of one kind or the other, I focus on getting the book done, chapter by chapter.

The traplist is slowly expanding. The collection process itself is automated for all the tedious tasks. The "Unknown user" entries from my mail server logs, my source of traplist material so far, almost dried up, so I started looking at the greylists directly.

After sampling my greylists at random intervals for a while, I set up a short shell script which now dumps the data to somewhere safe ten past every full hour, notes the number of grey entries and TRAPPED entries, and dumps the TRAPPED IP addresses to a file which is available to the world from the traplist page. The list is comfortably short at most times. I imagine somebody with beefier bandwidth or a more widely known domain would have more hosts trapped at any time.
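
For the curious, the whole arrangement needs nothing more than a few lines of shell plus a crontab entry; a sketch, with paths and file names made up for illustration:

#!/bin/sh
# hourly greylist dump: keep the raw data, note the entry counts, and
# extract the currently trapped IP addresses for publishing
dumpdir=/var/db/greydumps
stamp=$(date +%Y%m%d%H%M)
/usr/sbin/spamdb > "$dumpdir/spamdb.$stamp"
grep -c '^GREY' "$dumpdir/spamdb.$stamp" > "$dumpdir/counts.$stamp"
grep -c '^TRAPPED' "$dumpdir/spamdb.$stamp" >> "$dumpdir/counts.$stamp"
grep '^TRAPPED' "$dumpdir/spamdb.$stamp" | cut -d'|' -f2 > "$dumpdir/trapped-now.txt"

# and in root's crontab, ten past every hour:
# 10 * * * * /usr/local/sbin/greydump.sh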

The file with currently trapped hosts gets overwritten each time the script runs. There is an outside chance that the other generated data might be useful in future research, and storage is cheap these days, so I keep the data around.

Observing the greylists reveals some odd things, like a certain Taiwanese host which tried, on August 1st, 2007, to send roughly a thousand messages to one address in a domain elsewhere, using generated From: addresses at every host name and IP address in our local network. They probably thought they'd found an open relay. Spamd's "250 This is hurting you more than it is hurting me." probably did not register with them as an outright rejection, much like it fools a number of web-available open relay detectors.

The conclusions still stand, though. They echo the conclusions from the malware paper (*): the spammers are working harder at sending their trash, mainly because we are as close as makes no difference to always correctly detecting and dealing with their junk traffic.

I keep wondering if even the few minutes' worth of work a day updating the traplist is worth it, since we are catching essentially all spam anyway. Then at intervals, one or more of the generated, made up addresses from the list actually turns up in my greylist dumps.

(*) Whenever the "The silent network" paper comes up in discussions, it looks like, depending on who you are, it's either way too long or too short. At twenty-odd pages it's too long for the attention span of the loudmouth self-appointed SMTP experts you may encounter on web forums and mailing lists, and too short (read: not a book) to carry much weight with a decision maker who will not read much more than the executive summary anyway. Making that article morph into a book is on my list of Things To Look Into Later If Time Allows And It Still Makes Sense Then.

If you're still there after reading all this: Click the ads already. Make somebody else pay for your entertainment.

Wednesday, August 1, 2007

On the business end of a blacklist. Oh the hilarity.

I had planned to write about something else for my next blog entry, but life came back and bit me with another spam related episode. Next time, I promise, I'll do something interesting.

In the meantime, I've discovered that a) very few people actually use SPF to reject mail, b) the SPF syntax looks simple but is hard to get right, and c) there are still blacklists which routinely block whole /24 nets.

This morning I got a message from somebody I met at BSDCan in May, asking me to do something LinkedIn-related. Naturally, since I felt I needed some more details to do what this person wanted, I sent a short email message. That message got rejected,

SMTP error from remote mail server after MAIL FROM: SIZE=2240:
host mailstore1.secureserver.net [64.202.166.11]:
554 refused mailfrom because of SPF policy

which means that the SPF record

datadok.no. IN TXT "v=spf1 ip4:194.54.103.54/26
ip4:194.54.107.16/29 -all"

does not do what you think it does. Mail sent from 194.54.103.66 was not let through.

OK, the checking tool at the OpenSPF site seems to agree with secureserver.net, and I seriously cannot blame them for the choice to trust SPF absolutely.

At the moment it seems my listing each host name is what does the trick. Weird. Anyway, next up in my attempt to communicate with my overseas friend, I tried sending a message from bsdly.net instead. That bounced too, but for a slightly different reason:

SMTP error from remote mail server after RCPT TO::
host smtp.where.secureserver.net [64.202.166.12]:
553 Dynamic pool 194.54.107.142.

If you look up the data for bsdly.net, you will find that valid mail from there gets sent mainly from 194.54.107.19, which is in the tiny /29 our ISP set aside for my home net when I told them I wanted a fixed IP address.

I'm not sure if the rest of the "ip=194.54.107.*" network is actually a pool of dynamically allocated addresses these days, but what I do know is that 194.54.107.16/29 has not been dynamically allocated for quite a number of years.

Going to the URL gave me this picture:



This really gives me no useful information at all. Except, of course, that at secureserver.net they think that putting entire /24 nets on their blacklist is useful. Some of us tend to disagree with that notion.

Anyway, I filled in the form with a terse but hopefully polite message, and clicked Submit.

I was rewarded with this message:



If I read this correctly, they think mail from 194.54.107.19 is spam because BGNett or MTULink have not set up reverse lookup for 194.54.107.142. OR because they think the entire /24 is dynamically allocated. OR somebody in that subnet may have sent spam at one time. I can only guess at the real reason, and repeat over and over that blocking entire subnets will give you a generous helping of false positives.

Never mind that; the SPF record which made my mail from datadok.no go through to my overseas friend included a:hostname.domain.tld entries for all allowed senders.

And in other news, the PF tutorial saw its 15,000th visitor since EuroBSDCon 2006 on Saturday; the last count is 15,220.

Wednesday, July 25, 2007

Harvesting the noise while it's still fresh; SPF found potentially useful

In previous installments of my greylisting and greytrapping posts, I've described how, after publishing my traplist, I maintained the list mainly by searching the mail server logs for "Unknown user" messages and by running a tail -f on my spamd log in a terminal window.

It dawned on me a couple of days back that finding the "Unknown user" entries in the mail server logs means I find only the backscatter bounces that have managed to clear greylisting, sent by real mail servers which are misconfigured to deliver spam to their users. Clearing greylisting may take a while, but once the IP address enters the whitelist and the machine does not try to send again to any address which is already in the traplist, it will be able to deliver its spam or backscatter.

Harvest the noise while it's fresh

Fortunately it's very easy to harvest the noise data while it's fresh. You search the greylist instead. A simple

$ sudo spamdb | grep GREY

gives you a list of all currently greylisted entries at that spamd instance, in a format which is well documented in the spamdb man page:

GREY|200.170.143.41|smtp6.netsite.com.br|<tbento@acipatos.org.br>|<peter@bsdly.net>|1185386752|1185401152|1185401152|1|0
GREY|217.19.208.25|idknet.com|<>|<credulity093@datadok.no>|1185386865|1185401265|1185401265|1|0
GREY|85.249.128.205|neptune.usedns.com|<>|<credulity093@datadok.no>|1185387329|1185401729|1185401729|1|0
GREY|194.183.162.193|scelto.relc.com|<>|<bequeathpi@datadok.no>|1185387398|1185401798|1185401798|1|0

There will more likely than not be more than one, and in this format it's fairly easy to see at least two traplist candidates, credulity093@datadok.no and bequeathpi@datadok.no. I have no idea if 217.19.208.25, 85.249.128.205 or 194.183.162.193 would ever have cleared greylisting, but now that credulity093@datadok.no and bequeathpi@datadok.no are in my traplist (and yes, at least part of the process should be very easy to automate), they'll be stuttered at, starting with the next time they try to connect and most likely until they give up.
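
Adding a candidate to the local trap list once you have decided it really is bogus is itself a one-liner, and that is the part that is trivial to script:

$ sudo spamdb -T -a "credulity093@datadok.no"
$ sudo spamdb -T -a "bequeathpi@datadok.no"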

Now it's probably still useful to tail -f your spamd log anyway, but you can leave the harvesting off until you see a marked increase in simultaneous connections to spamd, as in when the first number in parentheses starts rising sharply. Here the number is low (the second number is the number of currently blacklisted hosts):

Jul 25 22:17:16 delilah spamd[11839]: 217.146.97.10: connected (12/12), lists: spamd-greytrap
Jul 25 22:17:35 delilah spamd[11839]: 213.177.120.98: connected (13/13), lists: spamd-greytrap
Jul 25 22:17:36 delilah spamd[11839]: 87.103.238.226: connected (14/14), lists: spamd-greytrap

When the first number rises sharply -- that's when the first wave of spam or backscatter hits, and you can harvest the noise while it's still fresh.

A good harvest means less work for your mail server.

SPF found potentially useful

One recurring theme in greylisting discussions is how to deal with sites which do not play nicely with greylisting, specifically sites with many outgoing SMTP servers and no guarantee that the retries will come from the same IP address (you can find a rather informal discussion in the PF tutorial, for example). If you can't get those sites to do the retry magic, you probably need to whitelist them, but in the case of large sites like Google, how do you find out just which machines to whitelist?

For well-run sites the answer is simple: if they publish SPF data, you use that. After all, that data is their own list of valid outgoing SMTP senders. The solution presented itself in a recent openbsd-misc post by Darrin Chandler. If you need to whitelist a site with many potential outgoing SMTP servers, the command is

$ host -ttxt example.com

That is, look up the text data in the domain's DNS data, which is where SPF data lives. The answer would typically be something like

example.com descriptive text "v=spf1 mx -all"

which essentially means, "for the example.com domain, only the mail exchangers are valid SMTP senders". The next step is easy: if the answer contains IP addresses or address ranges, you put those straight into your whitelist. In this case it only names the mail exchangers, so a

$ dig example.com mx

would get you the data you need (possibly after a few more host commands).
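If you want to script that step, something along these lines could serve as a starting point. It only handles the ip4: and mx mechanisms (nothing fancier like include:), and /etc/mail/whitelist.txt is just an example destination -- point it at whatever file or table your spamd whitelist actually reads:

#!/bin/sh
# rough sketch: collect a domain's SPF-sanctioned senders into a whitelist file
domain=${1:-example.com}
out=/etc/mail/whitelist.txt

# any ip4: ranges go in directly
host -t txt "$domain" | tr ' "' '\n\n' | grep '^ip4:' | sed 's/^ip4://' >> "$out"

# if the record says mx, add the mail exchangers' addresses too
if host -t txt "$domain" | grep -qw mx; then
    host -t mx "$domain" | awk '{print $NF}' | while read mx; do
        host "$mx" | awk '/has address/ {print $NF}' >> "$out"
    done
fi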

Frankly it would be a lot better if those sites learned to play well with greylisting, but if you choose to whitelist them anyway, at least this way you take their word for what their valid senders are.

Sunday, July 22, 2007

The noise, we ignore it

If anybody expected that my earlier posts about harvesting addresses to a traplist, publishing it and observing the results would lead to earth-shattering discoveries or headlines in major publications worldwide, I can tell you now: they did not.

This could mean that spam is now boring and ignorable, and in fact the data I've accumulated indicate that when it comes to spam and spammers, it all falls into a category of noise we are more than happy to just ignore.

In the semi-random samplings from the noise the spammers generate, there are some interesting observations. For one thing, one or more spammer operations have picked my handful of domains as the purported return addresses for their messages, and they have been doing this at least since some time in June, possibly longer. Judging from the addresses in the backscatter, there are likely two or three groups actively generating or making up addresses, each with a distinct method.

One is to pick a domain and a word and feed them to a program which then generates a pair of addresses: the spammer picks the keyword flaunting and the robot spits out flauntingn6@datadok.no and 6NGNITNUALF0@datadok.no.

Another method is to just pick a word and stick the at sign and the domain on afterwards, such as between@datadok.no.

The third one, which appears in several variations, is to generate or make up what could at a stretch look like aliases based on people's first and last names, such as DrueNikonov@bsdly.net and lupu.kovjd@amidala.datadok.no (never mind that amidala, the now-retired laptop, never ran its own mail service), or just a first name, such as Runar623@mail.datadok.no.

Another variation which had me a bit puzzled is probably designed to look like our domain is testing for mail deliverability, such as mail.matrix.farlep.net-1184227303-testing@datadok.no.

And finally, of course, there are the bottom feeders who try to use message-IDs and other junk extracted from news spools or the local Microsoft Outlook user's mailbox. Any of the addresses with fsf@ in them and a few others clearly fall into that category, and y7jvlozt.fsf@thingy.datadok.no is a likely indicator that somebody, somewhere has news or mailing list mail which originated at my current laptop stored where spamware can find it.

Spammers have been working very hard lately. On July 17th, Bob Beck's traplist (which is generated by greytrapping at the University of Alberta and which Bob makes available to anyone who wants it -- see the PF tutorial for details) reached what I believe is the all-time high, just a few short of a hundred thousand addresses. The actual number was 99941 at 20:00 CEST (that's 8 PM in Imperial measure); it has been dropping off since then.
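For reference, subscribing to Bob's list boils down to a spamd.conf entry roughly like the following; check spamd.conf(5) and the tutorial for the authoritative version, this is from memory:

all:\
	:uatraps:
uatraps:\
	:black:\
	:method=http:\
	:file=www.openbsd.org/spamd/traplist.gz: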

My more or less purely backscatter-based list grew during the same period to roughly 500 addresses in the local traplist, and the number of hosts actually trapped here, as far as I can tell, never went much over 400 at any time.

One interesting factlet that came up during the week is that Google, by the looks of it, is using SPF correctly. One BLUG member reported that one of my messages about the traplist to the BLUG mailing list had been tagged by Google Mail as possible spam. Exactly what triggered the classification was never revealed to me (he had already deleted the message when I asked him to take a look). But it made me go back and check the SPF records for our domains, and quite right, they were overly permissive. Editing them took a few moments, and the test messages sent only a couple of minutes later went through with headers indicating that they do check for SPF. Nice, GoogleMail! Next, might we have a chat about playing nice with greylisting?

The conclusions come rather naturally: Spam in general gets ignored or filtered correctly by the vast majority of Internet domains, the free tools are extremely effective, and even when the spammers go all-out in their efforts and possibly are even trying to actively inconvenience somebody, they're not getting much traction at all.

The main technical problem remaining is the vast number of unmaintained machines out there which bend over obediently to let the spamware install itself and be run by remote control. On the server side there are sites which still do not play nice with greylisting, but they will see sense eventually, I hope.

We just ignore the noise, except for a few of us who see patterns in it and a few hopeless cases which, it seems, will do their traplist time indefinitely.

That chapter is a bitch to write, but getting there.

Thursday, July 19, 2007

Linux is easier than Windows, hands down

It all started a few weeks back when my wife and I decided to get her a new laptop. It was also conveniently close to her July 14th birthday, and Dell had a huge campaign going in the newspapers. Dell had just started selling laptops with various Linuxes preinstalled, with Ubuntu as the environment of choice, so I felt reasonably sure that the machine would be usable even though the Linux preinstalled option is not available in Norway yet. After all, our daughter had essentially no trouble at all getting her laptop (a sleek little LG number) going with Ubuntu earlier this year.

So I ended up clicking my way on to the Dell web site and ordering an Inspiron 1520 with 1440x900 resolution and the Ruby Red option (actually it's just the lid, but it does look distinctive).

As luck and Dell's suppliers would have it, the color added some time to the delivery. The machine apparently shipped earlier than expected, and arrived here last Monday afternoon, most likely very close to when the plane with my wife and daughter touched down at Madeira.

I had already burned a 7.04 desktop iso to CD and booted from that. As with anything that comes with Microsoft preinstalled, you need to be really careful to hit the right key at the right moment unless you want to do battle with the Microsoft installer.

Eventually I succeeded, though: the thing booted, went into the traditional graphical Ubuntu startup and then dropped me to a shell. Odd. Fortunately, from my OpenBSD laptop I was able to find this ubuntuforums.org post which explains in some detail what needs to be done. In particular, the piece about how to fix the graphics driver is quite instructive.

Getting things set up right is a little puzzling when the graphics card and the wireless network card both need proprietary drivers loaded, the system is supposed to be all graphical and your default network is wireless, but by following the instructions from the forum I made it eventually.

The short version is, in addition to the oddities involved in getting proprietary drivers the machine requires, the Feisty installer kernel for some reason does not have the proper support for the SATA controller in this Dell model.

Going back to the earlier version lets you do a clean upgrade, and to get all the bits working you need to have both the universe and the multiverse repositories in your package manager's configuration. Not hard for somebody who has not yet recycled all the grey cells with Debian etched into them, just a bit tedious, and even though every step of the process was accompanied by sensible messages from the system, it did not help that our only live Ethernet is in the attic where the servers live.
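From memory, enabling the extra repositories came down to something like this; the mirror URL here is just the stock one and the release name is feisty, so adjust to taste:

$ sudo sh -c 'echo "deb http://archive.ubuntu.com/ubuntu/ feisty universe multiverse" >> /etc/apt/sources.list'
$ sudo apt-get update
$ sudo apt-get dist-upgrade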

Anyway, by elevenish in the evening I had the system all installed with native resolution and 32-bit color depth, on the wireless network and Just Working. The process took significantly less time than putting Windows from restore CDs back on the system it came with.

Yet another evening spent on other things than I had intended (such as writing those crucial bits of the book), but seriously folks, you must never miss an opportunity to make your wife happy.

With that out of the way, I can state with even more confidence:

Linux is easier and more user friendly than Windows.

By now I consider that to be documented beyond question.

But for the things I care about, I really prefer OpenBSD or FreeBSD.

Updates: The spamtrap is growing by a few new addresses a day. I sometimes spot them flying by in the xterm where my spamd log is tail -f'd, sometimes I grep them out of the mail server's logs. Some patterns are emerging, but more later when I have more data.

Friday, July 13, 2007

Spam is a solved problem

Executive summary: Spam is a solved problem, email works again. There are a few knuckle draggers out there who haven't noticed yet, but we'll get around to dealing with those shortly.

I've been looking over my log summaries again. My regular logs get rotated out of existence after seven days, but from the summaries I do keep around, it looks like various made-up @datadok.no addresses have been used as spammers' fake From: addresses for about a month. I was too busy with other Very! Urgent! Things! to notice at first, but it finally dawned on me when I searched my mail server logs for "Unknown" as in "Unknown user" and saw from the results that somebody, somewhere, was using that domain for generating sender addresses.
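The exact incantation depends entirely on your MTA and syslog setup, so treat this as illustration only, but the rough shape of that search was something like the following, counting which made-up local parts turn up most often:

$ grep -i 'unknown user' /var/log/maillog | tr ' ' '\n' | \
    grep '@datadok\.no' | sort | uniq -c | sort -rn | head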

After about two weeks of observation and collecting made-up or generated addresses for my traplist, my conclusion is what the title of this post says. Spam is not a problem anymore. I know, of course, that "how to cope with email and spam" self-help guides are best sellers, and a recent Salon.com piece even went so far as to say about email,
Now, it seems, we're drowning. There's simply too much e-mail. The tide of spam buries valuable messages.

That kind of surprises me, because it's not what we're seeing here at all. Of course we know that there's a lot of junk being sent, but ask any of the people on the sites I run on any given day how much spam they've received recently and they have to look up the date of the last one in their "Junk mail" folders.

I do see some spam myself, mainly because I still fetch and read mail I receive at an ISP address I've used a lot for USENET and mailing lists over the years. And since unfortunately no method ever has a zero error rate, occasionally a spam message or two trickles through that shouldn't have on the systems I run myself. But if the tide of spam buries valuable messages, you haven't kept up with the technology, plain and simple.

By and large, from the perspective of somebody who has been the purported sender of an unknown portion of the tide which drowns out the Salon.com writer's messages, it looks like spam is treated correctly or at least in ways that do not annoy others unnecessarily at most sites. (In all fairness that piece is more about email versus other types of writing than technical matters, and certainly worth reading for that reason. The same writer has written a number of other articles which are worth your time too.)

At the last count, our main spamd running gateway had all of 316 addresses in the local spamd-greytrap table, meaning that only that many hosts have actually tried to send mail to one or more of the addresses listed at our spamtrap page during the last 24 hours. Some of the trapped machines would have been active spam senders, and most of the rest seem to have been sites which were configured to receive spam and bounce back to the From: address when the spam was not deliverable.
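If you want to check the corresponding number on your own gateway, the table name is whatever you called it in pf.conf; here that means:

$ sudo pfctl -t spamd-greytrap -T show | wc -l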

That is an important point to note. If your system sends a 'message undeliverable' bounce message for spam sent to a non-existent user, it is configured to deliver spam to the users you do have, and there are certainly ways to avoid that. I've decided not to plug any of my other writing directly in this post, but you should be able to find the references easily enough if you're interested.
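To give just one generic example (other MTAs have their own ways of achieving the same thing): a Postfix gateway that has been given the complete list of valid recipients will reject everything else during the SMTP dialogue instead of accepting and bouncing later. In main.cf that amounts to something like

relay_domains = example.com
relay_recipient_maps = hash:/etc/postfix/relay_recipients

with the hashed map listing every deliverable address.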

Reading the spamd logs is sometimes quite entertaining if you're that kind of guy or girl. Here is one example of a site with clearly deficient spam and/or malware filtering, possibly their own homebrew:

Jul 14 09:13:28 delilah spamd[29851]: 66.35.252.70: From: postmaster@trendmicro.com
Jul 14 09:13:28 delilah spamd[29851]: 66.35.252.70: To: jonson3846@datadok.no
Jul 14 09:13:28 delilah spamd[29851]: 66.35.252.70: Subject: Delivery Status Notification (Failure)

It does not matter much to us, but they'll be unable to get mail through to us for the next 24 hours.

The next one is clearly problematic, since whoever set up the system appears to have left back in the days when there was still a chance that spammers used real addresses. Or maybe the poor wretch stayed on and now suffers from delusions, incompetence or both:

Jul 13 14:36:50 delilah spamd[29851]: 212.154.213.228: Subject: Considered UNSOLICITED BULK EMAIL, apparently from you
Jul 13 14:36:50 delilah spamd[29851]: 212.154.213.228: From: "Content-filter at srv77.kit.kz" <postmaster@srv77.kit.kz>
Jul 13 14:36:50 delilah spamd[29851]: 212.154.213.228: To: <skulkedq58@datadok.no>

Would SPF have helped? Possibly. We have our records set up, but clearly these guys are not using it in any meaningful way, and after -- what is it -- five years it's still not clear which of the competing RFCs with varying degrees of proprietary content is going to come out on top.

Staying out of our traplist would have saved some resources on their side. On our side, well, we have a working system. Email works. Mail from us has to go through our mail server. Incoming mail needs to clear spamd's greylisting (and really needs to come from IP addresses which are not in any of the blacklists we use) and pass content filtering inspection by spamassassin and clamav, all of it conveniently within reach on any freely available BSD system. The content filtering packages are available on your favorite Linux as well, but on our sites, we use OpenBSD and FreeBSD.
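The pf.conf side of that setup is, if memory serves, roughly the example given in spamd(8) at the time, in the rdr syntax of the day. The rules below are that documented sketch, with $ext_if standing in for the external interface, not a copy of our actual ruleset:

table <spamd> persist
table <spamd-white> persist
rdr pass on $ext_if inet proto tcp from <spamd> to any port smtp -> 127.0.0.1 port spamd
rdr pass on $ext_if inet proto tcp from !<spamd-white> to any port smtp -> 127.0.0.1 port spamd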

Living spam free and unworried by malware is possible. If you make a few right choices it's actually easy, it doesn't cost much and just imagine how much of your time you stop wasting.

Monday, July 9, 2007

Hey, spammer! Here's a list for you!

Last week I started noticing from my log summaries that my mail servers had seen a lot more mail to non-existent users than usual. This usually happens when somebody has picked one of our domains as the home of their made-up return addresses for their spam run. This time, from the looks of it, the spam runs were mainly targeted at Russian and Ukrainian users. At least that's where most of the backscatter appears to have come from.

As I've written before in the PF tutorial and the malware paper (updated version available as this blog post -- that's the end of today's plugging, I promise), I've used the "Unknown user" messages as a valuable data source for my spamtrap list, just quietly adding addresses that looked really unlikely to ever become valid. After a brief airing on the OpenBSD-misc mailing list and running it by my colleagues at Datadok and Dataped, I've decided to take it a bit further.

Now that I've got a list of addresses which will never receive any legitimate mail, I really want spammers to try to send mail to those addresses. After all, if they send anything to an address which consists of a random string with one of our domains stuck on after the '@', we know it's all spam from there on.

We don't care about the rest, for the next 24 hours. Your SMTP dialogue with us (actually our spamd) will be all a-stutter, receiving answers one byte at a time until you give up. For the record, that usually takes about 400 seconds, with the really imbecile ones taking a lot longer. See the paper or the tutorial for some numbers.
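If you want those numbers from your own logs, spamd records how long each connection lasted when the peer disconnects, so a quick distribution is only an awk away (the log path is whatever your syslog configuration sends spamd's output to):

$ awk '/disconnected after/ {for (i = 1; i < NF; i++) if ($i == "after") print $(i+1)}' /var/log/spamd | sort -n | uniq -c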

The other possibility is of course that your system is set up in a way which makes it actually receive and try to deliver spam. Some of the spam will be addressed to non-existent users in your domain, so if your users receive spam, you will be trying to send bounce messages back to the purported sender for spam to non-existent users. That's tough, kid. If you're set up that way, your machine will be treated to the tarpit here for the next 24 hours. All a-stutter and all that. Repeat offenders stay there longer.

Now for the spamtrap list, I've checked that my colleagues and associates have never actually wanted to use those addresses for anything, and I made this page which wraps it all in a bit of explanation. For some reason, the list keeps growing each time I look at my log summaries.

When I get around to it and find a visually not-too-horrible way to do it, I'll include links to that page where they fit naturally on our web sites. In the meantime, here's hoping that the spammers' address harvesting robots find this list and put it to good use.

The chapter, it's improving. More later.

UPDATE 12-jul-2007: The softer side of me ponders the possibility of sending email form letters to the various postmaster@s with the URL to this blog post. On the other hand, I'm not sure I'm ready for another round of finding out that postmaster@ is in fact not deliverable at a surprising number of sites around the world.

One other thing I've noticed since I published the traplist is that bounces to addresses like mixt.apex.dp.ua-1184227575-testing@datadok.no have started appearing in the logs. I don't see how messages like these could be useful by themselves, but the addresses are of course obvious traplist material.

13-jul-2007: Oddly enough, there's still a stream of backscatter, and my logs tell me a few new addresses turn up every day. This morning's fresh ones were careersogt2083@datadok.no, phalanxesxb88@datadok.no and retryingvtt@datadok.no. Another few bytes to help weed out the bad ones early, thanks to the robots out there.

Tuesday, June 26, 2007

China has a Norwegian-speaking techie population

And from the looks of it, they like OpenBSD or at least PF. I can see it in my logs.

As regular readers will know (hi, all three of you), while I was attending EuroBSDCon 2006, I moved the PF tutorial's home to NUUG's server. As a consequence, I now have read access to the web server logs related to the material I have put there.

That's how I, thanks to a tiny little statistics script, know that at this moment 13022 unique IP addresses or host names have hit one or more files in the tutorial directory tree since I moved the files to this location.
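I won't pretend the following is the actual script, but counting unique client addresses that have requested anything under the tutorial tree is essentially a one-liner against the web server's access log; the log path and URL prefix here are illustrative:

$ awk '$7 ~ "^/~peter/pf/" {print $1}' /var/log/httpd/access_log | sort -u | wc -l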

It won't surprise you that I sometimes, when I really should be doing other things, glance at the logs and occasionally see something interesting.

At times I see stuff like somebody at 222.76.215.122 fetching http://home.nuug.no/~peter/pf/no/langbrannmur.html - the Norwegian version of my PF tutorial (which is sadly behind the times) - as one long HTML file.

If you do a 'whois 222.76.215.122' you will see what I mean. Somebody in China is reading about how to set up PF, in Norwegian. The funny thing is that something like this happens fairly regularly.

If we can trust the whois data to be correct, this could mean several things:
  1. Educated Chinese prefer to read networking literature in Norwegian over English, even when the English version is more up to date
  2. There could be a number of Norwegian network people in China who feel better after reading about PF in their native language
    -- or --
  3. China is so big and has so many people who are potentially interested in PF that at least a few times a week one will fetch the Norwegian version of my tutorial by accident.
I just thought I'd share it with you guys. And now it's 13026.

[and yes, there is option 4: Chinese robots, slurping away anything they can find. But the first theory is a lot more fun.]

Which Windows XP version is it on that laptop?

Q: Which Windows XP version is it on that laptop?

A: None. I run OpenBSD.

That exchange happened in my office a few days back. The reason? At my day job, we do a number of different things, and at times we do tests on new hardware for customers who require that. So a customer asked us to do an assignment which involved going to an airport in a different city and testing some equipment there. And they asked if we would have a laptop with English-language Windows XP on it to bring along. That's when I answered that actually my laptop does not run Microsoft Windows at all. I added that if I could dig out the restore disks for my old one, it would probably be Norwegian.

Frankly I'm not sure what response my colleague sent back to the customer, but it probably said something along the lines that we might be able to dig out one with Norwegian XP on it. It took me only a few minutes to dig out the restore CDs (for some reason my old Fujitsu-Siemens laptop (an Amilo 1840W, if it matters) came with two apparently identical restore disks) and put them on my desk. Then of course I forgot to bring them with me (that machine was in my attic at home), but finally I brought the machine into the office this morning.

Why bring out a 2004-vintage machine in the first place? Well, the ThinkPad R60 I'm typing this on did come with some sort of Microsoft system on it, halfway installed with no backup media, and since this is the machine I rely on for stuff I need to do every day, it has been running recent OpenBSD-current snapshots since I got it last October.

So for a detour into the world of mobile Microsoft computing, I needed to get the older unit up and running again. It had worked reasonably well up until I bought the newer machine - with a 3.2GHz Pentium IV and three fans in it it sounded like a hangarful of F16s and it never did have more than about 72 minutes of battery life (since reduced to zero), and in the end small bits of plastic started coming off it here and there, but it still worked and it did come with restore media.

So, arriving in the office this morning with a slightly heavier backpack than usual, I plugged the machine in along with the others, turned it on and after a bit of fiddling with firmware menus got it to boot from the restore CD. (That is, to actually get the thing to boot the CD you need to watch for the right moment to press Enter -- I almost forgot.) The installer takes a while to load, and if we forget about the 'you have to agree to this EULA' screen, it's on par with FreeBSD's sysinstall for intuitiveness or lack of it.

The first thing that really amazed me was how long it took to create a file system on the disk. The last system on the machine before the Windows restore was OpenBSD, so of course creating a new partition and file system was necessary. Past lunch the thing was still only in the early stages of copying system files.

Then for a while it just stood there with a dark screen making CD crunching noises, so I power cycled the machine. It came up again with the traditional teletubby background and a dialog which demanded to see the restore disk for a while more. I gave it that, and it went on for a while, finally rebooting. Over the next few reboots the system consistently tried (but failed) to find the correct resolution for the internal display (1200x800), alternately trying 800x600 and 1024x768, never finding the physical resolution at all.

Any X I've encountered just magically found the optimal configuration for this kit; for the first few months I had the machine it ran fine without an xorg.conf. After I started taking it to lecture halls to speak to projectors it turned out I needed one to fiddle with, but finding something that worked on the internal screen was never ever a problem with any freenix and X. But then of course I guess Windows had to be different. By the time I had XP's Service Pack 2 installed, it was 3:30 pm.

What would you do - go on mucking about with the Windows machine and try to make it behave, or tell the customer "if you need me to go there with something that speaks wifi and I can use to take notes, I can bring my OpenBSD Thinkpad"? It runs X with KDE, so I won't look too scary I guess.

Bah. It's late in the day, SWMBO just called in for supplies, and I have other writing to do. In the meantime, I could see how far installing that FreeBSD-current snapshot on it gets.

The book, it's still progressing.