Saturday, September 29, 2007

Always a pleasure to be wasting your time, guv

This week has been a little unusual around the BSDly household. So far I've generally been doing my regular job in the daytime (with longish office hours), only working on the book evenings and weekends. That the arrangement would lead to "Exhaustion is my middle name" status was obvious to everyone except me, but I finally saw where it could be going. So for a little more than the past week I've been working on the book full time.

The state of perpetual exhaustion has had some not too happy consequences. Of course the general progress on the book suffered, but it also led to me missing the monthly BLUG meeting in August. I had spent much of that particular day persuading somebody not too bright that a reconfiguration they said had never happened at their end was indeed what had ended up breaking things at our end, so I was just too tired and missed what I assume was a well executed lecture on networking basics by Vegard Engen (of RFC1149 implementation fame).

This week, with only one job I needed to tackle, I was there for an enjoyable one and a half hours of Bacula, well presented by Bård Aase (aka elzapp). Off to Henrik (the regular BLUG pub) for a few beers afterwards, and with Johan Riise volunteering to put together a 'Unix and time' lecture for next month, the BLUG calendar seems to be in order after all, with Jill Walker doing the end of semester talk in November, on whatever interesting stuff she has been up to lately. Unfortunately it looks like the last Thursday of November is close enough to OpenCON that I'll likely miss Jill's session.

In the meantime, there are signs that the greytrapping and my bait list are working. Looking over the spamd logs today I found quite a few entries like these:

Sep 29 15:29:23 skapet spamd[20795]: (BLACK) 84.76.177.159: 
<royaleuromillion2007@yahoo.es> -> <211hgsreliart7@datadok.no>
Sep 29 15:29:32 skapet spamd[20795]: (BLACK) 84.76.177.159: 
<royaleuromillion2007@yahoo.es> -> <00b27f18@datadok.no>

which looks strikingly like the Spanish lottery scam spammers patiently and methodically working their way through my list of bait addresses, all the way from top to bottom. At roughly 3000 addresses, it's going to be a while. All I can say is, we are extremely pleased to be wasting your time, señor.
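If you want to watch the same thing happen in your own logs, here is a minimal Python sketch that counts how many distinct recipient addresses each blacklisted host has tried. It assumes the log text looks like the entries quoted above, including the line wrapping; the function name bait_hits and the regular expression are my own illustration, not anything that ships with spamd.

```python
import re
from collections import defaultdict

# Matches "(BLACK) ip: <sender> -> <recipient>", tolerating a line wrap
# between the IP address and the address pair.
BLACK_RE = re.compile(r"\(BLACK\)\s+(\S+):\s*<([^>]*)>\s*->\s*<([^>]*)>")


def bait_hits(log_text):
    """Map each blacklisted IP to the set of recipients it tried."""
    hits = defaultdict(set)
    for ip, _sender, recipient in BLACK_RE.findall(log_text):
        # Lowercase so that case variants of one address count once.
        hits[ip].add(recipient.lower())
    return dict(hits)


sample = """\
Sep 29 15:29:23 skapet spamd[20795]: (BLACK) 84.76.177.159:
<royaleuromillion2007@yahoo.es> -> <211hgsreliart7@datadok.no>
Sep 29 15:29:32 skapet spamd[20795]: (BLACK) 84.76.177.159:
<royaleuromillion2007@yahoo.es> -> <00b27f18@datadok.no>
"""

for ip, recipients in bait_hits(sample).items():
    print(ip, len(recipients))
```

Comparing the per-host count with the length of the bait list gives a rough idea of how far down the list a given spambot has come.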

Also while the girls were off to the Raptus comics festival (an annual event, and one of the big things here in Bergen), I found enough trash backscatter to non-existent bsdly.net addresses that it's likely that the same weekend spambot operators who spewed their spam with @ehtrib.org and @skapet.datadok.no addresses earlier (both times at weekends) have now discovered bsdly.net and are doing their damnedest.

Why they prefer to generate a few hundred fake addresses and use them all in one go is beyond me. The other groups seem to generate only a handful of new addresses each every day, and for good measure at least one of them sort of reuses the generated addresses by using a forward and a reverse (such as in this morning's preserved greylist dumps, there was a potterv76@datadok.no as well as the reverse 67VRETTOP3@datadok.no). This lot just dumps all they have in one go, mainly contributing to swelling that file in my home directory with the totally unprintable file name which is the temporary storage before they go into the traplist and on to the bait page.

Distractions of that kind from my main task are never entirely welcome, but with a larger influx of new addresses to be added to the bait list I made some small changes to make the maintenance of that page a bit more sane, rediscovering server-side includes and redirects along the way. And the data I keep collecting may become the basis for other projects later.

Anyway, it is increasingly clear that the spammers are including the generated fake addresses in their "known good" lists. Consider the spambot at 210.111.190.216 (apparently in Korea), which insists on delivering to an address somebody generated in early July:

peter@skapet:~/www_sider$ grep  210.111.190.216 /var/log/spamd
Sep 29 15:58:07 skapet spamd[20795]: 210.111.190.216: 
connected (5/4)
Sep 29 15:58:21 skapet spamd[20795]: (GREY) 210.111.190.216: 
<jim.vance@presentsmadeeasy.com> -> 
<careersogt2083@datadok.no>
Sep 29 15:58:22 skapet spamd[20795]: 210.111.190.216: 
disconnected after 15 seconds.
Sep 29 15:58:35 skapet spamd[20795]: 210.111.190.216: 
connected (4/3)
Sep 29 15:58:49 skapet spamd[20795]: (GREY) 210.111.190.216: 
<tbaker@groupecdb.com> -> 
<careersogt2083@datadok.no>
Sep 29 15:58:50 skapet spamd[20795]: 210.111.190.216: 
disconnected after 15 seconds.
Sep 29 15:59:03 skapet spamd[20795]: 210.111.190.216: 
connected (5/3)
Sep 29 15:59:17 skapet spamd[20795]: (GREY) 210.111.190.216: 
<wotan@4vsi.com> -> <careersogt2083@datadok.no>
Sep 29 15:59:18 skapet spamd[20795]: 210.111.190.216: 
disconnected after 15 seconds.
Sep 29 15:59:30 skapet spamd[20795]: 210.111.190.216: 
connected (6/5), lists: spamd-greytrap
Sep 29 16:03:14 skapet spamd[20795]: (BLACK) 210.111.190.216: 
<sylviacastleman@alltypecalligraphy.com> -> 
<careersogt2083@datadok.no>
Sep 29 16:04:59 skapet spamd[20795]: 210.111.190.216: 
From: "Marguerite Casey" <sylviacastleman@alltypecalligraphy.com>
Sep 29 16:04:59 skapet spamd[20795]: 210.111.190.216: 
To: <careersogt2083@datadok.no>
Sep 29 16:04:59 skapet spamd[20795]: 210.111.190.216: 
Subject: 100mg x 60 pills US $ 129.95 buy now
Sep 29 16:06:04 skapet spamd[20795]: 210.111.190.216: 
disconnected after 394 seconds. lists: spamd-greytrap
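For the curious, the time a host spends stuck in spamd can be added up straight from log lines like these. The following is a minimal Python sketch, assuming the "disconnected after N seconds" wording shown in the excerpt above; the function name seconds_held is my own invention for illustration.

```python
import re
from collections import Counter

# An IPv4 address, a colon, optional whitespace (including a line wrap),
# then spamd's "disconnected after N seconds" message.
DISCONNECT_RE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3}):\s*disconnected after (\d+) seconds"
)


def seconds_held(log_text):
    """Sum the connection time per source IP, in seconds."""
    totals = Counter()
    for ip, secs in DISCONNECT_RE.findall(log_text):
        totals[ip] += int(secs)
    return dict(totals)


sample = """\
Sep 29 15:58:22 skapet spamd[20795]: 210.111.190.216:
disconnected after 15 seconds.
Sep 29 15:58:50 skapet spamd[20795]: 210.111.190.216:
disconnected after 15 seconds.
Sep 29 15:59:18 skapet spamd[20795]: 210.111.190.216:
disconnected after 15 seconds.
Sep 29 16:06:04 skapet spamd[20795]: 210.111.190.216:
disconnected after 394 seconds. lists: spamd-greytrap
"""

print(seconds_held(sample))
```

For the four connections in the excerpt that works out to 15 + 15 + 15 + 394 = 439 seconds, most of it spent after the host landed on the spamd-greytrap list.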

I have no real opinion on the validity of the From: addresses, but the address they are trying their best to deliver spam to here never actually existed, of course. The first record of it at datadok.no was this bounce from a Russian site:

Jul 12 23:38:52 delilah spamd[29851]: (GREY) 81.177.34.190: 
<> -> <careersogt2083@datadok.no>

Dumping their trash back at them is good for a laugh, and I am quite amazed how shortsighted the spambot operators appear to be. They get yelled at for spamming, so to avoid detection, they start using fake addresses. This in turn means they have no feedback whatsoever on the quality of their address lists, and with well pissers like me in action, they are getting less effective each day, reducing themselves to background noise in the network.

Now with this blog post done I will go back and finish the edits on the logs chapter. With the early parts of the book about to enter the layout phase while the last bits get written over the next few days, there is a chance that there will be physical copies of the book to pass around at OpenCON. Not quite there yet, but the fulltime push is certainly helping. The preface with a list of thanks is part of what is entering layout; I think a few people who did not expect to be in there will soon have a pleasant surprise.

Also this week, the PF tutorial saw its unique visitor number 19,000 since EuroBSDCon 2006 on Thursday morning (September 27th). We certainly hope at least some of them will come back for the book.

Friday, September 21, 2007

The Great SCO Swindle Winding Down, But Will They All Get Away With It?

Poor Dan Lyons. He thought like a bookmaker and wrote what he thought was right.
You see, a few years back, when Caldera was still Caldera, that company had successfully sued a large corporation and won. Then Caldera changed its name to SCO and sued another huge corporation. Dan the bookie thought it was a sure bet, and started cheering them on. Four years on, the sure bet went south on a technicality. They did not actually own the code they had accused others of stealing. At least that's the way I read his Snowed by SCO article over at Forbes.

My take on this is, Dan, you only had to look at the facts. Knowing a bit of IT history is also a plus. When the SCOX matter came up, I, like most people, thought that you can never rule out the possibility that some code might have been copied. After all, Unix source code was never particularly hard to get your hands on and was widely used as classroom examples all over the world.

Then if that code was just identified, it would be ripped out and replaced. It's happened before. In the free software world, whole subsystems get replaced when there's a good reason to, and if the reason is copyright violation it gets somewhat urgent. The problem is, in the SCO matter, no code was ever identified.

Some journalists went through an elaborate procedure involving non-disclosure agreements and were, we are told, shown code from Linux and somewhere else which showed remarkable similarity. When Darl McBride used the SCOForum 2003 conference to show something he passed off as ripped off code, it took only hours to identify the exact chunks through the obfuscation (yep, formatting comments in the Symbol font) and the code proved to be irrelevant.

None of these events helped convince techies of their claims, but for me the tipping point was when they claimed to have a reason to sue the BSDs as well. Anyone who had been paying any attention at all to Unix history knew that the ATT vs BSD lawsuit was finally settled in 1994, with most of the terms sealed, but one of the few things that was made public was that the parties had forfeited any right to sue each other over the Unix code base. To me and quite a few others, this was proof positive that they were 'misguided or dishonest', as a commentator put it at the time.

One of my favorite summaries of the facts of the case was written by Greg Lehey (of The Complete FreeBSD fame), who looked at the various announcements from the technical side. He stopped maintaining it after a while, but it's still there at his website with, as far as I have tested, all links intact.

Most people seem to be relieved that the matter seems to be over. I beg to differ.

For one thing, the main characteristic of this matter has been the amazing ability of the SCO crowd to drag out the proceedings over irrelevant, mainly procedural matters. They will have more tricks up their sleeves, for certain.

The other thing is, with Dan's friends out on the technicality that they did in fact not have the legal standing to sue, we will never get that detailed walkthrough of the code where Darl and his covert experts are supposed to point out the infringing code. I, for one, would have looked forward to that. Then we would have had a chance of getting to know their real motivation too, and possibly some solid leads on the planning and funding. Now that will just not happen.

Then of course there's the stockholder lawsuits and possibly the FTC. If you were one of those chumps who bought SCOX stock at roughly twenty dollars a share based on Dan Lyons' recommendations, wouldn't you feel a little sore now that your investment is about a cent to your original dollar? That is, if you can unload it before SCOX are finally kicked out of NASDAQ for good?

So poor Dan Lyons for not seeing this coming. And damn the technicalities for cancelling the main event.

For those of you eager for news of the book, we're working hard to get it out there.

Update 2007-09-25: Another non-apology, this one from Rob Enderle.

According to linuxtoday, Rob Enderle claims he was tricked by (wait for it) both SCOX and those ever-bullying Linux people.

Actually, there's not much to see there. You can read it as just another non-apologizing apology, with some tall tales about death threats and DOS attacks thrown in (yes, really).

As I've said a few times earlier, enough facts were on the table right from the start of this timewasting story to show that more than likely the SCOX crowd did in fact not have a case.

Now I wonder what, if anything, we will be hearing from John Parkinson, who wrote in CIO:

"a lot of the intellectual property in Linux is actually owned by companies that never officially agreed to make it available under an open-source license."

Interestingly enough, that came without any qualification at all.

That irritated me enough at the time that I wrote to them (pasted into some inane feedback form):

Alleged intellectual property theft

In the article called "The End of Idealism" (http://www.cio.com/archive/070103/et_pundit.html), John Parkinson writes, "a lot of the intellectual property in Linux is actually owned by companies that never officially agreed to make it available under an open-source license."

Please take a moment to consider the seriousness of this allegation. What Parkinson actually says here is, "large parts of Linux consist of stolen property".

Reading such allegations in an article written by a senior executive of Cap Gemini Ernst & Young is quite shocking in itself.

It is only reasonable that Mr Parkinson or Cap Gemini Ernst & Young specify which parts of the Linux kernel they consider to consist of stolen property.

All versions of the Linux kernel, along with detailed change logs and archives of the developer mailing lists are available to the public. Using these resources, all parts of the code base can be traced to the individual who submitted them for inclusion.

In other words, it is quite easy to pinpoint who did what, and Mr Parkinson and Cap Gemini Ernst & Young would be doing the public a great disservice by refusing to help point out code which was illegally included in the open source operating system.

Quite a few articles, well informed and otherwise, have been written about the SCO vs IBM lawsuit and SCO's allegations. I suggest interested readers browse FreeBSD Core member Greg Lehey's overview at http://www.lemis.com/grog/SCO/index.html while we wait for more details from Mr Parkinson or Cap Gemini Ernst & Young.






Monday, September 17, 2007

EuroBSDCon was great, disks dying and some scary Windows stuff

This Monday finds me safely back from EuroBSDCon and trying to do useful things while the file server gets restored.

Of course it had to be that way. With me off to EuroBSDCon to do the tutorial and other refreshing geekiness, in the first batch of mail I retrieved after arriving in Copenhagen was a log summary from the machine which holds pretty much everything datadok is working on at any time, with these nuggets:

> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=410884031
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=410912703
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=410884575
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=410905887
> > ad6: FAILURE - READ_DMA status=51 error=40 LBA=410857151
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=446104667
> > ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=446104667
> > ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=522840603

ouch.

This is what disks say when they've run out of space to map bad sectors into. The disk wasn't quite dead yet, but definitely time to plan a replacement. Not much to be done about that right away except alert the colleagues that there would be file server downtime on the Monday afternoon. Disks will die, and sysadmins end up with the task of replacing them.

My brief summary of EuroBSDCon is that it was an excellent conference, with lots of good talks and interesting people to see, in a good, clean location with network connectivity which worked most of the time. Update: my EuroBSDCon pictures are finally on flickr.

For my own part the PF tutorial went reasonably well, with 24 people signed up and I think one or two sit-ins. People were paying attention and there were a few good questions which made the session more interesting with a little more improv than the last few times I did this tutorial. Answers were had, though, and I believe a good time with useful info for the people who had signed up for the session. Not too many hours after we were done, the number of unique visitors (aka host names or addresses) to the tutorial tree since last EuroBSDCon rolled past 18,000.

After lunch Marco Zec's session about virtualizing the FreeBSD network stack was really interesting. Unfortunately none of the Thinkpads present were able to boot from the FreeBSD-current image Marco had prepared and supplied on USB thumbdrives, actually producing pretty much the same crash (illustrated here). But a very interesting topic and session. I'm glad I stuck around for it.

The Wednesday I had the choice of sightseeing, sitting in on Kirk's session and holing up in the hostel basement's hacker room to get some writing done, and I ended up going for the latter option, getting significant parts of the logging chapter done. There is of course a limit to how long you will avoid interruption in a semi-public area, but that session was certainly useful.

The EuroBSDCon hacker area with both wired and wireless networks was available to conference attendees all conference and tutorial days. Naturally it took on a social function in addition to being a convenient way to surf and fetch your email.

For the conference itself, it was sometimes hard to choose which talks to go to. I still think Ike's jails talk (pix here, here, here) was my favorite (similar but not identical to the one he gave at AsiaBSDCon in Tokyo), but there were a lot of good ones. I ended up managing to miss Pierre-Yves Ritschard's Load Balancing talk since they'd switched the schedule around. I hope there's a chance to pick up the essentials at some later date.

Fortunately Wim and Machtelt turned up to organize the OpenBSD booth (convenient for restocking your clothes cupboard) and some news about OpenCON - there will be an OpenCon 2007, but there's still some organizing to do. I hope to be seeing you there, Venice November 30th through December 2nd.

From the Windows Is Scary department, one episode from a few weeks back which I suddenly remembered when I realized the guy quietly hacking to the left of me was FreeBSD USB guru Hans Petter Selasky:

When I saw 4GB USB thumb drives priced at just under NOK 300 (USD 55), I decided I needed one. The drive mounted with no trouble at all in OpenBSD (mount /dev/sd1i at the location of your choice), and I thought good, I'll just delete those .exe files to make room. A few days later I needed to retrieve some files which turned out to be most easily accessible from my Windows machine at work. So I plugged in the new 4GB thumb drive.

Windows machines always do strange things and take a while to recognize new hardware, but this time it claimed to have found a new CD drive. A few confusing minutes later, with various message boxes flashing across the screen, the machine begged for a reboot. I let it have that, slightly puzzled but not entirely surprised that Windows wanted the user to jump through a few extra hoops to make something work.

I was able to retrieve the files eventually, while trying to avoid yet another quirky Windows application which wanted to handle my files. As it turns out, the device actually emulates a CD drive as well as USB mass storage. Here's what it looks like in /var/log/messages on my OpenBSD laptop:

Sep 17 22:23:23 thingy /bsd: umass0: SanDisk Corporation U3
Cruzer Micro, rev 2.00/0.10, addr 2
Sep 17 22:23:23 thingy /bsd: umass0: using SCSI over Bulk-Only
Sep 17 22:23:23 thingy /bsd: scsibus2 at umass0: 2 targets
Sep 17 22:23:23 thingy /bsd: sd1 at scsibus2 targ 1 lun 0:
SCSI2 0/direct removable
Sep 17 22:23:23 thingy /bsd: sd1: 3913MB, 498 cyl, 255 head,
63 sec, 512 bytes/sec, 8015502 sec total
Sep 17 22:23:23 thingy /bsd: cd1 at scsibus2 targ 1 lun 1:
SCSI2 5/cdrom removable

The reason all the strange and scary things happened with the Windows machine is that the emulated CD contains Windows Autorun files, and Autorun is, it seems, enabled by default in that operating system with no easy way to turn it off. What I find slightly disturbing is that, as Hans Petter explained, this behavior is part of the device's firmware and you can't get rid of that five or six megabytes of useless software in these devices. The best you can do is use a system which ignores such silliness.

Returning to the file server, the box is a few years old and has by now probably had most of the original components replaced. The last time we replaced the motherboard, we were still thinking that SCSI was the only way to go for storage, disks and tape both. Not too long after that, we decided that SATA was actually OK for that little office of ours, but when the time came to replace that disk, I discovered that the motherboard had only two SATA ports on it, one for the system disk and one for the dying data disk. So copying across from one SATA disk to another had to be done via Ethernet instead. Fortunately installing a useful operating system takes only about twenty minutes, and the few tens of gigabytes transferred while I was writing this article. Far faster than restoring the same data via rsync from our offline backup, though.

Among the things announced in Copenhagen were that there will be an AsiaBSDCon in March 2008, NYCBSDCon will maybe be next year in the fall, and the next EuroBSDCon will be in Strasbourg. I hope to be at several of those, time and money allowing. But now on to finish that book.

Saturday, September 8, 2007

Wanna help science? Study your greylists' innards!

If somebody, say five years ago, had told me that I would be spending a little time, every day, studying data about what invalid addresses some unknown miscreants are making up in my domains, I would have thought them to be slightly off their rockers.

Yet here I am, actually maintaining a publicly available list of addresses which do not stand a chance of becoming valid, ever. It all started with a log data anomaly - I noticed an increase in the number of failed delivery messages to non-existent addresses in our domains. I had expected that the bounces to invalid addresses would appear for a short period only, but for one reason or the other it looks like it's here to stay, with some dips and peaks like the ehtrib.org flood.

The list is apparently working as intended too. These addresses are on my local greytrap list, and I have started seeing addresses I put in there as all uppercase turn up in my logs in all lowercase variants. Fun to watch, sort of.

Anyway, the supply of new bogus addresses proved to be larger than I had expected. So to get a handle on just what is happening I ended up doing periodic dumps of the live greylist data. This is really easy to do if you're using spamd as your greylister, your basic command is

$ sudo spamdb | grep GREY

and you redirect to a file, pipe to mail, or whatever you like.
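Once you have a dump in a file, pulling out the interesting fields takes only a few lines. Here is a minimal Python sketch which assumes each GREY line is pipe-separated with the source IP in the second field and the sender and recipient as the first two fields wrapped in angle brackets. The exact field layout differs between spamd versions, so check a dump from your own system first; the function name grey_tuples and the numeric fields in the sample line are made up for illustration.

```python
def grey_tuples(dump_text):
    """Extract (ip, sender, recipient) from the GREY lines of a dump."""
    tuples = []
    for line in dump_text.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 4 or fields[0] != "GREY":
            continue
        # Assumption: sender and recipient are the first two fields in
        # angle brackets, wherever they sit in the line. A bounce has
        # the empty sender "<>", which comes out as "".
        addrs = [f[1:-1] for f in fields[2:]
                 if f.startswith("<") and f.endswith(">")]
        if len(addrs) >= 2:
            tuples.append((fields[1], addrs[0], addrs[1]))
    return tuples


# Illustrative line only: the addresses are from my logs, the
# timestamps and counters are invented.
sample = (
    "GREY|202.152.33.43|<jcejft@charter.com>|<dkqvujfn@datadok.no>"
    "|1189200000|1189214400|1189214400|1|0\n"
    "WHITE|192.0.2.10|||1189100000|1189110000|1192300000|2|5\n"
)

for ip, sender, recipient in grey_tuples(sample):
    print(ip, sender, recipient)
```

Since the function filters on the GREY tag itself, it works just as well on an unfiltered dump.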

Now if you're a bit like me, looking for patterns in noise like this makes you feel a little weirder than usual and possibly leads you to think of a Clive Barker novel (specifically the bits about the dead letter file in The Great and Secret Show), and you wonder why this is worth doing at all. After all, there is precious little spam that actually reaches my users, so like I said earlier, for us spamd users it really looks like spam is a solved problem. I guess I'm just a bit fascinated by the pure irrationality of the spammers' behavior.

There may be useful information lurking somewhere in the data I collect here in my tiny corner of the world to browse when time allows.

Typical entries show things like the host 202.152.33.43 tried to send with a From: address jcejft@charter.com to dkqvujfn@datadok.no and sdenuuu@datadok.no. Using a few common networking commands we see that there is no reason why charter.com email should come from the IP range belonging to idola.net.id, and as the admin of datadok.no I know these two addresses have never been deliverable. Most likely the admin at charter.com can tell you if that from address is deliverable, but I keep wondering how much of the spam out there is stuffed into the pipe with bogus From: and To: addresses both. Or in other words, purely useless noise, never to be delivered anywhere.

On a side note, with one or more of the spammer operations trying to sneak through using sender and recipient addresses in the target domain, I assume it is just a matter of time before I see a tuple with both sender and recipient addresses already in my spamtraps list. When that happens, I think I will feel inclined to let my friends have a round of refreshments on my tab.
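Spotting that tuple by eye would be tedious, so here is a minimal Python sketch of the check. It assumes you already have the greylist entries as (ip, sender, recipient) tuples and the spamtrap addresses as a plain list. The comparison is case-insensitive, since trapped addresses tend to come back in other case variants. The function name both_trapped, the 192.0.2.x addresses and the sample pairings are hypothetical.

```python
def both_trapped(tuples, traps):
    """Return the tuples whose sender AND recipient are spamtraps."""
    trapset = {address.lower() for address in traps}
    return [
        (ip, sender, recipient)
        for ip, sender, recipient in tuples
        if sender.lower() in trapset and recipient.lower() in trapset
    ]


# Hypothetical data: the trap addresses appear elsewhere in my logs,
# but these pairings and IP addresses are invented.
traps = ["potterv76@datadok.no", "CAREERSOGT2083@datadok.no"]
entries = [
    ("192.0.2.1", "potterv76@datadok.no", "careersogt2083@datadok.no"),
    ("192.0.2.2", "jcejft@charter.com", "potterv76@datadok.no"),
]

for hit in both_trapped(entries, traps):
    print("round of refreshments:", hit)
```

Only the first entry qualifies here, since the second has a sender which is not in the trap list.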

It's obvious that there are a handful of spammer operations that have decided to use datadok.no (and to a lesser extent, dataped.no and ehtrib.org) From: addresses on the spam they send, apparently in an attempt to cover their tracks. I will probably never know why they decided to do that, but I wonder why they keep it up and for that matter how many other domains are seeing this, with bounces from strange places, directed at non-existent, fairly obviously generated bogus addresses.

So if you are seeing similar stupidity in your logs and if you are running a sensible greylister such as spamd, I would be interested in hearing from you so we can compare notes.

Out there in meatspace, EuroBSDCon 2007 is coming up. I'll be there with the PF tutorial on Wednesday. This Friday's deadline for an updated manuscript had totally slipped my mind (I blame the book and a few other, less rational, factors), but hopefully the 24 who signed up for the session will find it useful anyhow - there will be new bits and as much interesting stuff as I can manage. I'll be around for the rest of the conference too, but unfortunately I'll have to give the Legoland trip a miss.

Be seeing you in Copenhagen! The book is getting closer to finished, I promise!