Your logs contain an ever-growing mass of data on spammers. Why not make that data useful to others?
Those of us who run email services know, sometimes from painful experience, what it takes to ensure that as little as possible of the unwanted advertising and scams, some of which are outright security hazards, reaches our users' inboxes.
Email: This should have been very simple
Handling email should really be quite simple: the server is configured to know which domains it receives mail for and which users actually exist in those domains. When a machine makes contact and indicates that it intends to deliver email, the server checks whether the recipient is a valid user. If the recipient is valid, the message is received and put in the relevant user's mailbox. Otherwise, a message about the failed delivery, optionally with the reason for the failure, is sent to the address given as the sender.
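The accept-or-bounce decision just described amounts to a simple lookup. A minimal sketch (the domain and user tables here are made-up examples, not any real configuration):

```python
# Hypothetical table of domains we receive mail for and their valid users.
LOCAL_DOMAINS = {"example.org": {"peter", "postmaster"}}

def rcpt_response(recipient: str) -> str:
    """Return the SMTP reply for a RCPT TO address."""
    user, _, domain = recipient.partition("@")
    users = LOCAL_DOMAINS.get(domain)
    if users is None:
        return "550 relay not permitted"   # not one of our domains
    if user not in users:
        return "550 no such user"          # leads to a failure message to the sender
    return "250 ok"                        # accepted for delivery to the mailbox
```

Everything beyond this point in the article is about what happens when the parties to this exchange cannot be trusted.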
If only everyone were honest
In each part of the process, the underlying premise is that the communicating partners offer each other correct information. Frequently that is the case, and we have legitimate communication between partners with a valid reason for contacting each other. Unfortunately, there are other cases where the implicit trust is abused, such as when email messages are sent with a sender address other than the real one, quite likely a made-up one in a domain that belongs to other people. Some of us occasionally receive delivery failure messages for messages we verifiably did not send[1]. If we take the time to study the contents of those messages, in almost all cases we will find that the messages are spam, sometimes of the scamming kind, and perhaps part of an attempt to take control of the recipient's computer or steal sensitive data.
What do the ones in charge do, then?
If you ask a typical system administrator what measures are in effect to thwart attempts at delivering unwanted or malicious messages to their users, you will most likely get a description that says, essentially, that the messages are filtered through systems that inspect message contents. If a message does not contain anything known to be bad (known spam or malware), or anything sufficiently similar to a known bad, it is delivered to the user's mailbox. If the system determines that the message content indicates it should not be delivered, the message is thrown away undelivered, and some system administrators will tell you that the system also sends a message about the decision not to deliver to the stated sender address.
Much of this is likely part of the moderately educated user's passive knowledge, and most of us are likely to accept that content filtering is all we can do to keep dubious or downright criminal elements out of our working environment. For the individual end user, only minor adjustments to this setup are likely to be possible.
Measures based on observed behavior
But those of us who actually run the service also have the opportunity to study the automatically generated log data from our systems and use spammers' (that is, senders of all types of unwanted mail, including malware) behavior patterns to remove most of the unwanted traffic before any message content is seen. Doing that requires going down to a more basic level and studying sender behavior at the network level.
One of the simpler forms of behavior-based measures emerged in 2003 in the form of a technique called greylisting. The technique is based on a slightly pedantic and rather creative interpretation of established standards. The Internet protocol for email transfer, SMTP (the Simple Mail Transfer Protocol), allows a server that experiences temporary problems that make it impossible to receive mail to report a specific 'temporary local problem' status code to correspondents trying to deliver mail. Correctly configured senders will interpret and act on the status code and delay delivery for a short time; in most circumstances, the delivery will succeed shortly afterwards. It is worth noting that this part of the standard was formulated to improve the mail service's reliability. Most of the time, the retries happen without alerting the person who wrote and sent the message, and the messages generally reach their destination eventually.
Lists of grey and black, little white lies
Greylisting works like this: the server reports a temporary local problem in response to all delivery attempts from machines it has not exchanged mail with earlier. Experience shows that the pre-experiment hypothesis was mainly correct: essentially all machines that try to deliver valid email are configured to check return codes and act on them, while almost all spam senders dump as many messages as possible and never check any return codes. This means that somewhere between eighty and the high nineties percent of all spam volume is discarded at the first delivery attempt (before any content filtering), while legitimate email reaches its intended recipients, occasionally with a delayed delivery of the initial message from a new correspondent.
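The core of a greylisting decision fits in a few lines. This is only an illustrative sketch: real implementations such as OpenBSD's spamd keep their state in a database and track more details, and the pass delay here is an assumed value.

```python
import time

GREYLIST = {}        # (source IP, sender, recipient) -> time first seen
PASS_DELAY = 300     # assumed: seconds a sender must wait before retrying

def greylist_check(ip, sender, recipient, now=None):
    """Return an SMTP status line for this delivery attempt."""
    now = time.time() if now is None else now
    key = (ip, sender, recipient)
    first_seen = GREYLIST.setdefault(key, now)
    if now - first_seen < PASS_DELAY:
        # First attempt, or a retry that came too soon: claim a temporary problem.
        return "451 temporary local problem - try again later"
    # The sender waited and retried, as correctly configured mail servers do.
    return "250 ok"
```

A spam sender that never checks return codes sees the 451 reply, moves on, and its message is gone before any content filter runs.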
Another behavior-based technique, one that predates greylisting, is the use of 'blacklists' - lists of machines that have been classified as spam senders - and rejecting mail from machines on such lists. Some groups eventually started experimenting with 'tarpits', a technique that essentially means your end of the communication moves along very slowly. A much-cited example is the spamd program, released as a part of the free operating system OpenBSD in May of 2003. The program's main purpose at the time was to answer email traffic from blacklisted hosts one byte per second, never leaving a blacklisted host any real chance of delivering its messages.
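The one-byte-per-second stuttering can be sketched as follows. This is a simplified stand-in for what spamd does, the reply text is made up, and the delay is a parameter only so the behavior can be demonstrated quickly:

```python
import time

def tarpit(conn, reply=b"450 plenty of time, no hurry\r\n", delay=1.0):
    """Answer a blacklisted host's connection one byte at a time."""
    for byte in reply:
        conn.sendall(bytes([byte]))   # send a single byte...
        time.sleep(delay)             # ...then pause (one second in spamd's case)
```

A thirty-byte reply thus takes half a minute to deliver, and the blacklisted sender ties up a connection the whole time while never getting to deliver anything.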
The combination of blacklists and greylisting proved to work very well, but the quest for even more effective measures continued. Yet again, the next logical step grew out of observing spammer behavior. We saw earlier that spammers do not bother to check whether individual messages are in fact delivered.
Laying traps and bait
By early 2005, these observations led to a theory that was soon proved useful: if we have one or more addresses in our own domains that are certain never to receive any valid mail, we can be almost a hundred percent certain that any mail addressed to them is spam. These addresses are spamtraps. Any machine that tries to deliver spam to those addresses is placed in a local blacklist, and we keep it busy by answering its traffic at a rate of one byte per second. The machines stay on the blacklist for 24 hours unless otherwise specified.
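The trapping logic itself amounts to very little code. A sketch under the assumptions just stated (spamtrap list, 24-hour listing); the trap address is one that appears in the log excerpts later in this article:

```python
import time

SPAMTRAPS = {"asasaskosmicki@bsdly.net"}  # addresses that never receive valid mail
BLACKLIST = {}                            # source IP -> blacklist expiry time
BLACKLIST_TTL = 24 * 3600                 # stay listed for 24 hours

def note_delivery_attempt(ip, recipient, now=None):
    """Blacklist any host that targets a spamtrap address."""
    now = time.time() if now is None else now
    if recipient in SPAMTRAPS:
        BLACKLIST[ip] = now + BLACKLIST_TTL

def is_blacklisted(ip, now=None):
    """Hosts on the list get the one-byte-per-second treatment."""
    now = time.time() if now is None else now
    return BLACKLIST.get(ip, 0) > now
```

Since no legitimate correspondent ever writes to a spamtrap address, the false-positive risk of this listing criterion is close to zero.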
The new technique, dubbed greytrapping, was launched as part of the improved spamd in OpenBSD 3.8, released in November 2005. In early 2006, Bob Beck, one of the main spamd developers, announced that his greytrapping hosts at the University of Alberta generate a downloadable blacklist based on the greytrap data, updated once per hour, ready for inclusion in spamd setups elsewhere. This is obviously useful. Machines that try to deliver mail to addresses that were never deliverable most likely do not have any valid mail to deliver, and we are doing society at large a favor by delaying their deliveries and wasting their time to the maximum extent possible.
It is worth mentioning that during the period we have used the University of Alberta blacklist at our site, it has contained a minimum of twenty-some thousand IP addresses, and during some busy periods it has reached almost two hundred thousand.
You can help, too
Fortunately, you do not need to be a core developer to contribute. The exact same tools Bob Beck uses to generate his blacklist are available to everybody else as part of OpenBSD, and they are actually not very hard to use productively.
Here at bsdly.net and associated domains, we saw during the (Northern hemisphere) summer of 2007 a marked increase in email sent to addresses that have never actually existed in our domains. This was clearly a case of somebody, one or more groups, making up or generating sender addresses in order to avoid seeing any reactions to the spam they were sending. This in turn led us to start an experiment that is still ongoing. We record invalid addresses in our own domains as they turn up in our logs. From these addresses we pick the really improbable ones, put them in our local spamtrap list, and publish the list on a specific web page on our server[2].
Experience shows that it takes a very short time for the addresses we put on the web page to turn up as target addresses for spam. This means that we have succeeded in feeding the spammers data that makes it easier for us to stop their attempts, and frequently we make spam senders spend significant amounts of time communicating with our machines with no chance of actually achieving anything. The number of spamtrap addresses has reached fifteen thousand, and we have at times observed groups of machines that spend weeks working through the whole list, with the average time spent per unsuccessful delivery attempt clocked at roughly seven minutes.
As a byproduct of the active spammer trapping we started exporting our own list of machines that had been trapped via the spamtrap addresses during the last 24 hours and making the list available for download. This list's existence has only been announced via the spamtrap addresses web page and a few blog posts, but we see that it's retrieved, most likely automatically, at intervals and is apparently used by other sites in their systems.
At this point we have established that it is possible to create a system that makes it very unlikely that spam actually makes it through to users, while at the same time it is quite unlikely that legitimate mail is adversely affected. In other words, we have the cyberspace equivalent of good fences around our property, but spammers are still out there and may create serious problems for those who are without adequate protection.
Collecting evidence, or at least seeking clarity
We would have loved to see law enforcement take the spammer problem seriously. This is not just because the spam that reaches its targets is irritating, but because almost all spam is sent via equipment that spammers use without the legal owners' consent. We would have liked to see resources allocated in proportion to the criminal activity the spam represents. We would have liked to help, but it might seem that we would not have usable evidence available, since we do not actually receive the messages the spammers try to deliver. On the other hand, we have at all times a list of machines that have tried to deliver spam, identified with almost a hundred percent certainty thanks to the spamtrap addresses. In addition, our systems routinely produce logs of all activity, at whatever level of detail we choose. This means that it is possible to search our logs for the IP addresses that have tried to deliver spam to our systems during the last 24 hours, and get a summary of what those machines have done.
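Such a summary can be produced with a short script that picks out log lines for the trapped addresses. A sketch, assuming spamd-style log lines of the kind shown below (the exact regular expression is an illustration, not the tooling we actually run):

```python
import re
from collections import defaultdict

# Matches the source IP in spamd log lines such as:
#   Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <a@x> -> <b@y>
IP_PATTERN = re.compile(
    r"spamd\[\d+\]: (?:\(GREY\) |\(BLACK\) )?(\d+\.\d+\.\d+\.\d+):"
)

def summarize(log_lines, trapped_ips):
    """Group log lines by source IP, keeping only blacklisted hosts."""
    by_ip = defaultdict(list)
    for line in log_lines:
        m = IP_PATTERN.search(line)
        if m and m.group(1) in trapped_ips:
            by_ip[m.group(1)].append(line)
    return by_ip
```

Feeding this the day's logs and the current list of trapped addresses yields, per host, exactly the kind of activity record shown next.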
A search of this kind typically yields a result like this:
Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <kristie@iland.net> -> <asasaskosmicki@bsdly.net>
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 03:41:42 skapet spamd[13548]: 190.20.132.16: connected (14/13), lists: spamd-greytrap
Aug 10 03:42:23 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap
Aug 10 06:30:35 skapet spamd[13548]: 190.20.132.16: connected (23/22), lists: spamd-greytrap becks
Aug 10 06:31:16 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap becks
The first line here states that 190.20.132.16 contacted our system at 02:34:29 AM on August tenth, as the fourth active SMTP connection, with three of those connections from blacklisted hosts. A few seconds later it becomes clear that this is an attempt at delivering a message to the address asasaskosmicki@bsdly.net. That address was already one of our spamtraps, most likely one that was harvested from our logs after being made up somewhere else. After 12 seconds, the machine disconnects. The attempted delivery to a spamtrap address means that the machine is added to our local spamd-greytrap blacklist, as the entry for the next attempt about one hour later shows. This second attempt lasts 41 seconds. The third try in our log material happens just after 06:30, and the addition of the list name becks indicates that the machine has in the meantime tried to deliver to one of Bob Beck's spamtrap addresses and has entered that blacklist, too.
Unfortunately, it is unlikely that logs of this kind are sufficient as evidence for criminal prosecution purposes, but the data may be of some use to those who have an interest in keeping machines in their care from sending spam.
“Name And Shame”, or just being neighborly?
After some discussions with colleagues, I decided in early August 2008 to generate daily reports of the activities of machines that had made it into the local blacklist at bsdly.net and to publish the results. If all we have is the fact that an IP address (such as 24.165.4.190) has entered a blacklist, with no supporting material, it is fairly easy for whoever is in charge of that address range to dismiss the entry as an unsupported allegation. We hope that when whoever is responsible for the network containing 24.165.4.190 sees a sequence like this,
Host 24.165.4.190:
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
Aug 10 02:57:54 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:57:55 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:57:56 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:16 skapet spamd[13548]: 24.165.4.190: connected (8/6)
Aug 10 02:58:30 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:58:31 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:58:32 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:39 skapet spamd[13548]: 24.165.4.190: connected (7/6), lists: spamd-greytrap
Aug 10 03:02:24 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 03:03:17 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberliereffett@ehtrib.org>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: From: "Preston Amos" <aarnq@abtinc.com>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: To: kimberlee.ledet@ehtrib.org
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: Subject: Wonderful enhancing effect on your manhood.
Aug 10 03:06:04 skapet spamd[13548]: 24.165.4.190: disconnected after 445 seconds. lists: spamd-greytrap
they will find it a sufficient basis for action of some kind. The material we generate is available via the “Name And Shame Robot” web page at http://www.bsdly.net/~peter/nameandshame.html. The latest complete report of log excerpts is available via links on that page. Previous versions are archived offline, but will be made available on request to parties with valid reasons to request the data.
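Turning per-host groupings of log lines into a report in the format shown above takes only a few more lines. A sketch (the grouping is assumed to come from a log search like the one described earlier; this is not the robot's actual code):

```python
def format_report(by_ip):
    """Render per-host log excerpts in the "Host <ip>:" style shown above."""
    sections = []
    for ip in sorted(by_ip):
        sections.append("Host %s:\n%s" % (ip, "\n".join(by_ip[ip])))
    return "\n\n".join(sections)
```

Run daily from a scheduler against the last 24 hours of logs, this is essentially all the machinery a "name and shame" page needs.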
“The Name And Shame Robot” is rather new, and it is too early to say what effect, if any, the publication has had. We hope that others will do similar things based on their local log data, or even synchronize their data with ours. If you are interested in participating, please get in touch.
Regardless of other factors, we hope that the data can be useful as indicators of potential for improvement in the networks that appear regularly in the reports as well as material for studies that will produce even better techniques for spam avoidance.
A shorter version of this article in Norwegian was published in Computerworld's Norwegian edition on August 22, 2008; the longer Norwegian version is available as an earlier blog post.
[1] A collection of such failure messages, gathered earlier this year, is available at http://www.bsdly.net/~peter/joejob-archive.2008-07-28.txt.
[2] See http://www.bsdly.net/~peter/traplist.shtml; references on that page lead to my blog, which consists of public field notes, as well as other relevant material.
About the author
Peter N. M. Hansteen (peter@bsdly.net) is a consultant, system administrator and writer, based in Bergen, Norway. In October 2008, he joined FreeCode, the Norwegian free software consultancy. He has written various articles as well as "The Book of PF", published by No Starch Press in 2007, and lectures on Unix- and network-related topics. He is a main organizer of BLUG (Bergen (BSD and) Linux User Group), vice president of NUUG (Norwegian Unix User Group) and an occasional activist for EFF's Norwegian sister organization EFN (Elektronisk Forpost Norge).
Much of this is probably passive knowledge for most people, and most of us accept that content filtering is the only thing we can do to keep dubious or outright criminal elements out of our working environment. For an individual end user, only minor adjustments to this setup are likely to be possible.
Measures based on observed behavior
But for those of us who run the service itself, it is possible to study the data that is recorded automatically in our logs and use the behavior of spammers (used here to mean senders of all kinds of unwanted email, including malware) to remove most of the unwanted traffic before the message contents are even known. This requires descending to a somewhat more fundamental level and studying behavior at the network level.
One of the simplest behavior-based measures appeared in 2003 in the form of a technique that was given the name greylisting. The technique builds on a slightly pedantic and quite creative interpretation of already established standards. SMTP (Simple Mail Transfer Protocol), the protocol used to transfer email on the Internet, lets a server that has presumed temporary problems receiving email report its condition by answering with a specific error code when other machines try to deliver. If the sending machine is correctly configured, it will wait a certain time before making a new delivery attempt, which will in all likelihood succeed shortly afterwards. It is worth noting that this part of the standard was designed primarily to make the email service as reliable as possible, and in most cases all of this happens without the person who wrote and sent the message noticing anything. The messages arrive, and everybody is happy.
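The deferral the technique leans on can be illustrated with a schematic SMTP exchange. The hostnames and addresses below are made up, and the exact wording and choice of temporary error code (450 versus 451) varies between implementations:

```
220 mx.example.com ESMTP
HELO sender.example.net
250 mx.example.com Hello sender.example.net
MAIL FROM:<user@example.net>
250 Ok
RCPT TO:<recipient@example.com>
451 Temporary failure, please try again later
```

A correctly configured sender queues the message and retries after a while; a typical spam engine simply moves on and never comes back.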
Grey and black lists, little white lies
Greylisting means that the server reports a temporary error in response to all delivery attempts from machines it has not been in contact with before. Experience showed that the assumptions made before the experiment were essentially correct: The vast majority of machines that send legitimate email are set up to check return codes and act on them, while almost all spam senders simply pump out as many messages as possible without checking return codes at all. The result is that somewhere between eighty and ninety-odd percent of the spam volume is stopped at the first delivery attempt (before any content filtering), while legitimate mail gets through, in some cases with a slight delay on first contact.
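On OpenBSD systems of that era, putting unknown hosts in front of the greylister is done with a redirection in pf.conf. A minimal sketch, assuming spamd's default listening port of 8025 and an $ext_if macro for the external interface; see the spamd(8) manual page for the authoritative setup:

```
table <spamd-white> persist
rdr pass on $ext_if inet proto tcp from !<spamd-white> \
    to any port smtp -> 127.0.0.1 port 8025
```

Hosts that retry correctly end up in the <spamd-white> table and are from then on passed straight through to the real mail server.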
Another behavior-based technique, which was actually in use before greylisting became widespread, is to use blacklists - lists of machines that have been classified as spam senders - and reject mail from machines on the list. Some groups eventually started experimenting with so-called tarpits, where the technique is to delay traffic from blacklisted machines by making one's own end of the communication run extremely slowly. A frequently cited example is the spamd program, which the free operating system OpenBSD introduced in May 2003. At that point the program's main task was to answer email traffic from blacklisted machines at a rate of one character per second, without giving blacklisted senders any real chance of delivering their messages.
The combination of blacklists and greylisting has proved to work extremely well. Even so, practitioners and developers keep trying to come up with still more effective techniques. Once again, the next logical step came as a result of observed behavior. We saw earlier that spammers do not bother to check whether each individual message actually arrives.
Setting out snares and bait
Early in 2005, this led to the formulation of a theory that turned out to hold: If we create one or more addresses in our own domain that we know will never have any reason to receive legitimate mail, we can know with almost one hundred percent certainty that mail attempted delivered to those addresses is spam. These addresses are spam traps. Machines that try to deliver spam to these addresses can be put in a local blacklist, which we then stall at one character per second. Machines stay in the blacklist for 24 hours unless there is reason to do otherwise.
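In the OpenBSD implementation, trap addresses and the resulting blacklist entries are managed with the spamdb(8) tool. A sketch, with an invented trap address:

```
# register a spam trap address; delivery attempts to it will
# blacklist the sending host
spamdb -T -a "never-a-real-user@bsdly.net"

# later: inspect the database; trapped hosts show up as TRAPPED entries
spamdb | grep TRAPPED
```

Trapped entries expire on their own after the configured period, which is how the 24-hour lifetime mentioned above is enforced.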
The new technique was named greytrapping and was introduced as part of an improved spamd in OpenBSD 3.8 in 2005. Bob Beck, a central participant in spamd's development, announced early in 2006 that he was running greytrapping on centrally placed machines at the University of Alberta, and made the resulting blacklists, updated hourly, available for download so that others could use them in their own setups. This is obviously useful: machines that send to addresses that were never deliverable almost certainly have no legitimate email to deliver, and we do the community a service by preventing the delivery and making them waste as much time as possible.
It may be worth mentioning that for as long as this author has followed it, the University of Alberta blacklist has contained at least twenty-odd thousand IP addresses, and in busy periods it has approached two hundred thousand machines.
You can contribute too
Still, you do not need to be a centrally placed software developer to make a useful contribution. The same tools Beck uses to generate his blacklist are available to everyone as part of OpenBSD, and they are not as hard to use as you might fear.
Here at bsdly.net and related domains, we observed a marked increase during the summer of 2007 in bounce messages about undeliverable email sent to addresses that have never existed in any of our domains. Clearly, someone - one or more groups - was generating or inventing sender addresses in order to avoid having the reactions to their spam land in their own mailboxes. This in turn led to an experiment we still have running: We collect the invalid addresses in our own domains that turn up in our logs, pick out the utterly improbable ones, add them to our local spam trap list, and publish the list on a dedicated page on our web server[2].
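The harvesting step can be sketched as a small shell pipeline. The log format and file names below are invented for illustration (real rejection messages vary by mail server), so treat this as a sketch of the idea rather than the script actually in use:

```shell
#!/bin/sh
# Sample mail server log standing in for the real thing; the format
# here is made up for the sketch.
cat > maillog.sample <<'EOF'
Aug 10 02:12:01 skapet smtpd: reject: RCPT <qwkzrtplf@bsdly.net>: user unknown
Aug 10 02:12:44 skapet smtpd: reject: RCPT <peter@bsdly.net>: mailbox full
Aug 10 02:13:09 skapet smtpd: reject: RCPT <asasaskosmicki@bsdly.net>: user unknown
EOF

# Keep only rejections for nonexistent users, extract the bare
# address, and deduplicate; the result is a list of trap candidates.
grep 'user unknown' maillog.sample |
    sed -n 's/.*RCPT <\([^>]*\)>.*/\1/p' |
    sort -u > trap-candidates.txt

cat trap-candidates.txt
```

A human pass over the candidate list to weed out plausible typos of real addresses is still worthwhile before feeding it to the trap list.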
Experience shows that it takes very little time before the addresses we put on that page start appearing as recipient addresses. Put simply, we have managed to feed spammers data that makes it easier for us to stop them, and in many cases we also get the spam senders to spend considerable amounts of time communicating with our machines without achieving anything at all. The number of spam trap addresses in our list is now around fifteen thousand, and we have occasionally observed groups of machines spending a few weeks working their way through the entire list, averaging a little under seven minutes per failed delivery attempt.
As a byproduct of this active spammer trapping, we started exporting our own list of machines caught via the trap addresses during the last 24 hours and making it available for download. The existence of this list is only announced on the page with the spam trap addresses and in a few blog posts, but we can see that it is fetched regularly, probably automatically, by others who use it in their systems.
So far we have established that it is possible to build a system where the probability of spam getting through to users is very small, while it is almost completely unlikely that legitimate mail will be noticeably delayed. With that, we have the equivalent of good fences around our own property, but the spammers are still out there, and they remain a potentially serious problem for those without adequate protection.
Collecting evidence, or at least creating clarity
Ideally, we would have liked the police and prosecuting authorities to take the spam problem seriously, not just because the spam that does get through is annoying to look at, but because almost all spam is sent through equipment that the spammers use without the owners' consent. Put simply, we want resources committed that are proportional to the criminal activity spam represents. We would be happy to help, but at first glance it might seem we would have trouble producing usable evidence, since we never receive the messages the spammers try to deliver. On the other hand, we have at any given time a list of machines that have tried to deliver spam, identified with close to one hundred percent certainty thanks to the spam trap addresses. In addition, our systems routinely produce logs of all activity, at whatever level of detail we choose. It is therefore possible to search the logs for the IP addresses that have tried to deliver spam to us over the last 24 hours and get an overview of what those machines have been up to.
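The search itself can be as simple as a loop over the trapped addresses, grepping the spamd log for each one. A self-contained sketch with inlined sample data (the file names are invented; in the real setup the traplist and log come off the gateway):

```shell
#!/bin/sh
# Sample inputs standing in for the real traplist and spamd log.
printf '190.20.132.16\n' > trapped-ips.txt
cat > spamd.log <<'EOF'
Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
EOF

# For each trapped IP, pull its full history. Matching the literal
# string " IP:" keeps e.g. 10.0.0.1 from matching 10.0.0.16.
while read -r ip; do
    printf 'Host %s:\n' "$ip"
    grep -F " $ip:" spamd.log
    echo
done < trapped-ips.txt > report.txt

cat report.txt
```

The output format matches the per-host excerpts shown in this article.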
The result of a typical search of this kind looks like this:
Aug 10 02:34:29 skapet spamd[13548]: 190.20.132.16: connected (4/3)
Aug 10 02:34:41 skapet spamd[13548]: (GREY) 190.20.132.16: <kristie@iland.net> -> <asasaskosmicki@bsdly.net>
Aug 10 02:34:41 skapet spamd[13548]: 190.20.132.16: disconnected after 12 seconds.
Aug 10 03:41:42 skapet spamd[13548]: 190.20.132.16: connected (14/13), lists: spamd-greytrap
Aug 10 03:42:23 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap
Aug 10 06:30:35 skapet spamd[13548]: 190.20.132.16: connected (23/22), lists: spamd-greytrap becks
Aug 10 06:31:16 skapet spamd[13548]: 190.20.132.16: disconnected after 41 seconds. lists: spamd-greytrap becks
The first line shows that 190.20.132.16 contacted our system at 02:34:29 in the morning of August 10th, as the fourth active connection, three of them blacklisted. A few seconds later it becomes clear that this is an attempt to deliver a message to the address asasaskosmicki@bsdly.net, which is one of our spam trap addresses, probably harvested from logs and made up somewhere else in the world. After 12 seconds, this machine disconnects. The attempt to deliver to a spam trap puts the machine in our local blacklist, spamd-greytrap, as is clearly visible when the machine tries again a little more than an hour later. On that attempt it is stalled for 41 seconds. The third attempt in our log material happens just after 06:30, and the appearance of the list name becks shows that the machine has in the meantime tried to deliver to one of Bob Beck's spam trap addresses and is now on that blacklist as well.

Unfortunately, such logs are unlikely to be sufficient evidence in criminal cases, but for those who want to make sure the machines they administer are used for spam sending as little as possible, or for those with an interest in spammer behavior, this is useful data.
“Name And Shame”, or maybe just good neighborliness?
After some discussions with colleagues, I decided early in August 2008 to generate daily overviews of the activities of machines that have entered our local blacklist at bsdly.net, and to publish them. When a machine appears in a blacklist as a bare IP address (for example 24.165.4.190), with no other material about the entry, the listing is little more than a claim that those responsible may well choose not to believe. Our hope is that if whoever is responsible for the network where 24.165.4.190 belongs sees a sequence like this,
Host 24.165.4.190:
Aug 10 02:57:40 skapet spamd[13548]: 24.165.4.190: connected (9/8)
Aug 10 02:57:54 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:57:55 skapet spamd[13548]: (GREY) 24.165.4.190: <hand@itnmiami.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:57:56 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:16 skapet spamd[13548]: 24.165.4.190: connected (8/6)
Aug 10 02:58:30 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 02:58:31 skapet spamd[13548]: (GREY) 24.165.4.190: <brunson@jebconet.com> -> <kimberliereffett@ehtrib.org>
Aug 10 02:58:32 skapet spamd[13548]: 24.165.4.190: disconnected after 16 seconds.
Aug 10 02:58:39 skapet spamd[13548]: 24.165.4.190: connected (7/6), lists: spamd-greytrap
Aug 10 03:02:24 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberlee.ledet@ehtrib.org>
Aug 10 03:03:17 skapet spamd[13548]: (BLACK) 24.165.4.190: <aarnq@abtinc.com> -> <kimberliereffett@ehtrib.org>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: From: "Preston Amos" <aarnq@abtinc.com>
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: To: kimberlee.ledet@ehtrib.org
Aug 10 03:05:01 skapet spamd[13548]: 24.165.4.190: Subject: Wonderful enhancing effect on your manhood.
Aug 10 03:06:04 skapet spamd[13548]: 24.165.4.190: disconnected after 445 seconds. lists: spamd-greytrap
then that is sufficient grounds for doing something about it. The material is available via The Name And Shame Robot page at http://www.bsdly.net/~peter/nameandshame.html. The most recently generated log overview is available via references on that page; earlier editions are archived, but will be made available on well-founded request.
The Name and Shame Robot is new enough that we cannot yet say much about the effect of the publication. We are allowed to hope that others will do something similar based on their own local log data, or perhaps even synchronize their data with ours. Please get in touch if you are interested in this work.
Regardless of everything else, we hope the data will prove useful, both by pointing out room for improvement to the networks that appear regularly in the overviews, and as material for studies that could give us even better spam fighting.
Notes
[1] A collection of such bounce messages from earlier this year can be viewed at http://www.bsdly.net/~peter/joejob-archive.2008-07-28.txt
[2] http://www.bsdly.net/~peter/traplist.shtml; references on that page lead on to my blog, which I use for public notes, and other relevant material.
A shortened version of this article was printed in Computerworld Norway on August 22, 2008.
Wednesday, August 27, 2008
Logfiles in the buff
Search engine optimization, deflowered.
Logs are important. Depending on the specific kind of log, the data may shape lives and generate fortunes (how many times were those ads displayed, what was your clickthrough rate), reveal suspicious behavior and trigger actions (such as shutting the door on that bruteforcer), or give sysadmins such as yours truly a general idea of what works and what doesn't, or anything in between.
If you're a sysadmin, log data or a log data derivative such as a monitoring tool's graphical status display is more likely than not an important factor in determining how you spend your day.
Then of course, most of the material for these columns comes from log files too. Depending on the specific log file, I tend to either just peek at the data my monitoring scripts offer me or do some manual grepping for any patterns that interest me.
One such pattern matches the filename for my resume. I put that online for job hunting purposes, and now that I'm basically a gun for hire, it's slightly interesting to see any activity involving that file.
So at semi-random intervals, I check the Apache log for references to my resume. Today, the grepery turned up this nugget:
92.48.107.33 - - [27/Aug/2008:04:41:12 +0200] "GET /%7Epeter/PNMH-cv.html HTTP/1.0" 200 12318 "http://afmfokuv.fcpages.com/hot-anime-lesbians.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
and I count myself lucky that I had thoroughly swallowed my last mouthful of coffee before reading that.
In the Era of PageRank, the Age of the Search Engine Optimization Consultant, the Season of Clickthrough Rage, I suppose we should not be entirely surprised to see such things. Just what the two documents have in common, perhaps other than targeting a very specific market, is left as an exercise for the terminally curious. I would advise some caution in your choice of browser and operating system if your research takes you to the referring URL. One of the lessons of the day is that it doesn't always take a spamd log to crack you up.
PF tutorial in London, November 26
In other news, the UKUUG are hosting a full day PF tutorial featuring yours truly in London on November 26th, 2008. See the UKUUG web site for details. OpenCON is the following weekend in Venice, and I hope to make it there too.
The Name and Shame Robot
Last week the Norwegian edition of Computerworld published an article about the Name and Shame Robot, unfortunately in the paper edition only (yes, I've got an English article in process too). The article did spur some nameandshame.html traffic from unexpected places, but no offers of cooperation or spamd synchronization so far. In the meantime, I'm running into odd cron behavior differences when trying to run the generator script I wrote on my OpenBSD machines on a few FreeBSD hosts. More than likely there is a lesson to be learned there too.
Saturday, August 9, 2008
Is one of your machines secretly a spambot?
Sometimes we just need the facts on the table, automated.
In my previous blog post, I wondered aloud about publishing data about the machines that verifiably tried to spam us. The response was less than overwhelming, but with the script running once per day anyway, I now publish the results via the Name And Shame Robot page.
The announcement below is very close to the text there, so by way of explanation, here is a gift to all my fellow spamd junkies out there:
We started actively greytrapping and publishing our list of greytrap addresses (almost exclusively addresses generated or made up elsewhere and harvested from our logs) during July 2007. The list of greytrap addresses is published on the Traplist page along with some commentary. You can find related comments in this blog post and its followups.
One byproduct of the greytrapping is a list of IP addresses that have tried to deliver mail to one or more of our greytrap addresses during the last 24 hours. The reasoning is that none of these addresses are valid, so any attempt at delivering to them is more likely than not spam. You can download that list here as a raw list of IP addresses, or as a DNS zone file intended for use as a DNS blacklist here.
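One way to put the raw IP list to use on a PF-based gateway is to load it into a table and block or tarpit SMTP traffic from its members. A pf.conf sketch; the file path is invented, so adjust it to wherever your download job stores the list:

```
table <bsdly-trapped> persist file "/var/db/bsdly-trapped.txt"
block in log quick on $ext_if proto tcp from <bsdly-trapped> to any port smtp
```

After each fresh download, the table can be refreshed in place with `pfctl -t bsdly-trapped -T replace -f /var/db/bsdly-trapped.txt`.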
In early August 2008, I wrote a small script that copies (rsyncs, actually) the current list of trapped IP addresses as well as the spamd log off the firewall and, for each IP address, collects all log entries from the spamd log. The resulting file is rsynced back to the webserver, and you can view the latest version here.
The material here is useful mainly to the system administrators responsible for the machines that appear in it, or people who are interested in studying spammer or spambot behavior. Times are given according to the Europe/Oslo time zone (CET or CEST according to season), and if a date appears several times for an IP address entry, the reason is simply that the log data spans several years. The default syslog settings do not record the year.
In the data you will find several kinds of entries; most of them are pretty obvious and straightforward, others less so. The likely FAQ is, "what are the entries with no log data?". The answer is that the spamd here synchronizes with spamd instances at other sites. The entries without log data entered our traplist through a sync operation, but the host never attempted direct contact here.
The other likely question is, "what is that becks list?". It's what the rest of the world refers to as uatraps. I copied the data for that list into my config from Bob Beck's message on OpenBSD-misc and didn't notice that the list had an official name until much later.
Please note that this is not an up-to-the-minute list. Depending on the number of hosts currently in the list of trapped addresses, the script's run time can be anything up to several hours. For that reason, the script starts at the time stated at the beginning of the report file and runs until it finishes generating. The last thing the script does is rsync the report file to the webserver. For the time being, I archive older versions offline.
This is now a totally hands-off, automated operation. The report is currently generated on a Pentium IV-class computer with few and only occasional other duties. If you have any comments or concerns, the address in the next sentence is the one I use for day-to-day email. If you find this data useful, donations of faster hardware or money (PayPal to peter@bsdly.net, or contact me for bank information) are of course welcome.
Thursday, August 7, 2008
Now that we have their addresses, do we name and shame?
The legal owners of botnet-controlled spam senders are quite likely unaware what their machines are doing. Do they deserve to be outed, named and shamed?
Earlier this week a friendly Australian who I think had been reading my blog sent me a few questions about spam, spammers and what to do with them. Would it, for example, be useful to forward the IP addresses in the local traplist to law enforcement? After all, I publish a dump of the IP addresses in my local spamd-greytrap once per hour, and apparently at least some people fetch and use that as a valid blacklist on a regular basis.
(On a side note: if you do fetch that list regularly, keep in mind that the data is dumped at ten past every hour; that's when the data is fresh. If you fetch on every full hour, the data is already fifty minutes old.)
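In crontab terms, the polite fetch schedule looks something like this. The URL is a placeholder (the real list address is linked from the traplist page), and the output path is an assumption:

```
# fetch the trapped-IP dump ten minutes past each hour, when it is fresh
10 * * * * ftp -o /var/db/bsdly-trapped.txt http://www.bsdly.net/~peter/your-list-url-here
```

On OpenBSD, ftp(1) handles HTTP fetches; substitute curl or wget on other systems.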
Anyway, my initial reaction to the question about forwarding the list of IP addresses to law enforcement was along the lines of "Well, a raw list of IP addresses doesn't really add up to a lot of evidence, but if you can extract the log entries for each one, you may have something". My actual answer was phrased a little differently, but even while I was writing my reply I started fiddling with a script to read my list of trapped IP addresses and grep the spamd log for all entries for each IP address.
My complete collection of spamd logs goes back a few years, so searching for a complete history does take a while. (For techies: a grep of the entire log takes at least a few seconds (s) per address, so the total time is s times the number of trapped addresses (N), typically a few thousand; and grepping in parallel is difficult, because you want the output grouped per IP address, not interlaced as in the raw log data.)
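The approach can be sketched roughly as follows. This is a minimal illustration, not the actual script; the file names and the "spamd[pid]: <ip>:" log format are assumptions based on the log excerpts quoted below:

```python
import re

def extract_history(traplist_file, log_file):
    """Map each trapped IP address to the list of log lines mentioning it."""
    with open(traplist_file) as f:
        trapped = [line.strip() for line in f if line.strip()]
    history = {}
    for ip in trapped:
        # One full pass over the log per address -- this is the slow,
        # grep-per-address approach described in the text. The lookarounds
        # keep 1.2.3.4 from matching inside 11.2.3.45.
        pattern = re.compile(r'(?<![\d.])' + re.escape(ip) + r'(?![\d.])')
        with open(log_file) as log:
            history[ip] = [ln.rstrip() for ln in log if pattern.search(ln)]
    return history
```

This is exactly N passes over the log for N addresses, which is where the hours of run time come from.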
After a while, you can see output roughly like this:
Host 81.183.80.187:
Aug 7 07:24:16 skapet spamd[13548]: 81.183.80.187: connected (12/9)
Aug 7 07:24:30 skapet spamd[13548]: (GREY) 81.183.80.187:
<akstcabrushwithafricamnsdgs@abrushwithafrica.com> -> <bennett-gauvin@ehtrib.org>
Aug 7 07:24:31 skapet spamd[13548]: 81.183.80.187: disconnected after 15 seconds.
Aug 7 07:24:44 skapet spamd[13548]: 81.183.80.187: connected (9/7)
Aug 7 07:25:06 skapet spamd[13548]: (GREY) 81.183.80.187:
<akstcamplepleasuremnsdgs@amplepleasure.net> -> <bennett-gauvin@ehtrib.org>
Aug 7 07:25:07 skapet spamd[13548]: 81.183.80.187: disconnected after 23 seconds.
Aug 7 07:25:08 skapet spamd[13548]: 81.183.80.187: connected (11/9)
Aug 7 07:25:23 skapet spamd[13548]: (GREY) 81.183.80.187:
<akstcaesamnsdgs@aesa.ch> -> <bennett-gauvin@ehtrib.org>
Aug 7 07:25:24 skapet spamd[13548]: 81.183.80.187: disconnected after 16 seconds.
Aug 7 07:26:16 skapet spamd[13548]: 81.183.80.187: connected (11/9), lists: spamd-greytrap
Aug 7 07:30:00 skapet spamd[13548]: (BLACK) 81.183.80.187:
-> <bennett-gauvin@ehtrib.org> -> <bennett-gauvin@ehtrib.org>
Aug 7 07:31:43 skapet spamd[13548]: 81.183.80.187: From: "Frances Ballard"
-> <bennett-gauvin@ehtrib.org>
Aug 7 07:31:43 skapet spamd[13548]: 81.183.80.187: To: <bennett-gauvin@ehtrib.org>
Aug 7 07:31:43 skapet spamd[13548]: 81.183.80.187: Subject: Extraordinary Narcotic Deals
Aug 7 07:32:47 skapet spamd[13548]: 81.183.80.187: disconnected after 391 seconds.
lists: spamd-greytrap
That's roughly what I would have expected to see: A host tries to send obvious spam to one of the trap addresses (one I harvested from incoming noise earlier), is added to spamd-greytrap and on the next attempts gets stuck for a few minutes. (Notice that this spammer has another variation on grepable From: addresses - prepend akstc and append mnsdgs to the base name, so abrushwithafrica.com becomes the junk address akstcabrushwithafricamnsdgs@abrushwithafrica.com. Content and header filterers, please take note.) I thought that this would be the typical behavior, but browsing the output from my script, entries like the following seem to be more of the norm:
Host 81.192.185.9:
Aug 6 12:47:15 skapet spamd[13548]: 81.192.185.9: connected (12/8)
Aug 6 12:47:27 skapet spamd[13548]: (GREY) 81.192.185.9:
<jacowen@teaneckschools.org> -> <hevadcouture@bsdly.net>
Aug 6 12:47:27 skapet spamd[13548]: 81.192.185.9: disconnected after 12 seconds.
Here, the spambot tries exactly once, never to return. It's possible they detect the stuttering (our side answers one byte per second for the first ten seconds) and give up for that reason, but it could equally well be that it's classic fire-and-forget, the reason why greylisting still works. Or both, for that matter.
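As an aside, the tagged sender addresses noted a couple of paragraphs up (prepend akstc, append mnsdgs to the domain's base name) are easy to match mechanically. A minimal sketch; the fixed tags are taken only from this batch of messages, and spammers will of course rotate them:

```python
import re

# Match sender addresses of the form akstc<base>mnsdgs@<base>.<tld>,
# as seen in the (GREY) log entries above. The backreference \1
# requires the local part to embed the same base name as the domain.
TAGGED_SENDER = re.compile(r'^akstc(.+)mnsdgs@\1\.[a-z]+$')

def is_tagged_sender(address):
    return TAGGED_SENDER.match(address) is not None
```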
But back to the real question: Now that we have the data, what do we do with it?
With the script I have now, extracting the history for each of several thousand IP addresses takes some hours. The output is enlightening, but by the time the run is complete, it could be significantly more than twenty-four hours since the machines listed tried to send spam.
Should we name and shame anyway? If we forward the data to law enforcement, would they care?
For the time being, I'll try to think of a quicker way to extract the data. Any input on how to make the process more efficient is welcome, as is considered (learned or otherwise) opinion on the ethical up- or downside of publishing spamd log data.
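One obvious speedup would be to read the log exactly once and bucket lines by IP address on the way through, instead of grepping the whole log once per trapped address. A sketch, again with file names and log format assumed from the excerpts above:

```python
import re
from collections import defaultdict

# The IP address follows "spamd[pid]: ", optionally after a "(GREY)"
# or "(BLACK)" tag -- an assumption based on the excerpts in this post.
IP_FIELD = re.compile(r'spamd\[\d+\]: (?:\([A-Z]+\) )?(\d+\.\d+\.\d+\.\d+)')

def bucket_by_ip(traplist_file, log_file):
    """Single pass over the log: roughly proportional to log size,
    not to (number of addresses x log size)."""
    with open(traplist_file) as f:
        trapped = {line.strip() for line in f if line.strip()}
    history = defaultdict(list)
    with open(log_file) as log:
        for line in log:
            m = IP_FIELD.search(line)
            if m and m.group(1) in trapped:
                history[m.group(1)].append(line.rstrip())
    return history
```

The output is still grouped per address, so the per-host report format stays the same.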
Wednesday, July 2, 2008
Is there really a market for an open source router?
Open source goodness. Coming soon to a router near you (if it isn't there already).
I have a confession to make. Today's headline isn't mine. I snatched it from Dana Blankenhorn's June 30th piece over at ZDNet. It almost made me utter a Simpsonian grunt and start ranting about my more than 40,000 visitors again. Maybe my readers don't constitute a market, and in a consumerland context, a mere forty thousand (they've been coming in at a rate of about five hundred new uniques a week for a little while now) is possibly small potatoes indeed.
On the other hand, there are good indications that significant parts of the Internet actually run on open source in some form, regardless of sundry punditry or, for that matter, how many people have found my online or printed work. And then again, if even a small subset of those who have downloaded my work actually do some of the things I write about, there is reason to believe that they have achieved a degree of insulation between any local stupidity and the Internet at large.
But back to the ZDNet piece. The interesting news there is that Netgear are apparently coming around to support open source via the MyOpenRouter web site and at least one wireless router appliance with firmware source code available. It will be kind of interesting to see if they've actually made their code and specifications open enough that we have a reasonable chance of seeing non-Linux open source systems such as OpenBSD run on the platform.
If you read the OpenBSD source-changes list you probably know this already, but even if you don't, OpenBSD just turned -current into 4.4-beta (see Theo's commit message). I take that to mean that the various significant changes such as the overhaul of the PF code will be thoroughly tested in time for the 4.4 release. That change hasn't made it into snapshots just yet (but likely will within the next few hours), but you can take a peek at the OpenBSD change log for a preview of the goodies that will be officially released on November 1st. I for one am looking forward to that date.
Wednesday, June 25, 2008
Yes, we can! Make a difference, that is
Good netizenship sometimes comes with a green tinge.
Taking in my daily Linuxtoday dose this morning, there was one item that grabbed my attention, with the headline "Botnets and You: Save the World--Install Linux", and the Linuxtoday entry in turn points to Ross Brunson's blog post with the same title. Do click the link to Ross' blog, it's well worth reading.
What I particularly like about the piece is that he makes the point that you can actually make a difference. More specifically, if you run Linux (being a Novellian, he naturally recommends SLES or SLED) and eliminate Microsoft from your system, you are not only gaining for yourself a safer and more reliable platform, you are also helping everybody else by making the probability of your machine ever joining a botnet a lot smaller.
As regular readers here will recognize, I rate being a good netizen (aka net citizen) as extremely important. Let others get on with their business while we tend to our own tasks, not interfering unless we really have to. If you opt to run your day to day business on the same software your machine most likely came with, the likelihood that somebody else will be taking control of your machine and using it for less than desirable purposes is in fact anything but negligible. I could have used stronger words ("reckless endangerment" comes to mind), but then Redmondians would have just shut off all remnants of rationality. I have argued earlier (article in Norwegian only, sorry) that a computer owner's responsibility should be roughly on par with a dog owner's, but it's possible I should return to that in a future column. And besides, any Linux I've touched for the last ten years is easier to install and operate than the Microsoft offering.
If you followed the Linuxtoday link earlier, you know that I could not resist making the suggestion there that it is in fact possible to be an even better netizen. As outlined in an earlier column (and its followups), if you do your greytrapping properly, you can keep the bad guys occupied and have fun at the same time, consuming next to no resources locally. How's that for green computing.
For example, the crew who started sending messages with headers like
From: "Mrs Maria Jose" <cordinator@euros-ukolympics.com>
Subject: Immediate Response Required.(Euro Award)
to various spamtrap addresses on May 15th are still patiently trying to deliver. A very superficial log analysis shows that there were originally four hosts sending those messages from the 217.70.178.0/24 network. There appears to be only one left now, but collectively these machines have so far made 476,787 attempts at delivery to my data collection points. Judging from a sample of some 21,000 connections from one of the hosts, the average connection time was 389.68 seconds, which in turn means that we've had those spam senders waste approximately 185,792,441 seconds, or time equal to 5.89 years.
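The arithmetic behind that estimate is simple enough to check, assuming a year of 365.25 days:

```python
# Back-of-the-envelope check of the figures quoted above.
total_seconds = 185_792_441              # total connection time logged
seconds_per_year = 365.25 * 24 * 3600    # 31,557,600 seconds

print(round(total_seconds / seconds_per_year, 2))  # prints 5.89
```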
Not bad in a little more than a month. On the downside, the predictions that spambots would sooner or later learn to do things in parallel have been proved true. My logs indicate that the current crop is able to handle at least sixty simultaneous delivery attempts. Even bogged down by a suboptimal operating system at the sending end, modern desktop computers are in fact powerful beasts. In my book it's just good netizenry to set up a machine to keep the garbage they send off your own network, and by extension off others', since the senders don't get around to trying to deliver elsewhere. By the way, that list is now almost 15,000 addresses long, all non-deliverable garbage. You could be excused for thinking it a twisted art project.