Sunday, September 25, 2022

A Few of My Favorite Things About The OpenBSD Packet Filter Tools

The OpenBSD packet filter PF was introduced a little more than 20 years ago as part of OpenBSD 3.0. We'll take a short tour of PF features and tools that I have enjoyed using.



NOTE: If you are more of a slides person, the condensate for a SEMIBUG user group meeting is available here. A version without trackers but “classical” formatting is available here.

At the time the OpenBSD project introduced its new packet filter subsystem in 2001, I was nowhere near the essentially full time OpenBSD user I would soon become. I did however quickly recognize that even what was later dubbed “the working prototype” was reported to perform better in most contexts than the code it replaced.

The reason PF's predecessor needed to be replaced has been covered extensively by me and others elsewhere, so I'll limit myself to noting that several somebodies finally read and understood the code's license and decided that it was not in fact open source in any acceptable meaning of the term.

Anyway, the initial PF release was very close in features and syntax to the code it replaced. And even at that time, the config syntax was a lot more human readable than the alternative I had been handling up to then, which was Linux' IPtables. The less said about IPtables, the better.

But soon visible improvements in user friendliness, or at least admin friendliness, started turning up. With OpenBSD 3.2, the separate /etc/nat.conf network address translation configuration file moved to the attic and the NAT and redirection options moved into the main PF config file /etc/pf.conf.

The next version, OpenBSD 3.3, saw the ALTQ queueing configuration move into pf.conf as well, and the previously separate altq.conf file became obsolete. What did not change, however, was the syntax, which was to remain just bothersome enough that many of us put off playing with traffic shaping until some years later. Other PF news in that release included anchors, or named sub-rulesets, as well as tables, described as "a very efficient way for large address lists in rules" and the initial release of spamd(8), the spam deferral daemon.

More on all of these things later; I will not bore you with a detailed history of PF features introduced or changed in OpenBSD over the last twenty-some years.

PF Rulesets: The Basics

So how do we go about writing that perfect firewall config?

I could go on about that at length, and I have been known to on occasion, but let us start with the simplest possible, yet absolutely secure PF ruleset:

block

With that in place, you are totally secure. No traffic will pass.

Or as they say in the trade, you have virtually unplugged yourself from the rest of the world.

By way of getting ahead of ourselves, that particular ruleset will expand to the following:

block drop all

But we are getting ahead of ourselves.

To provide you with a few tools and some context, these are the basic building blocks of a PF rule:

verb criteria action ... options

Here are a few sample rules to put it into context, all lifted from configurations I have put into production:

pass in on egress proto tcp to egress port ssh

This first sample says that if a packet arrives on the egress — an interface belonging to the group of interfaces that has a default route — and that packet is a TCP packet with a destination service ssh, let the packet pass to the interfaces belonging to the egress interface group.

Yes, when you write PF rulesets, you do not necessarily need to write port numbers for services and memorize what services hide behind port 80, 53 or 443. The common or standard services are known to the rules parsing part of pfctl(8), generally with the service names you can look up in the /etc/services file.

The interface groups concept is as far as I know an OpenBSD innovation. You can put interfaces into logical groups and reference the group name in PF configurations. A few default interface groups exist without you doing anything: egress is one, and another common one is wlan, of which all configured WiFi interfaces are members by default. Keep in mind that you can create your own interface groups (set them up using ifconfig(8)) and refer to them in your rules.

match out on egress nat-to egress

This one matches outbound traffic, again on egress (which in the simpler cases consists of one interface) and applies the nat-to action on the packets, transforming them so that the next hops all the way to the destination will see packets where the source address is equal to the egress interface's address. If your network runs IPv4 and you have only one routable address assigned, you will more than likely have something like this configured on your Internet-facing gateway.

It is worth noting that early PF versions did not have the match verb. After a few years of PF practice, developers and practitioners alike saw the need for a way to apply actions such as nat-to or other transformations without making a decision on whether to pass or block the traffic. The match keyword arrived in OpenBSD 4.6 and in retrospect seems like a prelude to more extensive changes that followed over the next few releases.

Next up is a variation on the initial absolutely secure ruleset.

block all

I will tell you now so you will not be surprised later: If you had made a configuration with those three rules in that order, your configuration would be functionally the same as the one-word ruleset we started with. This is because in PF configurations, the rules are evaluated from top to bottom, and the last matching rule wins.

The only escape from this progression is to insert a quick modifier after the verb, as in

pass quick from (self)

which will stop evaluation when a packet matches the criteria in the quick rule. Please use sparingly if at all.

There is a specific reason why PF behaves like this. The system that PF replaced in OpenBSD had the top to bottom, last match wins logic, and the developers did not want to break existing configurations too badly during the transition away from the old system.
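
The evaluation order described above can be sketched as a small toy model. This is an illustration only and not how PF is implemented: the Rule class, the predicate functions and the packet dictionary are all hypothetical simplifications, and the match verb is left out because it does not decide whether a packet passes.

```python
# Toy model of PF rule evaluation, for illustration only.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    action: str                      # "pass" or "block"
    matches: Callable[[dict], bool]  # the rule's criteria, as a predicate
    quick: bool = False              # stop evaluating if this rule matches


def evaluate(rules: List[Rule], packet: dict) -> str:
    verdict = "pass"  # a packet that matches no rule at all is passed
    for rule in rules:
        if rule.matches(packet):
            verdict = rule.action    # a later match overrides an earlier one
            if rule.quick:
                break                # quick: this match is final
    return verdict


def is_inbound_ssh(packet: dict) -> bool:
    return packet["direction"] == "in" and packet["port"] == 22


def anything(packet: dict) -> bool:
    return True


packet = {"direction": "in", "port": 22}

pass_then_block = evaluate([Rule("pass", is_inbound_ssh), Rule("block", anything)], packet)
block_then_pass = evaluate([Rule("block", anything), Rule("pass", is_inbound_ssh)], packet)
quick_pass_first = evaluate(
    [Rule("pass", is_inbound_ssh, quick=True), Rule("block", anything)], packet
)

print(pass_then_block, block_then_pass, quick_pass_first)  # block pass pass
```

With the block rule last, it overrides the earlier pass. With the block rule first, the later pass wins. With quick on the pass rule, the block rule that follows is never consulted.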

So in practice you would put them in this order for a more functional setup,

  block all
  match out on egress nat-to egress
  pass in on egress proto tcp to egress port ssh
    

but likely supplemented by a few other items.

For those supplementing items, we can take a look at some of the PF features that can help you write readable and maintainable rulesets. And while a readable ruleset is not automatically a more secure one, readability certainly helps spot errors in your logic that could put the systems and users in your care in reach of potential threats.

To help that readability, it is important to be aware of these features:

Options: General configuration options that set the parameters for the ruleset, such as

  set limit states 100000
  set debug debug
  set loginterface dc0
  set timeout tcp.first 120 
  set timeout tcp.established 86400 
  set timeout { adaptive.start 6000, adaptive.end 12000 }
  

If the meaning of some of those does not seem terribly obvious to you at this point, that's fine. They are all extensively documented in the pf.conf man page.

Macros: Content that will expand in place, such as lists of services, interface names or other items you feel useful. Some examples along with rules that use them:

  ext_if = "kue0" 
  all_ifs = "{" $ext_if lo0 "}" 
  pass out on $ext_if from any to any 
  pass in  on $ext_if proto tcp from any to any port 25
  

Keep in mind that if your macros expand to lists of either ports or IP addresses, the macro expansion will create several rules to cover your definitions in the ruleset that is eventually loaded.
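
To get a feel for how quickly lists multiply, here is a rough sketch of the expansion as a cross product. The expand helper and the rule template are hypothetical, and the rule text is simplified; pfctl prints its own normalized form of each rule.

```python
# Sketch: a rule containing lists expands to one rule per combination.
from itertools import product
from typing import Dict, List


def expand(template: str, lists: Dict[str, List[str]]) -> List[str]:
    names = list(lists)
    combos = product(*(lists[name] for name in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


rules = expand(
    "pass in on kue0 proto {proto} to port {port}",
    {"proto": ["tcp", "udp"], "port": ["domain", "ntp"]},
)
for rule in rules:
    print(rule)
```

Two protocols times two ports gives four rules here. A macro with ten ports and a list of twenty addresses in the same rule would give two hundred.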

Tables: Data structures that are specifically designed to store IP addresses and networks. Originally devised to be a more efficient way to store IP addresses than macros that contained IP addresses and expanded to several rules that needed to be evaluated separately. Rules can refer to tables so the rule will match any member of the table.

  table <badhosts> persist counters file "/home/peter/badhosts"
  # ...
  block from <badhosts>
      

Here the table is loaded from a file. You can also initialize a table in pf.conf itself, and you can even manipulate table contents from the command line without reloading the rules:

$ doas pfctl -t badhosts -T add 192.0.2.11 2001:db8::dead:beef:baad:f00d

In addition, several of the daemons in the OpenBSD base system such as spamd, bgpd and dhcpd can be set up to interact with your PF rules.
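
As a mental model, a table behaves like a set of addresses and networks that a single rule consults at match time. The sketch below uses a hypothetical Table class and a plain linear scan, so it shows the behavior only and says nothing about the efficient lookup structure PF actually uses.

```python
# Sketch: one rule, one table, and membership can change without a reload.
import ipaddress


class Table:
    def __init__(self, *entries: str) -> None:
        self.nets = set()
        self.add(*entries)

    def add(self, *entries: str) -> None:
        # A bare address becomes a single-host network (/32 or /128).
        for entry in entries:
            self.nets.add(ipaddress.ip_network(entry, strict=False))

    def __contains__(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip.version == net.version and ip in net for net in self.nets)


badhosts = Table()
before = "192.0.2.11" in badhosts

# The equivalent of adding entries from the command line.
badhosts.add("192.0.2.11", "2001:db8::dead:beef:baad:f00d")
after = "192.0.2.11" in badhosts

print(before, after, "198.51.100.7" in badhosts)  # False True False
```

The rule that refers to the table never changes; only the table contents do.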

Rules: The rules with the verbs, criteria and actions that determine how your system handles network traffic.

A very simple and reasonable baseline is one that blocks all incoming traffic but allows all traffic initiated on the local system:

  block
  pass from (self)
      

The pass rule lets our traffic pass to elsewhere, and since PF is a stateful firewall by default, return traffic for the connections the local system sends out will be allowed back.

You probably noticed the configuration here references something called (self).

The string self is a default macro which expands to all configured local interfaces on the host. Here, self is set inside parentheses () which indicates that one or more of the interfaces in self may have dynamically allocated addresses and that PF will detect any changes in the configured interface IP addresses.

This exact ruleset expanded to this on my laptop in my home network at one point:

 $ doas pfctl -vnf /etc/pf.conf
   block drop all
   pass inet6 from ::1 to any flags S/SA
   pass on lo0 inet6 from fe80::1 to any flags S/SA
   pass on iwm0 inet6 from fe80::a2a8:cdff:fe63:abb9 to any flags S/SA
   pass inet6 from 2001:470:28:658:a2a8:cdff:fe63:abb9 to any flags S/SA
   pass inet6 from 2001:470:28:658:8c43:4c81:e110:9d83 to any flags S/SA
   pass inet from 127.0.0.1 to any flags S/SA
   pass inet from 192.168.103.126 to any flags S/SA

The pfctl command here says to parse the rules in the file /etc/pf.conf verbosely, but not to load them.

This shows what the loaded ruleset will be, after any macro expansions or optimizations.

For that exact reason, it is strongly recommended to review the output of pfctl -vnf on any configuration you write before loading it as your running configuration.

If you look closely at that command output, you will see both the inet and inet6 keywords. These designate IPv4 and IPv6 addresses respectively. PF since the earliest days has supported both, and if you do not specify which address family your rule applies to, it will apply to both.

But this has all been on a boring single host configuration. In my experience, the more interesting settings for PF use are those where the configuration is for a host that handles traffic for other hosts, as a gateway or other intermediate host.

To forward traffic to and from other hosts, you need to enable forwarding. You can do that from the command line:

 # sysctl net.inet.ip.forwarding=1 
 # sysctl net.inet6.ip6.forwarding=1
	

But you will want to make the change permanent by putting the following lines in your /etc/sysctl.conf so the change survives reboots.

  net.inet.ip.forwarding=1 
  net.inet6.ip6.forwarding=1
	

With these settings in place, a configuration (/etc/pf.conf) like this might make sense if your system has two network interfaces that are both of the bge kind:

  ext_if=bge0
  int_if=bge1
  client_out = "{ ftp-data ftp ssh domain pop3 imaps nntp https }"
  udp_services = "{ domain ntp }"
  icmp_types = "{ echoreq unreach }"
  match out on egress inet nat-to ($ext_if)
  block
  pass inet proto icmp all icmp-type $icmp_types keep state
  pass quick proto { tcp, udp } to port $udp_services keep state
  pass proto tcp from $int_if:network to port $client_out
  pass proto tcp to self port ssh
	

Your network likely differs in one or more ways from this example. See the references at the end for a more thorough treatment of all these options.

And once again, please do use the readability features of the PF syntax to keep you sane and safe.

A Configuration That Learns From Network Traffic Seen and Adapts To Conditions

With PF, you can create a network that learns. Fairly early in PF's history it occurred to the developers that the network stack collects and keeps track of information about the traffic it sees, which could then be acted upon if the software became able to actively monitor the data and act on specified changes. So the state tracking options entered the pf.conf repertoire in their initial form with the OpenBSD 3.7 release.

A common use case: when you run an SSH service, or really any kind of listening service with the option to log in, you will see some number of failed authentication attempts that generate noise in the logs. The password guessing, or as some of us say, password groping, can turn out to be pretty annoying even if the miscreants do not actually manage to compromise any of your systems. So to eliminate noise in our logs we turn to the data that is available anyway in the state table, which tracks the state of active connections, and act on limits you define, such as the number of connections from a single host over a set number of seconds.

The action could be to add the source IP that tripped the limit to a table. Additional rules could then subject the members of that table to special treatment. Since that time, my internet-facing rule sets have tended to include variations on

  table <bruteforce> persist
  block quick from <bruteforce>
  pass inet proto tcp from any to $localnet port $tcp_services \
        flags S/SA keep state \
        (max-src-conn 100, max-src-conn-rate 15/5, \
         overload <bruteforce> flush global)
	

which means that any host that tries more than 100 simultaneous connections or more than 15 new connections over 5 seconds is added to the table and blocked, with any existing connections terminated.
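
The rate limit part of that rule can be sketched as a toy model. Everything here is an assumption made for illustration: the OverloadTracker class is hypothetical, it uses an exact sliding window where PF keeps its own approximation, and the max-src-conn limit on simultaneous connections is not modeled at all.

```python
# Toy model of max-src-conn-rate 15/5 with an overload table.
from collections import defaultdict, deque


class OverloadTracker:
    def __init__(self, limit: int = 15, window: float = 5.0) -> None:
        self.limit = limit                  # new connections allowed ...
        self.window = window                # ... per this many seconds
        self.recent = defaultdict(deque)    # source -> recent connection times
        self.bruteforce = set()             # stands in for table <bruteforce>

    def new_connection(self, src: str, now: float) -> str:
        if src in self.bruteforce:
            return "block"                  # block quick from <bruteforce>
        stamps = self.recent[src]
        stamps.append(now)
        while now - stamps[0] >= self.window:
            stamps.popleft()                # forget attempts outside the window
        if len(stamps) > self.limit:
            self.bruteforce.add(src)        # overload <bruteforce>
            del self.recent[src]            # stands in for flush global
            return "block"
        return "pass"


tracker = OverloadTracker()
# One host opens a new connection every tenth of a second.
fast = [tracker.new_connection("203.0.113.5", i * 0.1) for i in range(17)]
# Another host opens one connection per second for half a minute.
slow = [tracker.new_connection("198.51.100.7", float(i)) for i in range(30)]

print(fast.count("pass"), fast.count("block"), set(slow))
```

The fast host gets its first 15 connections through, trips the limit on the 16th and stays blocked from then on. The slow host never comes near the limit.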

It is a good practice to let table entries in such setups expire eventually. How long entries stay is entirely up to you.

At first I set expiry at 24 hours, but with password gropers like those caught by this rule being what they are, I switched a few years ago to four weeks at first, then upped again a few months later to six weeks. Groperbots tend to stay broken for that long. And since they target any service you may be running, state tracking options with overload tables can be useful in a lot of non-SSH contexts as well.

A point that observers often miss is that with this configuration, you have a firewall that learns from the traffic it sees and adapts to network conditions.

It is also worth noting that state tracking actions can be applied to all TCP traffic and that they can be useful for essentially all services.

The buzzwordability potential in the learning configurations is enormous, and I for one fail to see how the big names have failed to copy or imitate this feature and greytrapping which we will look at later, and capitalize on products with those features.

The article Forcing the password gropers through a smaller hole with OpenBSD's PF queues has a few suggestions on how to handle noise sources with various other services. More on queues in a few moments.

The Adaptive Firewall and the Greytrapping Game

At the risk of showing my age, I must admit that I have more or less always run a mail service. Once TCP/IP networking became available in some form for even small businesses and individuals during the early 1990s, once you were connected, it was simply one of those things you would do. Setting up an SMTP service (initially wrestling with sendmail and its legendary sendmail.cf configuration file) with accompanying pop3 and/or imap service was the done thing.

Over time the choice of mail server software changed, we introduced content filtering to beat the rise of the trashy, scanny spam mail and, since the majority of clients ran that operating system, mail-borne malware. But even with state of the art content filtering some unwanted messages would make it into users' inboxes often enough to be annoying.

So when OpenBSD 3.3 shipped with the initial version of spamd it was quite a relief for people of my job category, even if that version would only load lists of known bad senders' IP addresses and stutter at them one byte per second until the other side gave up.

Later versions introduced greylisting — answering SMTP connections from previously unknown senders with a temporary local error code and only accepting delivery if the same host tried again — which reduced the load on the content filtering machines significantly, and the real fun started with the introduction of greytrapping in the version of spamd(8) that shipped with OpenBSD 3.7.

Greytrapping is yet another adaptive or learning feature. The system identifies bad actors by comparing the destination email address in incoming SMTP traffic from unknown or already greylisted hosts with a list of known invalid addresses in the domains the site serves. The spamdb(8) command was extended with options to add addresses to and delete them from the spamtrap list.
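
The decisions involved in greylisting and greytrapping can be sketched as a toy model. This is a simplification under stated assumptions, not spamd's actual logic: the Greytrapper class and its return strings are hypothetical, the 25 minute wait is assumed as the pass time, expiry of entries is left out, and in a real setup a whitelisted host bypasses spamd entirely through the PF rules.

```python
# Toy model of greylisting plus greytrapping decisions.
class Greytrapper:
    def __init__(self, spamtraps, passtime: int = 25 * 60) -> None:
        self.spamtraps = set(spamtraps)  # known invalid addresses
        self.passtime = passtime         # seconds a new sender must wait
        self.first_seen = {}             # (ip, sender, rcpt) -> first attempt
        self.whitelist = set()           # hosts that retried properly
        self.trapped = set()             # hosts that mailed a spamtrap

    def delivery_attempt(self, ip: str, sender: str, rcpt: str, now: int) -> str:
        if ip in self.trapped:
            return "tarpit"
        if ip in self.whitelist:
            return "accept"
        if rcpt in self.spamtraps:
            self.trapped.add(ip)         # greytrapping: host learned as bad
            return "tarpit"
        first = self.first_seen.setdefault((ip, sender, rcpt), now)
        if now - first >= self.passtime:
            self.whitelist.add(ip)       # same host tried again later
            return "accept"
        return "tempfail"                # temporary error, try again later


spamd = Greytrapper({"nobody@example.com"})
honest = [
    spamd.delivery_attempt("192.0.2.1", "a@example.net", "user@example.com", t)
    for t in (0, 60, 1800)
]
spammer = [
    spamd.delivery_attempt("198.51.100.9", "x@example.org", rcpt, t)
    for rcpt, t in (("nobody@example.com", 0), ("user@example.com", 3600))
]
print(honest, spammer)
```

The well-behaved sender is deferred twice and accepted once it retries after the pass time. The host that tried a spamtrap address stays trapped even when it later tries a valid recipient.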

Greytrapping was an extremely welcome new feature, and I adopted it eagerly. Soon after the feature became available, I set up for greytrapping. The initial spamtrap addresses were ones I fished out of my mail server logs (from entries produced by bounce messages that themselves turned out to be undeliverable at our end since the recipient did not exist), and after a few weeks I started publishing both the list of spamtraps and an hourly dump of currently trapped IP addresses.

The setup is amazingly easy. On a typical gateway in front of a mail server you instrument your /etc/pf.conf with a few lines, usually at the top,

  table <spamd-white> persist
  table <nospamd> persist file "/etc/mail/nospamd"
  pass in on egress proto tcp to any port smtp \
        divert-to 127.0.0.1 port spamd
  pass in on egress proto tcp from <nospamd> to any port smtp
  pass in log on egress proto tcp from <spamd-white> to any port smtp
  pass out log on egress proto tcp to any port smtp
    

Here we even suck in a file that contains the IP addresses of hosts that should not be subjected to the spamd treatment.

In addition you will need to set up with the correct options for spamd(8) and spamdlogd(8) in your /etc/rc.conf.local:

  spamd_flags="-v -G 2:8:864 -n "mailwalla 17.25" -c 1200 -C /etc/mail/fullchain.pem -K /etc/mail/privkey.pem -w 1 -y em1 -Y em1 -Y 158.36.191.225"
  spamdlogd_flags="-i em1 -Y 158.36.191.225"
      

The IP address here designates a sync partner; check out the spamd(8) man page for the other options. If you're interested, you can get the gory details of running a setup with several mail exchangers in the In The Name Of Sane Email: Setting Up OpenBSD's spamd(8) With Secondary MXes In Play - A Full Recipe article.

You probably do not need to edit the configuration file /etc/mail/spamd.conf much, but do look up the man page and possibly references to the bsdly.net blocklist. Finally, reload your PF configuration, start the daemons spamd(8) and spamdlogd(8) using rcctl, and set up a crontab(5) line to run spamd-setup(8) at reasonable intervals to fetch updated blocklists.

The number of trapped addresses in the hourly dump has ranged from a few hundred in the earliest days to thousands later on, and at times even hundreds of thousands. For the last couple of years the number has generally been in the mid to low four digits, with each host typically hanging around longer to try delivery to an ever expanding number of invalid addresses in their database.

Just a few weeks ago, the list of “imaginary friends” rolled past 300,000 entries. The article The Things Spammers Believe - A Tale of 300,000 Imaginary Friends tells the story with copious links to earlier articles and other resources, while Maintaining A Publicly Available Blacklist - Mechanisms And Principles details the work involved in maintaining a blocklist that is offered to the public.

It's been good fun, with a liberal helping of bizarre as the number of spamtraps grew, sometimes with truly weird contents.

Traffic Shaping You Can Actually Understand

You've heard it before: Traffic shaping is hard. Hard to do and hard to understand.

Traditionally traffic shaping was available on all BSDs in the form of ALTQ, a codebase that its developers labeled experimental and that contained implementations of several different traffic shaping algorithms. One central problem was that the configuration syntax was inelegant at best, even after the system was merged into the PF configuration.

In OpenBSD, which runs development on a strict six month release cycle, the code that would eventually replace ALTQ was introduced gradually over several releases.

The first feature to be introduced was always-on, settable priorities with the keyword prio.

A random example shows that this configuration prioritises ssh traffic above most others (the default is 3):

pass proto tcp to port ssh set prio 6

While this configuration makes an attempt at speeding up TCP traffic by assigning a higher priority to lowdelay packets, typically ACKs:

  match out on $ext_if proto tcp from $ext_if set prio (3, 7)
  match in  on $ext_if proto tcp to $ext_if set prio (3, 7)
	

Next up, the newqueue code did away with the multiple algorithms approach and settled on the Hierarchical Fair Service Curve (HFSC) as the most flexible option, one that would even make it possible to emulate or imitate the alternative shaping algorithms from the ALTQ experiment.

HFSC queues are defined on an interface with a hierarchy of child queues, where only the “leaf” queues can be assigned traffic. We take a look at a static allocation first:

  queue main on $ext_if bandwidth 20M
    queue defq parent main bandwidth 3600K default
    queue ftp parent main bandwidth 2000K
    queue udp parent main bandwidth 6000K
    queue web parent main bandwidth 4000K
    queue ssh parent main bandwidth 4000K
      queue ssh_interactive parent ssh bandwidth 800K
      queue ssh_bulk parent ssh bandwidth 3200K
    queue icmp parent main bandwidth 400K
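
A quick arithmetic check of the static allocation above, as a sketch. The figures are copied from the queue definitions, and the only assumption is that 20M is read as 20000K, that is, decimal multipliers.

```python
# Sum the child allocations per parent and list the leaf queues.
# Bandwidth in kilobits per second; 20M is assumed to mean 20000K.
queues = {
    "main": (None, 20000),
    "defq": ("main", 3600),
    "ftp": ("main", 2000),
    "udp": ("main", 6000),
    "web": ("main", 4000),
    "ssh": ("main", 4000),
    "ssh_interactive": ("ssh", 800),
    "ssh_bulk": ("ssh", 3200),
    "icmp": ("main", 400),
}

child_totals = {}
for name, (parent, kbit) in queues.items():
    if parent is not None:
        child_totals[parent] = child_totals.get(parent, 0) + kbit

for parent, total in child_totals.items():
    print(f"{parent}: children add up to {total}K of {queues[parent][1]}K")

# Only queues without children can have traffic assigned to them.
leaves = sorted(name for name in queues if name not in child_totals)
print("leaf queues:", ", ".join(leaves))
```

Under that assumption the children of main add up to exactly the 20M of the parent, and the two ssh subqueues add up to the 4000K of the ssh queue, so nothing in this static setup is oversubscribed.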
  

You then tie in the queue assignment, here with match rules

  match log quick on $ext_if proto tcp to port ssh \
        queue (ssh_bulk, ssh_interactive)
  match in quick on $ext_if proto tcp to port ftp queue ftp
  match in quick on $ext_if proto tcp to port www queue web
  match out on $ext_if proto udp queue udp
  match out on $ext_if proto icmp queue icmp
  

which is definitely the way to add queueing to an existing configuration, and in my view also a good practice for configuration structure reasons. But you can also tack on queue this_or_that_queue at the end of pass rules.

There are two often forgotten facts about HFSC traffic shaping I would like to mention:

Traffic shaping is more often than not a matter of prioritizing which traffic you drop packets for, and no shaping at all takes place before the traffic volume approaches one or more of the limits set by the queue definitions.

One of the beautiful things about modern HFSC queueing is that you can build in flexibility, like this:

  queue rootq on $ext_if bandwidth 20M
    queue main parent rootq bandwidth 20479K min 1M max 20479K qlimit 100
    queue qdef parent main bandwidth 9600K min 6000K max 18M default
    queue qweb parent main bandwidth 9600K min 6000K max 18M
    queue qpri parent main bandwidth 700K min 100K max 1200K
    queue qdns parent main bandwidth 200K min 12K burst 600K for 3000ms
    queue spamd parent rootq bandwidth 1K min 0K max 1K qlimit 300
  
The min and max values are core to that flexibility. Subordinate queues can 'borrow' bandwidth up to their own max values within the allocation of the parent queue. The combined max queue bandwidth can exceed the root queue's bandwidth and still be valid. However the allocation will always top out at the allocated or the actual physical limits of the interface the queue is configured on.

For bursty services such as DNS in our example you can allow burst for a specified time where the allocation can exceed the queue's max value, still within the limits set on the parent queue.

Finally, the qlimit sets the size of the queue's holding buffer. A larger buffer may lead to delays since packets may be kept longer in the buffer before being sent on their way out to the world.

And if you noticed the name of that final, tiny queue, you probably have guessed correctly what it was for. The traffic from hosts that were caught in the spamd net was really horrible, as this systat queues display shows:

 1 users Load 2.56 2.27 2.28                                      skapet.bsdly.net 20:55:50
 QUEUE                BW SCH  PRI    PKTS   BYTES   DROP_P   DROP_B QLEN BOR SUS  P/S   B/S
 rootq on bge0       20M                0       0        0        0    0            0     0
  main               20M                0       0        0        0    0            0     0
   qdef               9M          6416363   2338M      136    15371    0          462 30733
   qweb               9M           431590 144565K        0        0    0          0.6   480
   qpri               2M          2854556 181684K        5      390    0           79  5243
   qdns             100K           802874  68379K        0        0    0          0.6    52
  spamd               1K           596022  36021K  1177533 72871514  299            2   136
	    

It was good, clean fun. And that display did give me a feeling of Mission accomplished.

There are several other tools in the PF toolset such as carp(4) based redundancy for highly available service, relayd(8) for load balancing, application delivery and general network trickery, PF logs and the fact that tcpdump(8) is your friend, and several others that I have enjoyed using but I decided to skip since this was supposed to be a user group talk and a somewhat dense article.

I would encourage you to explore those topics further via the literature listed under the Resources heading.

Who Else Uses PF Today?

PF originated in OpenBSD, but word of the new subsystem reached other projects quickly and there was considerable interest from the very start. Over the years, PF has been ported from the original OpenBSD to the other BSDs and a few other systems.

Other than Oracle with their port to Solaris, most ports of the PF subsystem happened before the OpenBSD 4.7 NAT rewrite, and for that reason they have kept the previous syntax intact.

There may very well be others. There is no duty to actually advertise the fact that you have incorporated BSD licensed code in your product.

If you find other products using PF or other OpenBSD code in the wild, I am interested in hearing from you about it. Please comment or send email to nix at nxdomain dot no.

Resources for Further Exploration

The PF User's Guide

The Book of PF by Peter N. M. Hansteen

Absolute OpenBSD by Michael Lucas

Network Management with the OpenBSD Packet Filter toolset, by Peter N. M. Hansteen, Massimiliano Stucchi and Tom Smyth (A PF tutorial, this is the BSDCan 2024 edition). An earlier, even more extensive set of slides can be found in the 2016-vintage PF tutorial.

That Grumpy BSD Guy Blog posts by Peter N. M. Hansteen

OpenBSD Journal News items about OpenBSD, generally short with references to material elsewhere.

Wednesday, September 14, 2022

Open Source in Enterprise Environments - Where Are We Now and What Is Our Way Forward?

We have been used to hearing that free and open source software and enterprise environments in Big Business are fundamentally opposed and do not mix well. Is that actually the case, or should we rather explore how business and free software can both benefit going forward?


Free and Open Source vs Enterprise and Business: The Bad Old Days

Open source, free software and enterprise IT environments have both been around for quite a while. I'm old enough to remember when the general perception was that the free exchange of source code was merely a game for amateurs, or at best an academic exercise. In contrast, the proper business way of doing things was to perhaps learn general principles and ideas from the academics, but real products for business use would be built to be sold as binary only, with any source code to be kept locked away and secret.

Note: This piece is also available without trackers but more basic formatting here.

If you're a little younger you may remember a time when “Windows NT is the future” was essentially gospel and all the business pundits were saying we would be seeing the last of Unix and mainframes both within only a handful of years.

Thinking back to the late 1980s and early 1990s it is hard to imagine now how clear the consensus seemed to be on the issue at that point. The PC architecture and a few other proprietary technologies were the way of business and the way forward.

No discussion or dissent seemed possible.

Then, The Internet Happened

Then the Internet happened. What few people outside some inner circles were aware of was that what actually made the Net work was code that came directly out of the Berkeley Software Distribution. BSD Unix, or simply BSD for short, was a freely licensed operating system that was the result of a rather informal cooperation of researchers in academia and business alike, originally derived from Unix source code.

When the United States Department of Defense wanted work done on resilient, device independent, distributed and autoconfiguring networks, the task of supplying the reference implementation for the TCP/IP stack, based on a stream of specifications dubbed Requests for Comments or RFCs, fell to the international group of developers coordinated by the Computer Systems Research Group at the University of California's Berkeley campus. In short, the Internet came from BSD, which thanks to a decision made by the Regents of the University of California, was freely licensed.

The BSD sourced TCP/IP stack was part of all Internet capable systems until around the turn of the century, when Linux developers and later Microsoft started working on their own independent implementations. By that time it had been forcefully demonstrated to the developer community at least that open source code was indeed capable of scaling to industrial scale and beyond.

Due to a handful of accidents of history, mainly involving imperfect communications between groups of developers and combined with a somewhat misguided lawsuit involving the BSD code, it was Linux that became the general household term for free software in general and the re-emergence of Unix-like systems in the Internet connected server market space. Linux distributions came with a largely GNU userland as well as generous helpings of BSD code.

At roughly the same time Linux emerged, the BSD code became generally available via the FreeBSD and NetBSD projects, and soon after the OpenBSD project, which forked from the NetBSD code base in the mid 1990s. For a more detailed history of these developments, see the three part series on the APNIC blog starting with this piece. If that piqued your interest, you may enjoy this piece about some incremental improvements over time in OpenBSD.

The War on Linux and the Proliferation of Open Source Tools

During the 1990s and early 2000s the Internet and services of all kinds that ran on top of it expanded in all directions. That expansion had the effect of advancing the free unixlike systems such as Linux and the BSDs, which would run quite comfortably on commonly available hardware, along with an ever expanding number of development tools and software of all kinds to new categories of users.

The success of the open source software led to what would be dubbed The War on Linux, a rather vicious defamation campaign executed in both PR campaigns and lawsuits, and driven mainly by the then-dominant desktop software vendor's ambition to dominate server space as well. One of the more bizarre sequences of Linux-targeting lawsuits was run by proxy, and is extensively documented at groklaw.net (Note: http-only site). It is worth noting that the process eventually led to bankruptcy for the litigant.

Over the years it became clear to essentially everyone in the industry that open source tools were essential to development, and several practical aspects of developer life led to ever increasing open source use. During the time of The War on Linux, the likes of Apple, Cisco, Netscaler (later acquired by Citrix) and Sun Microsystems (later acquired by Oracle) either incorporated open source code in their products and workflows, open sourced large parts of their own code or forked freely available code to base proprietary systems on. It may be worth discussing each of these approaches in detail later.

On to the Present: We All Use...

Fast forward to the present day, and I recently had colleagues sum up that in the enterprise environments we move in,

Software is developed on Macs,
deployed on a cloud somewhere,
which more likely than not runs on Linux.

And the software itself is likely built with open source tools and pulls in dependencies from open source projects, possibly hosted on Github or other public sites.

Your software in all probability uses some open source. And even if you are not a developer, you most likely use open source tools that are integrated in your operating system or common application software or web services.

On the client side of things, an ever increasing part of the volume comes from smartphones, tablets and the like, where the market share for open source based systems (Android and iOS) exceeds 90 percent. In a document we will come back to later, the Norwegian National Security Authority (NSM) estimates that approximately 90 to 98 per cent of all software in use to some extent has dependencies on open source software. Other relevant statistics can be found here, here and here. Or, if you're in a bit of a hurry: It is estimated that some 3.1 billion Linux-based Android phones are currently in use. In addition, there is Apple, which we know has a significant amount of BSD code in their software.

It is of course worth noting that by now even the old open source arch-enemy Microsoft ships their offerings with what amounts to an almost complete Linux distribution as a subsystem. The same company regularly lobs cash over the wall to the likes of The OpenBSD Foundation and regularly contributes to other open source projects. Not to mention that much of what runs in their Azure cloud is one way or the other Linux based.

Security: QA Your Supply Chain, Exercise the Right to Repair

Back in the days of The War on Linux, and to some extent still, we have often been faced with claims that open source software could either never be as secure as proprietary software or that open source software was inherently more secure than the closed source kind, because "given enough eyes, all bugs are shallow".

Both assertions fail because even without access to source code, it is possible to probe running software for vulnerabilities, and on the other hand the shallowness of bugs depends critically on the eyes looking being attached to people with sufficient competence in the field.

The public reactions to a couple of security incidents in recent years, which generated a flurry of largely uninformed punditry, are worth revisiting for the lessons that can be learned.

The Solarwinds supply chain incident aka SUNBURST (2020) - One of the most widely publicized yet mostly quite poorly understood security incidents in recent years emerged when it was revealed that adversaries unknown had been able to compromise the build computers where the binaries for Solarwinds' widely used network management software were built for distribution.

The SANS institute has produced a fairly thorough writeup of the incident, which breaks down as follows: The first stage of a multi-stage compromise kit was included in binary distribution packages, complete with authentic signatures from the build system, that were largely put directly into production environments by network admins everywhere. The malware then went on to explore the networks they landed in, and through a process that made heavy use of crafted DNS queries and other non-obvious techniques, the miscreants were able to compromise several high security government and enterprise networks.

Several open source component supply chain incidents (2020 onwards) - Soon after the SUNBURST incident several incidents occurred where popular open source components that other systems pulled in as dependencies started malfunctioning or were suddenly unavailable, causing complete malfunctions or loss of functionality such as a web service suddenly refusing to interact with specific networks.

The sudden breakage in open source components caused quite a bit of uproar, and predictably the chattering subset of the consulting class set about churning out dire warnings about the risk of using open source of any kind.

Watching from the sidelines it struck many open source oriented professionals, myself included, that the combination of these incidents carries an important lesson. It is obvious that in a modern environment we suck in upgrades automatically and frequently, and that no untested code should ever be deployed directly to production.

Blind trust versus the right to read (and educate yourself) and the right to repair - In the case of proprietary, binary-only software, you have no choice but to trust your supplier and that they will address any defects in a timely manner. The upshot is that with proprietary, binary-only software you do not have access to two important features of open source software: The right to read and study the code, and the right to repair any defects you find, potentially saving yourself service shutdowns or workarounds while the secret parts of your system get fixed elsewhere.

The lesson to be learned is that you need to run quality assurance on your supply chain. You may choose to trust, but you still need to verify. That goes for open source and proprietary software both.

This Norwegian felt slightly elated when reading that the Norwegian National Security Authority (NSM) provides essentially the same assessments in their published recommendations.
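One small, concrete piece of "trust, but verify" is to check that the artifact you are about to deploy is the same one you tested. The following is a minimal Python sketch of that idea, not a complete supply chain QA process. The function names and the notion of a "pinned" digest recorded at test time are assumptions made for illustration. Note also that a matching digest only tells you the file is unchanged since you recorded it, not that it is benign: the SUNBURST packages carried authentic signatures.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path, pinned_hex):
    """True if the file's digest equals the digest recorded when it was tested.

    pinned_hex is a hypothetical value you stored yourself after testing;
    whitespace and letter case are normalized before comparing.
    """
    return hmac.compare_digest(sha256_of(path), pinned_hex.strip().lower())
```

A deployment script could refuse to proceed whenever matches_pinned_digest() returns False, forcing a new round of testing for anything that changed upstream.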

Contributing - Cooperating on Maintenance

As with any product it is entirely possible to be a relatively passive consumer, just install and use, and build whatever you need on top, interacting with the community only via downloading as needed from the mirror sites. Communicating via online forums, mailing lists or other channels is entirely optional.

If you are a developer or integrator with an ambition to make one or more open source products central to your business either by using and contributing to an existing project or starting a new one, several approaches are possible.

Let's take a look at the strategies some big names adopted on open source in their products:

Grab and fork, sell hardware: The Netscaler load balancer and application delivery products were based on a fork of FreeBSD.

They appear to have rewritten large parts of the network stack and devised a multifunctional network product on top, which among other things features a slick web GUI for most if not all admin tasks.

If you look closely, Netscaler (since acquired and rebranded by Citrix) appear to cultivate a menagerie of open source projects to interface with their products.

However they appear not to be in particularly close contact with their main upstream. (It is worth noting that the BSD license does not require publishing changes to the code base.) When dropping to a shell on a Netscaler unit, last time I looked the output of uname -a seemed to indicate that their kernel was still based on FreeBSD 8.4, which the FreeBSD web site lists as End of Life by August 1, 2015.

Grab and fork, sell hardware, keep sync with your upstream: Starting with the initial release of macOS, Apple have maintained the software that drives their various devices, from phones to desktop computers and related services with generous helpings of open source code, along with what appears to be a general willingness to publish code and interact with upstream projects such as the FreeBSD project. Apple maintains the Open Source at Apple site for easy access to the open source components of their offerings.

This mode of open source interaction seems to be rather common, especially among network oriented suppliers of various specialty gear.

Open source everything, sell support: Despite early scepticism from business circles, several companies have built successful businesses on the model of participating in or even driving the development of open source systems or components, making support contracts (which may include early or privileged access to updates) as well as consulting services the main or sole source of company revenue.

Decide what code is both good enough to publish and useful elsewhere: Finally, for those of us in the services or consulting business who will occasionally write code that is not necessarily business specific, the reasonable middle ground is just that. Identify code that meets the following criteria:

  1. Was developed by yourself and cleared by your organization and other stakeholders such as your customer as such
  2. Is high enough quality that you dare show it to others
  3. Does not reveal core aspects of your clients' business
  4. Is likely to be useful elsewhere too
  5. Would be nice to have exposed to other sets of eyes in order to identify bugs and fix them

If you have code under your care in your organization that meets those criteria, you should in my opinion be seriously considering making that code open source.

Your next adventure will then be to pick an appropriate license.

Now for Policies and Processes - Do You Have Them?

If you have followed along this far, you probably caught on to the notion that it is wise to set up clear policies and procedures for handling code, open source or otherwise.

Keep in mind that

A license is an assertion of authority. A license is a creator's message to the world that states the conditions others must abide by when using or, if the creator allows it, changing and further developing the code.

Without a license the default regime is that only the person or persons who originated the code have the right to make changes or for that matter make further copies for redistribution.

For that reason it is important to ensure that every element of your project has a known copyright and license.

There have been quite a few instances of free software projects rewriting functionally equivalent, or hopefully better, versions of whole subsystems because of unacceptable or unclear licenses (see the OpenBSD articles in the Resources section for some examples).

Procedures and policies, you need them. A self employed developer working on their own project is usually free to choose whatever license they please. In a corporate environment, any code developed is likely tied to a contract of some sort, which may or may not set the parameters of who holds the copyright or what licenses may be acceptable. The exact parameters of what can be decided by contract and what follows from copyright law may vary according to what jurisdiction you are in. When considering whether to publish your own code under an open source license, make sure all stakeholders (and certainly any parties to any relevant contract) agree on the policies and procedures.

Keep it simple, for your own sake. There are supposedly several hundred licenses in existence that the Open Source Initiative considers to be open source. In the interest of making life easier for anyone who would be interested in working on your code, please consider adopting one of those well-known licenses.

They range from the simplest, BSD or MIT style ones that run a handful of sentences and can be condensed to "you can do whatever you like with this material except to claim that you made it all yourself", to elaborate documents (the GNU GPL v3 comes to mind) which set out detailed terms and conditions, may require republication of any changes under the same terms, and could set up a specific regime with respect to patent disputes.

It is also important to consider that components you use in your project may have specific license requirements and that different licenses may contain terms that make the licenses incompatible in practice.
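A policy of "every component has a known license, and only licenses we have agreed on" can be backed by a very simple automated check. The Python sketch below is a hypothetical illustration only: the allowlist, the component names and the use of SPDX-style identifiers are all assumptions, and the check merely flags components for human review. It does not decide whether two licenses are legally compatible.

```python
# Hypothetical policy: license identifiers your organization has agreed to accept.
ALLOWED_LICENSES = {"BSD-2-Clause", "BSD-3-Clause", "ISC", "MIT"}


def review_components(components, allowed=ALLOWED_LICENSES):
    """Split components into those needing attention.

    components maps a component name to its license identifier,
    or to None/"" when the license is unknown.
    Returns (unknown, not_allowed), both as sorted lists of names.
    """
    unknown = sorted(name for name, lic in components.items() if not lic)
    not_allowed = sorted(
        name for name, lic in components.items() if lic and lic not in allowed
    )
    return unknown, not_allowed


# Usage example with made-up component names
unknown, not_allowed = review_components(
    {"libfoo": "MIT", "libbar": None, "libbaz": "GPL-3.0-only"}
)
print("No known license:", unknown)
print("Outside the agreed list:", not_allowed)
```

Anything that ends up in either list is a question for the people who own the policy, not something the script can settle.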

My general advice here is, make it as simple as possible, but no simpler.

Or to rephrase slightly: The general advice for dealing with licenses echoes that of dealing with crypto code: Do not set out writing your own unless you know exactly what you are doing. Avoid that path if at all possible.

When in need, call in Legal (but make sure they understand the issues). Lawyers endure a lengthy education in order to pass the bar and turn to practicing law, but there is no guarantee that a person well versed in other business legalese has any competence at all when it comes to matters of copyright law. When you do turn to Legal for help, be very exacting and stern in insisting that they demonstrate a command of copyright basics and if at all possible have a reasonable real world understanding of how software is built.

As in, you really do not want to spend an entire afternoon or more explaining the difference between static and dynamic linking and why this matters in the face of a certain license, or that specific terms of different licenses deemed open source by the Open Source Initiative may in fact be incompatible in practice.

It is important to keep in mind that doing open source is about making our lives more productive and enjoyable by exchanging ideas between quality professionals, perhaps sharing the load of maintenance and leaving us all more resources to develop our competence and products further.

The Way Forward - The Work Goes On

So this is where we are today. Modern software development and indeed a goodly chunk of business and society in general depends critically on open source software.

If you enjoyed this piece (or became annoyed by any part of it) I would like to hear from you. I especially welcome comments from colleagues who have experience with open source use and/or development in enterprise settings. Of course if you are just curious about open source software in these settings, you are welcome to drop me a line too. I am most easily reachable via email nix at nxdomain dot no.


I want to extend thanks to Malin Bruland and Knut Yrvin for excellent comments and proofreading.

Resources

All things open source (including an almost encyclopedic collection of licenses) at The Open Source Initiative

Wikipedia: Berkeley Software Distribution about where the Internet came from

The GNU Operating System, supported by The Free Software Foundation

The FreeBSD operating system project

Open Source at Apple

Peter Hansteen: What every IT person needs to know about OpenBSD Part 1: How it all started,
What every IT person needs to know about OpenBSD Part 2: Why use OpenBSD?,
What every IT person needs to know about OpenBSD Part 3: That packet filter
(or the whole shebang in the raw at bsdly.blogspot.com)


Bradford Morgan White: The Berkeley Software Distribution

Nasjonal Sikkerhetsmyndighet (NSM): Åpen kildekode i den digitale leverandørkjeden (Norwegian only)

Business of Apps: Android Statistics (2023)

Bank My Cell: How Many Android Users Are There? Global and US Statistics (2023) (Source: https://www.bankmycell.com/blog/how-many-android-users-are-there)

Statista: Market share held by Apple iOS operating system of smartphone shipments from 1st quarter 2011 to 4th quarter 2022

Appendix: License Complexity Measured by Word Count

While presenting on free and open source software in enterprise environments, the topic of license complexity and how to handle licensing matters usually generates questions of the type,

"Does doing open source mean we need to staff an Open Source Program Office?

Does this not add a considerable measure of complexity to the development organization?

Do the open source licenses mean we have to hire even more lawyers?"

So I set out to do a little research. I figured that the number of words in a text is a useful, if imperfect, indicator of complexity, so we could use that measure as an easy to obtain proxy for how complex the licenses we are likely to encounter are in practice.

I headed over to the Open Source Initiative website and their excellent collection of open source licenses. I then picked out the more common open source licenses, and for each license I pasted the text into the word counter at wordcounter.net, which in addition to the word count provides an indication of likely target audience "reading level" and estimated reading time as well as a few other measures of the text characteristics.

The results are in the following table:


License complexity by word count

License                                                Word count  Reading level     Reading time
1-clause BSD License                                   160         College Graduate  35s
2-clause BSD License                                   191         College Graduate  42s
3-clause BSD License                                   220         College Graduate  48s
GNU GPL v2.0                                           2964        College Graduate  10m47s
GNU GPL v3.0                                           5608        College Graduate  20m30s
Apache License v2.0                                    1677        College Graduate  5m44s
Microsoft 365 Developer program license                4803        College Graduate  17m28s
Microsoft Windows 11 OS license terms                  5766        College Graduate  20m58s
Oracle End User License Agreement                      2554        College Graduate  9m17s
Adobe End-User License Agreement                       450         College Graduate  1m38s
Apple Licensed Application End User License Agreement  1524        College Graduate  5m32s

Once again, strict word count is not a perfect indicator of complexity — other measures such as sentence length and logical structure and interdependencies are likely to matter in real life scenarios.
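If you would rather not paste license texts into a web form, the word count and a rough reading time are easy to compute locally. The Python sketch below is an approximation under stated assumptions: words are simply whitespace-separated tokens, and the reading speed of 275 words per minute is my own guess. That figure reproduces several rows of the table (the three BSD licenses, for instance) but not all of them, and it is not a documented wordcounter.net parameter. The reading level measure is not attempted here.

```python
import re

# Assumed reading speed; not a documented wordcounter.net value.
WORDS_PER_MINUTE = 275


def count_words(text):
    """Count whitespace-separated tokens in text."""
    return len(re.findall(r"\S+", text))


def reading_time(words, wpm=WORDS_PER_MINUTE):
    """Format an estimated reading time in the table's style, e.g. '35s' or '10m47s'."""
    seconds = round(words * 60 / wpm)
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m{secs:02d}s" if minutes else f"{secs}s"


# Usage example with the word counts from the table
for name, words in [("1-clause BSD License", 160), ("GNU GPL v2.0", 2964)]:
    print(name, words, reading_time(words))
```

To measure a license you have on disk, read the file and pass its contents to count_words() before calling reading_time().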

Sunday, September 11, 2022

Your 'Forgotten' Blockchain Account Needing Reactivation? It's a Scam

 Scammers are using 'forgotten' cryptocurrency accounts as bait for stealing the identities of the gullible and dishonest. You have been warned.

I have never put any money into bitcoin or other cryptocurrencies.

Note: This piece is also available without trackers but only 'classic' formatting here

For a few years a long time ago I worked for economists at The Norwegian School of Economics, and even though I never formally studied the subject, enough of the general concepts rubbed off on me that when the early rumblings of Bitcoin started making the rounds, I quickly identified rather basic failings in the proponents' logic and turned my attention elsewhere.

My conclusion was that either these people were innocently unaware of the basics of money supply and its relationship with market pricing mechanisms, or they were not very innocently trying to part the more innocent with their money.

As history shows, any asset with no or very little intrinsic or objective value and market price set solely by demand is bound to be an unstable asset and a high risk investment prone to bubbles such as the famous tulip mania.

Looking at the development of Bitcoin over time, illustrated with a chart from Wikipedia:

it becomes immediately clear that Bitcoin is not a stable asset. Remember the tulips.

Be that as it may, I was only slightly annoyed by a phone call on Friday from what appeared to be a United Kingdom (+44) number. The lady at the other end had what I took to be an Indian accent and did not speak very clearly, but it eventually became clear that she was offering to help me regain access to my 'blockchain account'.

As we already know, no blockchain account of mine has ever existed, and I told the lady so. I also helpfully offered her advice that she would most likely not be paid for her efforts and would be better off just putting down the phone and getting on with life in the great elsewhere.

She mostly ignored what I was saying and stuck to her script. In the end I told her to send whatever she had to offer by email, and she confirmed she had that address to hand.

I have never tried to keep either my email address or my phone number secret. In fact, for several separate periods of time I have been making an effort to be accessible to both existing and new customers. So it was no surprise that some random person on the Internet would have both of these items on file.

What did surprise me a bit was that the next day, an email message did turn up about the 'blockchain account'.

Here is the message in two screenshots:

blockchain message screenshot 1
blockchain message screenshot 2
while the message itself is preserved here with headers (and fugly markup) intact.

The full body text of the message reads,

Hello Mr /Mrs. Peter,

We hope this email finds you well! This is Maya Miller from the Support - Team Department of Blockchain. Further to our recent communication, please check below the specific details as discussed. During this Informative Campaign we are contacting every client that has an account in our system from of November 2012 until March 2021.The system of Blockchain has blocked these accounts temporarily for security reasons. A Bitcoin transaction is identified by it's unique form of Transaction ID, which consists of a 64-digit alphanumeric string, that does not contain any kind of specification.

The reasons that have caused the registration of your name in Blockchain are listed below:

1- The account may have been created from your Previous Online Trading experiences or any other Online form of Investment and the funds have been collected in a Recovery Wallet.

2- In 2011-2016 Bitcoin and Blockchain were everywhere (Commercials, Platforms and all over the Internet) and 30% of our clients registered without even knowing the results. Different platforms and several Online Games used to offer free Bitcoins as a Promotion / Gift which made a great publicity to this Crypto Currency.

Your total available balance in is 3.402 BTC and converted to standard GBP it goes up to 60,361.58 GBP. The amount in Standard is variable, based on the Bitcoin price movement. You can find a screenshot of the status of your account down below, provided directly from our Security Department of Blockchain. NOTE!

This confidential information is preferred not to be shared externally, as a measure taken against any possible fraud activity! At the moment that an account is created in Blockchain, the client must accept the terms and conditions in order to use its services and features. According to the Terms of Use, when an account owner in Blockchain has 3+ years of not showing any activity (including Logging In) Blockchain proceeds as follows:

1) If the account has no balance, it will get terminated immediately.

2) If the account has a balance inside, it will go to "Dormant Status", until the owner is informed. Further to the procedures listed above, the Support Department came up with 3 (three) suggestions:

a) Deactivate the account, and these funds will not be your responsibility anymore.
These funds will go back to the company as a property.
b) Reactivate your account, and withdraw the funds with the guidance of one of our Refund Agents.
c) Reactivate the account and keep it for online trading.
The reactivation procedure consists as the following: Since the owner of the account has not been active, he/she must show activity (movement) inside the account. That would be done by placing a minimum deposit of 1% (0.034 BTC) of the total funds that the client owns.
In order to reactivate the account you must provide the following documents:
1- One official document (Passport, Driving License of National ID)
2- Utility Bill or Bank Statement, according to client preference After you provide the documents, and the system confirms that the order has been placed, your account will go to "Active Status".
In order for you to log in to your crypto tracker platform, you should get connected to one of our agents first.

If you have any questions don't hesitate to reply back to this email and also provide us your final decision. Thank you for being a member of our company!
Kind Regards,

Maya Miller Support team_ Blockchain Security Department
Support Department and Controller of Financial Policies of Blockchain.

© Blockchain 2022 I I- Luxembourg SA L-2340 Luxembourg, 1, Rue Philippe II- C/O Legalinx Limited Tallis House 2 Tallis Street Temple London, EC4Y OAB, UNITED KINGDOM

Disclaimer: Please note Blockchain.com does not provide investment advice or recommendations. The information made available to you is for your own general use and is not based on any evaluation of your personal circumstances.

The essence of this message is that as part of the 'account recovery' process, they want copies of your 'Passport, Driving License of National ID' (sic), as well as 'Utility Bill or Bank Statement'. While it is possible that some valid institutions would require one of these in their account recovery procedure, supplying several of these would in fact enable the party at the other end to impersonate me to a third party.

As this tweet shows,


I quickly concluded this was an attempt at ID theft.

The lesson to be learned here is that if you do not remember ever having traded in cryptocurrencies, you almost certainly do not have a recoverable account with bitcoins in it.

Given the size of the amount promised it would not surprise me if they do get some takers. But scamming the scammers takes a certain level of skill most dishonest and greedy people do not possess.

And I suppose it will not surprise anyone that the tweet linked here produced several replies from bots or bot-workalikes that tried to hawk their favorite cryptocurrency (can we call them #cryptotulips yet?) project. Those were the easiest block decisions I have ever faced.


Update 2022-09-13: Just as we were finishing lunch today at a pleasant Asian-style restaurant in Vienna, this happened:

[2022-09-13 12:47:33] Incoming call from +447894561236, correctly flagged by the Android phone app as Potential fraud
Me: Peter Hansteen speaking
Bitcon Lady: Hello Mister Peter I am calling from the support team of Blockchain
Me: Hi there. Please go to bsdly.blogspot.com and read about yourself. Bye then!
(hangs up)

I must admit I had kind of hoped it would end there, but

[2022-09-13 12:48:01] Incoming call from +447894561236, correctly flagged by the Android phone app as Potential fraud
Me: Peter Hansteen speaking
Bitcon Lady: Hello Mister Peter would you please repeat what you just said
Me: What I said was, please go to bsdly dot blogspot dot com and read about yourself. Bye then!

With this addendum the article is now a full disclosure post.

I would however advise whoever maintains the list of flagged numbers that this number should not be tagged as Potential fraud, but more accurately as Fraud.

I am sure somebody, somewhere will appreciate your saving them a handful of bytes of storage.

Wednesday, September 7, 2022

The Things Spammers Believe - A Tale of 300,000 Imaginary Friends

It finally happened. Today, I added the three hundred thousandth (yes, 300,000th) spamtrap address to my greytrapping setup, for the most part fished out of incoming traffic here, for spammers to consume.

A little more than fifteen years after I first published a note about the public spamtrap list for my greytrapping setup in a piece called Hey, spammer! Here's a list for you!, the total number of imaginary friends has now reached three hundred thousand. I suppose that is an anniversary of sorts.

Note: This piece is also available without trackers but classic formatting only here.

If this all sounds a bit unfamiliar, you can find a brief explanation of the data collected and the list itself on the traplist home page.

And yes, the whole thing has always been a bit absurd.

That said, at the time in the mid noughties this greytrapping setup was announced, we had been battling scammy spam email and malicious software that also abused email to spread for some years, and we were eagerly looking for new ways to combat the spam problem which tended to eat into time and resources we would rather have used on other things entirely.

With that backdrop, collecting made up or generated, invalid email addresses in our home domains from various logs as traps for spammers seemed like an excellent joke and a fun way to strike back at the undesirables who did their damnedest to flood our users' mailboxes.

The initial announcement shows the early enthusiasm, as does a followup later in the same month, Harvesting the noise while it's still fresh; SPF found potentially useful. With a small helping of scepticism towards some of the other methods and ideas that circulated at the time, of course.

The various followups (search on the site using "spam", "antispam" or for that matter "spamd" and you will find quite a few) reveal that we went to work on collecting, feeding to spamdb and publishing with a grin for quite a while.

I even gave a talk at BSDCan 2007 about the experience up to that point around the time the traplist became public.

A few years later I posted a slightly revised version of that somewhat overweight paper as a blog post called Effective Spam and Malware Countermeasures - Network Noise Reduction Using Free Tools that has also grown some addenda and updates over the years.

I have revisited the themes of spam and maintaining blocklists generated from the traffic that hits our site a few times over the years.

The most useful entries are probably Maintaining A Publicly Available Blacklist - Mechanisms And Principles (April 2013) and In The Name Of Sane Email: Setting Up OpenBSD's spamd(8) With Secondary MXes In Play - A Full Recipe (May 2012), while the summary articles Badness, Enumerated by Robots (August 2018) and Goodness, Enumerated by Robots. Or, Handling Those Who Do Not Play Well With Greylisting offer some more detail on the life that includes maintaining blocklists and pass lists.

However, by the time the largest influx of new spamtraps, or imaginary friends if you will, happened during February through April of 2019 I was fresh out of ideas on how to write something entertaining and witty about the episode.

What happened was that the collection that at the time had accumulated somewhat more than fifty thousand entries, at a rate of no more than a few tens of entries per day for years, started swelling by several thousand a day, harvesting again from the greylist.

The flood went on for weeks, and forced me to introduce a bit more automation in the collecting process. I also tried repeatedly to write about the sudden influx, but failed to come up with an interesting angle and put off writing that article again and again.

As I later noted in that year's only blog entry The Year 2019 in Review: This Was, Once Again, Weirder Than the Last One, starting January 30th 2019

"I noticed via my scriptery that reports on such things that a large number of apparent bounce message deliveries to addresses made up of "Western-firstname.Chinese-lastname@mydomain.tld", such as aaron.pu@bsdly.net or abby.na@bsdly.net, had turned up, in addition to a few other varieties with no dot in the middle, possibly indicating separate sources."

The IP addresses of the sending hosts were all in Chinese address ranges, and some weeks later, in April, we had ended up harvesting at least 120 000 unique new entries of a very similar kind before the volume went down rather abruptly to roughly what it had been before the incident.

It is likely that what we were seeing was backscatter from one or more phishing campaigns targeting Chinese users where for reasons only known to the senders they had chosen addresses in our domains as faked sender addresses.

Fortunately by the time this incident occurred I had started keeping a log of spamtraps by date added and the actual greylist dumps generated by the blocklist generating script can be retrieved so more detailed data can be assembled when and if someone can find the time to do so.

As I have kept repeating over the years, maintaining the spamtrap list and the blocklists sometimes turns up bizarre phenomena. Among the things that keep getting added to the spamtraps list are the products of SMTP callbacks, and another source of new variants seems to be simply shoddy data handling at the sender end. We keep seeing things that more likely than not are oddly truncated versions of existing spamtraps.

And finally, while the number of trapped hosts at any time seems to have stabilized over the last couple of years at the mid to low four digits, we seem to be seeing that low number of hosts aggressively targeting existing spamtraps, as detailed in the February 2020 sextortion article.

I have at times been astonished by what appears to be taken as useful addresses to send mail to, and I am sure the collecting and blocking activity will turn up further absurdities unheard of going forward. It is also quite possible that I have forgotten about or skipped over one or more weird episodes in the saga of the spamtraps and blocklists. I hope to be able to deliver, at odd intervals, writeups that are interesting, useful, funny -- at least one and hopefully all.

If you are interested in the issues I touch on here or if the data I accumulate would be useful in your research, please let me know via comments or email.

And yes, since I know you have been dying to ask, this is the entry, collected in the evening (CEST) of 7 September 2022, which took our population of imaginary friends over the 300 000 line:


Sep 7 19:52:18 skapet sshd[31622]: Failed password for invalid user ftpshared from 157.230.151.241 port 45876 ssh2


which by the obvious processing we do here from failed login attempt to official spamtrap becomes


Date    	Source  Original     Spamtrap
2022-09-07	SSH	ftpshared    ftpshared@bsdly.net
  

and joins the collection as entry number 300,000 (three hundred thousand).

By the time you read this, the total is likely to have increased yet again.
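The step from failed login attempt to spamtrap candidate shown above can be sketched in a few lines of Python. This is an illustration only, not the scriptery actually in use here: the regular expression is written to match the sshd log line quoted above, and the function name and default domain are assumptions.

```python
import re

# Matches sshd lines of the form
# "Failed password for invalid user NAME from ADDRESS port N ssh2"
FAILED_LOGIN = re.compile(r"Failed password for invalid user (\S+) from (\S+)")


def spamtrap_from_log(line, domain="bsdly.net"):
    """Return a spamtrap candidate for an invalid-user login failure, else None."""
    match = FAILED_LOGIN.search(line)
    if match is None:
        return None
    user = match.group(1)
    return f"{user}@{domain}"


# Usage example with the log line quoted above
line = (
    "Sep 7 19:52:18 skapet sshd[31622]: Failed password for invalid user "
    "ftpshared from 157.230.151.241 port 45876 ssh2"
)
print(spamtrap_from_log(line))
```

A real collector would also need to skip user names that correspond to deliverable addresses and to entries already on the list.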

On a relevant mailing list it has been suggested that if you run a large scale email service, our list of spamtraps could be useful in filtering outgoing mail. If a customer tries to contact one of our imaginary friends, you probably need to pay extra attention to that customer.
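A minimal Python sketch of that outgoing-mail check follows. It assumes the spamtrap list has been fetched and is available as plain text with one address per line. The handling of blank lines and lines starting with "#" is my own assumption rather than a statement about the published list format, and the function names are hypothetical.

```python
def load_traps(lines):
    """Build a set of lower-cased trap addresses from an iterable of text lines."""
    traps = set()
    for line in lines:
        entry = line.strip().lower()
        # Skip blank lines and (assumed) comment lines
        if entry and not entry.startswith("#"):
            traps.add(entry)
    return traps


def recipients_hitting_traps(recipients, traps):
    """Return the recipients, as given, that match a known spamtrap address."""
    return [rcpt for rcpt in recipients if rcpt.strip().lower() in traps]


# Usage example with a tiny in-memory list
traps = load_traps(["# sample", "ftpshared@bsdly.net", "aaron.pu@bsdly.net"])
print(recipients_hitting_traps(["alice@example.com", "FTPshared@bsdly.net"], traps))
```

Any non-empty result is a reason to look more closely at the sending customer, not proof of wrongdoing on its own.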