Wednesday, September 14, 2022

Open Source in Enterprise Environments - Where Are We Now and What Is Our Way Forward?

We have been used to hearing that free and open source software and enterprise environments in Big Business are fundamentally opposed and do not mix well. Is that actually the case, or should we rather explore how business and free software can both benefit going forward?

Puffy, the OpenBSD mascot, shiny version

Free and Open Source vs Enterprise and Business: The Bad Old Days

Open source and free software on the one hand and enterprise IT environments on the other have both been around for quite a while. I'm old enough to remember when the general perception was that the free exchange of source code was merely a game for amateurs, or at best an academic exercise. In contrast, the proper business way of doing things was perhaps to learn general principles and ideas from the academics, but real products for business use would be built to be sold as binary only, with any source code kept locked away and secret.

Note: This piece is also available without trackers but more basic formatting here.

If you're a little younger, you may remember a time when "Windows NT is the future" was essentially gospel, and all the business pundits were saying we would be seeing the last of both Unix and mainframes within only a handful of years.

Thinking back to the late 1980s and early 1990s, it is hard to imagine now how clear the consensus seemed to be on the issue at that point. The PC architecture and a few other proprietary technologies were the way of business and the way forward.

No discussion or dissent seemed possible.

Then, The Internet Happened

Then the Internet happened. What few people outside some inner circles were aware of was that what actually made the Net work was code that came directly out of the Berkeley Software Distribution. BSD Unix, or simply BSD for short, was a freely licensed operating system, originally derived from Unix source code, that was the result of a rather informal cooperation between researchers in academia and business alike.

When the United States Department of Defense wanted work done on resilient, device-independent, distributed and autoconfiguring networks, the task of supplying the reference implementation of the TCP/IP stack, based on a stream of specifications dubbed Requests for Comments, or RFCs, fell to the international group of developers coordinated by the Computer Science Research Group at the University of California's Berkeley campus. In short, the Internet came from BSD, which, thanks to a decision made by the Regents of the University of California, was freely licensed.

The BSD-sourced TCP/IP stack was part of all Internet-capable systems until around the turn of the century, when Linux developers and later Microsoft started working on their own independent implementations. By that time it had been forcefully demonstrated, to the developer community at least, that open source code was indeed capable of operating at industrial scale and beyond.

Due to a handful of accidents of history, mainly imperfect communication between groups of developers combined with a somewhat misguided lawsuit involving the BSD code, it was Linux that became the household term both for free software in general and for the re-emergence of Unix-like systems in the Internet-connected server market. Linux distributions came with a largely GNU userland as well as generous helpings of BSD code.

At roughly the same time Linux emerged, the BSD code became generally available via the FreeBSD and NetBSD projects, and soon after via the OpenBSD project, which forked from the NetBSD code base in the mid 1990s. For a more detailed history of these developments, see the three-part series on the APNIC blog starting with this piece. If that piqued your interest, you may enjoy this piece about some incremental improvements over time in OpenBSD.

The War on Linux and the Proliferation of Open Source Tools

During the 1990s and early 2000s the Internet, and services of all kinds running on top of it, expanded in all directions. That expansion brought the free Unix-like systems such as Linux and the BSDs, which ran quite comfortably on commonly available hardware, to new categories of users, along with an ever-expanding number of development tools and software of all kinds.

The success of open source software led to what would be dubbed The War on Linux, a rather vicious defamation campaign waged through both PR campaigns and lawsuits, and driven mainly by the then-dominant desktop software vendor's ambition to dominate server space as well. One of the more bizarre sequences of Linux-targeting lawsuits was run by proxy, and is extensively documented at groklaw.net (Note: http-only site). It is worth noting that the process eventually led to bankruptcy for the litigant.

Over the years it became clear to essentially everyone in the industry that open source tools were essential to development, and several practical aspects of developer life led to ever-increasing open source use. During the time of The War on Linux, the likes of Apple, Cisco, Netscaler (later acquired by Citrix) and Sun Microsystems (later acquired by Oracle) either incorporated open source code in their products and workflows, open sourced large parts of their own code, or forked freely available code to base proprietary systems on. Each of these approaches may be worth discussing in detail later.

On to the Present: We All Use...

Fast forward to the present day: colleagues of mine recently summed up the enterprise environments we move in like this:

Software is developed on Macs,
deployed on a cloud somewhere,
which more likely than not runs on Linux.

And the software itself is likely built with open source tools and pulls in dependencies from open source projects, possibly hosted on GitHub or other public sites.

Your software in all probability uses some open source. And even if you are not a developer, you most likely use open source tools that are integrated in your operating system or common application software or web services.

On the client side of things, an ever-increasing part of the volume comes from smartphones, tablets and the like, where the market share for open source based systems (Android and iOS) exceeds 90 percent. In a document we will come back to later, the Norwegian National Security Authority (NSM) estimates that approximately 90 to 98 percent of all software in use has dependencies on open source software to some extent. Other relevant statistics can be found here, here and here. Or, if you're in a bit of a hurry: it is estimated that some 3.1 billion Linux-based Android phones are currently in use. In addition, there is Apple, which we know has a significant amount of BSD code in their software.

It is of course worth noting that by now even the old open source arch-enemy Microsoft ships their offerings with what amounts to an almost complete Linux distribution as a subsystem. The same company regularly lobs cash over the wall to the likes of The OpenBSD Foundation and contributes to other open source projects. Not to mention that much of what runs in their Azure cloud is Linux-based one way or the other.

Security: QA Your Supply Chain, Exercise the Right to Repair

Back in the days of The War on Linux, and to some extent still, we have often been faced with claims that open source software could either never be as secure as proprietary software or that open source software was inherently more secure than the closed source kind, because "given enough eyes, all bugs are shallow".

Both assertions fail. Even without access to source code it is possible to probe running software for vulnerabilities, and conversely, how shallow bugs become depends critically on whether the eyes looking are attached to people with sufficient competence in the field.

The public reactions to a couple of security incidents in recent years, which generated a flurry of largely uninformed punditry, are worth revisiting for the lessons that can be learned.

The SolarWinds supply chain incident aka SUNBURST (2020) - One of the most widely publicized yet mostly poorly understood security incidents in recent years emerged when it was revealed that unknown adversaries had been able to compromise the build systems SolarWinds used to produce the binaries of its widely used network management software for distribution.

The SANS Institute has produced a fairly thorough writeup of the incident, which breaks down as follows: the first stage of a multi-stage compromise kit was included in binary distribution packages, complete with authentic signatures from the build system, and those packages were largely put directly into production environments by network admins everywhere. The malware then went on to explore the networks it landed in, and through a process that made heavy use of crafted DNS queries and other non-obvious techniques, the miscreants were able to compromise several high-security government and enterprise networks.

Several open source component supply chain incidents (2020 onwards) - Soon after the SUNBURST incident, several incidents occurred where popular open source components that other systems pulled in as dependencies started malfunctioning or suddenly became unavailable, causing complete breakdowns or loss of functionality, such as a web service suddenly refusing to interact with specific networks.

The sudden breakage in open source components caused quite a bit of uproar, and predictably the chattering subset of the consulting class set about churning out dire warnings about the risk of using open source of any kind.

Watching from the sidelines, it struck many open source oriented professionals, myself included, that the combination of these incidents carries an important lesson. In a modern environment we pull in upgrades automatically and frequently, and it should be obvious that no untested code should ever be deployed directly to production.

Blind trust versus the right to read (and educate yourself) and the right to repair - In the case of proprietary, binary-only software, you have no choice but to trust your supplier, and to trust that they will address any defects in a timely manner. The upshot is that with proprietary, binary-only software you do not have access to two important features of open source software: the right to read and study the code, and the right to repair any defects you find, which could spare you service shutdowns or workarounds while the secret parts of your system get fixed elsewhere.

The lesson to be learned is that you need to run quality assurance on your supply chain. You may choose to trust, but you still need to verify. That goes for open source and proprietary software both.

This Norwegian felt slightly elated when reading that the Norwegian National Security Authority (NSM) provides essentially the same assessments in their published recommendations.
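As a minimal sketch of the "verify" half, assume a workflow where each third-party artifact is reviewed and tested once, and its SHA-256 digest is then recorded (pinned) before anything is allowed near production. A Python check of a downloaded file against that pinned digest might look like the following; the function names and the artifact file name are hypothetical, made up for illustration. Note what a matching digest does and does not prove: it shows you are deploying the same bytes you tested, not that those bytes are safe. The SUNBURST packages carried authentic signatures, so the testing that happens before a digest gets pinned matters at least as much as the check itself.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_sha256: str) -> bool:
    """True if the file still matches the digest pinned when it was reviewed."""
    # Normalize the pinned value so upper-case hex from other tools still matches.
    expected = pinned_sha256.strip().lower()
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_of(path), expected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "example-dependency-1.2.3.tar.gz"  # hypothetical name
        artifact.write_bytes(b"the release contents we reviewed and tested")
        # In practice the pinned digest would be recorded at review time and kept
        # in version control next to the dependency declaration.
        pinned = sha256_of(artifact)
        print(verify_artifact(artifact, pinned))  # True
        artifact.write_bytes(b"something else entirely")
        print(verify_artifact(artifact, pinned))  # False
```

Keeping the pinned digests in version control means that any change to what goes into production shows up as a reviewable diff, which is the point of the exercise.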

Contributing - Cooperating on Maintenance

As with any product it is entirely possible to be a relatively passive consumer, just install and use, and build whatever you need on top, interacting with the community only via downloading as needed from the mirror sites. Communicating via online forums, mailing lists or other channels is entirely optional.

If you are a developer or integrator with an ambition to make one or more open source products central to your business, either by using and contributing to an existing project or by starting a new one, several approaches are possible.

Let's take a look at the strategies some big names adopted on open source in their products:

Grab and fork, sell hardware: The Netscaler load balancer and application delivery products were based on a fork of FreeBSD.

They appear to have rewritten large parts of the network stack and devised a multifunctional network product on top, which among other things features a slick web GUI for most if not all admin tasks.

If you look closely, Netscaler (since acquired and rebranded by Citrix) appear to cultivate a menagerie of open source projects to interface with their products.

However, they do not appear to be in particularly close contact with their main upstream. (It is worth noting that the BSD license does not require publishing changes to the code base.) Last time I dropped to a shell on a Netscaler unit, the output of uname -a seemed to indicate that their kernel was still based on FreeBSD 8.4, which the FreeBSD web site lists as having reached End of Life on August 1, 2015.

Grab and fork, sell hardware, keep sync with your upstream: Starting with the initial release of macOS, Apple have maintained the software that drives their various devices, from phones to desktop computers, and related services using generous helpings of open source code, along with what appears to be a general willingness to publish code and interact with upstream projects such as the FreeBSD project. Apple maintains the Open Source at Apple site for easy access to the open source components of their offerings.

This mode of open source interaction seems to be rather common, especially among network oriented suppliers of various specialty gear.

Open source everything, sell support: Despite early scepticism from business circles, several companies have been built successfully on the model of participating in, or even driving, the development of open source systems or components, making support contracts (which may include early or privileged access to updates) and consulting services the main or sole source of company revenue.

Decide what code is both good enough to publish and useful elsewhere: Finally, for those of us in the services or consulting business who occasionally write code that is not necessarily business specific, the reasonable middle ground is exactly that. Identify code that meets the following criteria:

  1. Was developed by yourself, and has been cleared as such by your organization and other stakeholders such as your customer
  2. Is high enough quality that you dare show it to others
  3. Does not reveal core aspects of your clients' business
  4. Is likely to be useful elsewhere too
  5. Would be nice to have exposed to other sets of eyes in order to identify bugs and fix them

If you have code under your care in your organization that meets those criteria, you should in my opinion be seriously considering making that code open source.

Your next adventure will then be to pick an appropriate license.

Now for Policies and Processes - Do You Have Them?

If you have followed along this far, you have probably caught on to the notion that it is wise to set up clear policies and procedures for handling code, open source or otherwise.

Keep in mind that

A license is an assertion of authority. A license is a creator's message to the world that states the conditions others must abide by when using, or if they allow it, change and further develop the code.

Without a license the default regime is that only the person or persons who originated the code have the right to make changes or for that matter make further copies for redistribution.

For that reason it is important to ensure that every element of your project has a known copyright and license.
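One widely used way to record that information per file is an SPDX license identifier in a comment near the top, for example "SPDX-License-Identifier: BSD-2-Clause". The following Python sketch assumes that convention, looks only at the first 20 lines of each file, and treats a hypothetical set of file suffixes as source code; it lists the files in a tree that carry no such tag. It only reports what a header claims. Whether the claim is actually true is still a question for your review process.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Matches tags such as "SPDX-License-Identifier: MIT OR Apache-2.0"
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")

# Hypothetical choice of which files count as source code.
SOURCE_SUFFIXES = (".c", ".h", ".py", ".go", ".sh")


def license_of(path: Path, max_lines: int = 20) -> str | None:
    """Return the SPDX expression declared near the top of a file, or None."""
    with path.open(encoding="utf-8", errors="replace") as f:
        for _, line in zip(range(max_lines), f):
            match = SPDX_RE.search(line)
            if match:
                # Drop a trailing C comment terminator, e.g. "... BSD-2-Clause */".
                return match.group(1).strip().removesuffix("*/").strip()
    return None


def files_without_license(root: Path, suffixes=SOURCE_SUFFIXES) -> list[Path]:
    """List source files under root that carry no SPDX tag, sorted for stable output."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes and license_of(p) is None
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pf_helper.c").write_text("/* SPDX-License-Identifier: ISC */\nint x;\n")
        (root / "deploy.sh").write_text("#!/bin/sh\necho no tag here\n")
        for path in files_without_license(root):
            print("missing license tag:", path.name)  # missing license tag: deploy.sh
```

Run as part of a build or review step, a list like this gives a concrete starting point for the "known copyright and license" question rather than an afterthought at release time.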

There have been quite a few instances of free software projects rewriting functionally equivalent, or hopefully better, versions of whole subsystems because of unacceptable or unclear licenses (see the OpenBSD articles in the Resources section for some examples).

Procedures and policies, you need them. A self-employed developer working on their own project is usually free to choose whatever license they please. In a corporate environment, any code developed is likely tied to a contract of some sort, which may or may not set the parameters of who holds the copyright or what licenses may be acceptable. The exact parameters of what can be decided by contract and what follows from copyright law may vary according to what jurisdiction you are in. When considering whether to publish your own code under an open source license, make sure all stakeholders (and certainly any parties to any relevant contract) agree on the policies and procedures.

Keep it simple, for your own sake. There are supposedly several hundred licenses in existence that the Open Source Initiative considers to be open source. In the interest of making life easier for anyone who would be interested in working on your code, please consider adopting one of those well-known licenses.

They range from the simplest, BSD or MIT style ones that run to a handful of sentences and can be condensed to "you can do whatever you like with this material except claim that you made it all yourself", to elaborate documents (the GNU GPL v3 comes to mind) that set out detailed terms and conditions, may require republication of any changes under the same terms, and could set up a specific regime with respect to patent disputes.

It is also important to consider that components you use in your project may have specific license requirements and that different licenses may contain terms that make the licenses incompatible in practice.
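Policies are easier to follow when the routine cases are handled mechanically and only the rest lands on someone's desk. As a sketch, assuming dependency licenses are available as SPDX expressions and that your organization, together with Legal, has agreed on a short allowlist, the hypothetical check below approves only simple expressions built from allowlisted identifiers and routes everything else, including anything with parentheses or a WITH exception, to a human. The allowlist and dependency names are made up for illustration and are not a recommendation. Leaving a license off the list only means "look at this"; the code does not decide compatibility.

```python
from __future__ import annotations

# Hypothetical allowlist; the real one is a policy decision made together with Legal.
ALLOWED = frozenset({"MIT", "ISC", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"})


def acceptable(expression: str, allowed: frozenset[str] = ALLOWED) -> bool:
    """Approve only simple SPDX expressions made of allowlisted identifiers."""
    expr = expression.strip()
    # Empty, parenthesized, exception-carrying or mixed AND/OR expressions
    # are not interpreted here; they go to a human reviewer.
    if not expr or "(" in expr or " WITH " in expr:
        return False
    if " AND " in expr and " OR " in expr:
        return False
    if " OR " in expr:
        # "A OR B": the licensee may choose, so one acceptable option is enough.
        return any(part.strip() in allowed for part in expr.split(" OR "))
    # "A AND B" (or a single identifier): every license applies, so all must pass.
    return all(part.strip() in allowed for part in expr.split(" AND "))


def needs_review(dependencies: dict[str, str]) -> list[str]:
    """Names of dependencies whose declared license is not automatically approved."""
    return sorted(name for name, lic in dependencies.items() if not acceptable(lic))


if __name__ == "__main__":
    deps = {  # hypothetical dependency names and declared licenses
        "tinyparse": "MIT",
        "fastcodec": "MIT OR Apache-2.0",
        "bigframework": "GPL-3.0-only",
        "mystery-blob": "",
    }
    print(needs_review(deps))  # ['bigframework', 'mystery-blob']
```

The deliberately conservative design means the script can only ever say "fine" or "ask someone", which keeps the actual licensing judgment with the people who are qualified to make it.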

My general advice here is, make it as simple as possible, but no simpler.

Or to rephrase slightly: The general advice for dealing with licenses echoes that of dealing with crypto code: Do not set out writing your own unless you know exactly what you are doing. Avoid that path if at all possible.

When in need, call in Legal (but make sure they understand the issues). Lawyers endure a lengthy education in order to pass the bar and turn to practicing law, but there is no guarantee that a person well versed in other business legalese has any competence at all when it comes to matters of copyright law. When you do turn to Legal for help, be very exacting and stern in insisting that they demonstrate a command of copyright basics and if at all possible have a reasonable real world understanding of how software is built.

As in, you really do not want to spend an entire afternoon or more explaining the difference between static and dynamic linking and why this matters in the face of a certain license, or that specific terms of different licenses deemed open source by the Open Source Initiative may in fact be incompatible in practice.

It is important to keep in mind that doing open source is about making our lives more productive and enjoyable by exchanging ideas between quality professionals, perhaps sharing the load of maintenance and leaving us all more resources to develop our competence and products further.

The Way Forward - The Work Goes On

So this is where we are today. Modern software development and indeed a goodly chunk of business and society in general depends critically on open source software.

If you enjoyed this piece (or became annoyed by any part of it) I would like to hear from you. I especially welcome comments from colleagues who have experience with open source use and/or development in enterprise settings. Of course if you are just curious about open source software in these settings, you are welcome to drop me a line too. I am most easily reachable via email nix at nxdomain dot no.


I want to extend thanks to Malin Bruland and Knut Yrvin for excellent comments and proofreading.

Resources

All things open source (including an almost encyclopedic collection of licenses) at The Open Source Initiative

Wikipedia: Berkeley Software Distribution about where the Internet came from

The GNU Operating System, supported by The Free Software Foundation

The FreeBSD operating system project

Open Source at Apple

Peter Hansteen: What every IT person needs to know about OpenBSD Part 1: How it all started,
What every IT person needs to know about OpenBSD Part 2: Why use OpenBSD?,
What every IT person needs to know about OpenBSD Part 3: That packet filter
(or the whole shebang in the raw at bsdly.blogspot.com)


Bradford Morgan White: The Berkeley Software Distribution

Nasjonal Sikkerhetsmyndighet (NSM): Åpen kildekode i den digitale leverandørkjeden (Norwegian only)

Business of Apps: Android Statistics (2023)

Bank My Cell: How Many Android Users Are There? Global and US Statistics (2023) (Source: https://www.bankmycell.com/blog/how-many-android-users-are-there)

Statista: Market share held by Apple iOS operating system of smartphone shipments from 1st quarter 2011 to 4th quarter 2022

Appendix: License Complexity Measured by Word Count

While presenting on free and open source software in enterprise environments, the topic of license complexity and how to handle licensing matters usually generates questions of the type,

"Does doing open source mean we need to staff an Open Source Program Office?

Does this not add a considerable measure of complexity to the development organization?

Do the open source licenses mean we have to hire even more lawyers?"

So I set out to do a little research. I figured that the number of words in a text is a useful, if not perfect, indicator of complexity, so we could use word count as an easy-to-obtain proxy for how complex the licenses we are likely to encounter are in practice.

I headed over to the Open Source Initiative website and their excellent collection of open source licenses. I then picked out the more common open source licenses, and for each license I pasted the text into the word counter at wordcounter.net, which in addition to the word count provides an indication of likely target audience "reading level" and estimated reading time as well as a few other measures of the text characteristics.

The results are in the following table:


License complexity by word count

License | Word count | Reading level | Reading time
1-clause BSD License | 160 | College Graduate | 35s
2-clause BSD License | 191 | College Graduate | 42s
3-clause BSD License | 220 | College Graduate | 48s
GNU GPL v2.0 | 2964 | College Graduate | 10m47s
GNU GPL v3.0 | 5608 | College Graduate | 20m30s
Apache License v2.0 | 1677 | College Graduate | 5m44s
Microsoft 365 Developer program license | 4803 | College Graduate | 17m28s
Microsoft Windows 11 OS license terms | 5766 | College Graduate | 20m58s
Oracle End User License Agreement | 2554 | College Graduate | 9m17s
Adobe End-User License Agreement | 450 | College Graduate | 1m38s
Apple Licensed Application End User License Agreement | 1524 | College Graduate | 5m32s

Once again, strict word count is not a perfect indicator of complexity: other measures such as sentence length, logical structure and interdependencies are likely to matter in real-life scenarios.
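For anyone who wants to repeat the exercise locally rather than through a web tool, a rough Python sketch follows. It assumes that words are simply whitespace-separated tokens and that reading speed is 275 words per minute; the exact rules wordcounter.net applies are not described here, so results will not necessarily match the table above, particularly the reading times.

```python
import sys
from pathlib import Path


def word_count(text: str) -> int:
    """Count whitespace-separated tokens, a rough stand-in for 'words'."""
    return len(text.split())


def reading_time(words: int, words_per_minute: int = 275) -> str:
    """Format an estimated reading time in the table's style, e.g. '35s' or '10m47s'."""
    seconds = round(words * 60 / words_per_minute)
    minutes, seconds = divmod(seconds, 60)
    return f"{minutes}m{seconds:02d}s" if minutes else f"{seconds}s"


if __name__ == "__main__":
    # Usage (hypothetical file names): python license_words.py bsd-2.txt gpl-3.txt
    for name in sys.argv[1:]:
        n = word_count(Path(name).read_text(encoding="utf-8"))
        print(f"{name}: {n} words, about {reading_time(n)} at 275 words per minute")
```

Counting tokens this way treats hyphenated terms and section numbers as single words, which is one reason a local count can drift a little from what an online counter reports.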
