Sunday, February 27, 2011

The Problem Isn't Email, It's Microsoft Exchange

The takeaway: don't pretend your appointment book can handle your email. And don't blame the Internet for all the compatibility issues. The main problem is Microsoft Exchange.

I care about email. In fact, a large part of how I have made a living over the years has depended on a reliable email service. I get a lot of email, and I send my fair share of it too - some of it is correspondence directly related to whatever I'm working on at the moment, some of it is personal, quite a bit comes from topic-oriented mailing lists such as openbsd-misc, and a large chunk of my mail archive consists of automatically generated mail sent by systems in my care. I've also been known to treat email much the same as other correspondence, rarely if ever deleting messages. When the mailboxes became too unwieldy I would transfer some of the contents to archive storage.

I've become convinced that a large part of the reason I don't mind dealing with large volumes of email is that I started doing it before Microsoft became an actor in the Internet email market. Way back in the late eighties and early nineties, email of the Internet, TCP/IP, kind would be handled by some sort of Unix box (a BSD or, by the mid-nineties, a Linux variant, perhaps) that would frequently offer shell command line access, but more likely than not also email reading via POP or IMAP interfaces.

And it worked. Users who insisted on (or needed to be on) a Microsoft desktop could be persuaded to install a useful email client such as Eudora (now defunct but fortunately Qualcomm donated the code base to Mozilla for integration in Thunderbird), and for mailboxes that became too unwieldy, the advice would be to just move content to mailboxes that Eudora, unlike the ubiquitous Inbox, wouldn't load into memory by default. Over the years the volume and nature of email changed gradually, so along the way we learned to deal with spam and mail-borne Microsoft worms by installing content filtering and setting up other tools. Still, everywhere I worked, apart from the unavoidable but infrequent freak incidents, SMTP email was considered reliable, and your email archive was just that.

From other parts of the world we would hear every now and then stories about the death of email, and recently even a largish IT company announced that they were planning to get rid of all email in the near future. Email, the story goes, is just too time consuming and disruptive. I never quite understood what they were on about.

Then not too long ago I started working regularly in an environment where email is done the Microsoft way, via Exchange and Outlook. And it has struck me that they're right: If your email experience is via Exchange and Outlook, the net effect is both time consuming and disruptive.

Forced to work with an all-Microsoft desktop for the first time in years (where my most frequently used application by far is putty.exe, but that's beside the point here), I found Outlook's user interface clunky and its default settings frankly insane ("rich text" by default, newest messages on top and positively deranged quoting setups, more about that later), though for the most part they were fortunately changeable, at least on a per mailbox basis.

The first revelation came when I heard a co-worker praise newer Microsoft Office releases "because 2007 and newer has discussions". I was forced to imagine what life must have been like without threading, as we've tended to call it on Usenet and mailing lists since, well, the late 1980s. Outlook's predecessor Microsoft Mail of course did not support threading, but I suppose any plans to support threading via References: headers and suchlike received a major blow when the translators of MSMail decided not to leave the RFC-dictated "Re:" prefix alone, but rather to translate it for local language versions, leading the way to the "Re: SV: Antw: VS:" and so on cascades we see in the Subject: fields for correspondence between users of Microsoft mail clients and others.
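To make the prefix cascade concrete, here is a minimal Python sketch of how a receiving client might collapse such a pile-up back into a single "Re:". The prefix list is my own assumption and deliberately incomplete: it holds only the variants mentioned above, and the function name normalize_subject is made up for the illustration.

```python
import re

# Assumed, incomplete list of localized reply prefixes: only the
# variants mentioned in the text, plus the original "Re".
REPLY_PREFIXES = ("re", "sv", "antw", "vs")

# One or more "<prefix>:" groups at the very start of the subject.
_CASCADE_RE = re.compile(
    r"^\s*(?:(?:%s)\s*:\s*)+" % "|".join(REPLY_PREFIXES),
    re.IGNORECASE,
)

def normalize_subject(subject):
    """Collapse a leading cascade of reply prefixes into a single 'Re: '."""
    stripped, count = _CASCADE_RE.subn("", subject, count=1)
    # Subjects without any reply prefix are returned unchanged.
    return "Re: " + stripped if count else subject

print(normalize_subject("Re: SV: Antw: VS: A most enlightening message"))
```

Note that this only tidies up the Subject: field for display. It does nothing for threading, which should never have depended on the subject text in the first place.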

No big surprise then, that when Microsoft decided to "invent" threading for their messaging products, they again ignored the RFC-compliant References: header and chose to implement their very own version based on a set of X-something headers that appears to make the threading a local-to-this-Exchange-server (and Outlook clients only) thing. Messages that do not retain the X-something headers regularly show up as separate "discussions". All this is to a Unix-head much like the "Recall" functionality that always draws smiles on mailing lists.
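For comparison, threading on the standard headers needs very little machinery. The following is a minimal sketch using only Python's standard library, not a description of how any particular mail client does it. The Message-ID values are invented for the example, and the sketch ignores complications such as missing ancestors.

```python
import re
from email import message_from_string

MSGID_RE = re.compile(r"<[^<>\s]+>")

def parent_id(msg):
    """Return the Message-ID this message replies to, or None for a root."""
    # In-Reply-To names the direct parent. References lists the ancestry,
    # oldest first, so its last entry is the direct parent as well.
    for header in ("In-Reply-To", "References"):
        ids = MSGID_RE.findall(msg.get(header, ""))
        if ids:
            return ids[-1]
    return None

def build_threads(messages):
    """Map each parent Message-ID to the Message-IDs of its direct replies."""
    children = {}
    for msg in messages:
        children.setdefault(parent_id(msg), []).append(msg["Message-ID"])
    return children

# Hypothetical three-message exchange; the IDs are made up.
SAMPLE = [
    "Message-ID: <1@onecompany.nx>\n"
    "Subject: A most enlightening message\n\nDear Second,\n",
    "Message-ID: <2@otherplace.nx>\n"
    "In-Reply-To: <1@onecompany.nx>\n"
    "References: <1@onecompany.nx>\n"
    "Subject: Re: A most enlightening message\n\nThanks for sharing that!\n",
    "Message-ID: <3@onecompany.nx>\n"
    "References: <1@onecompany.nx> <2@otherplace.nx>\n"
    "Subject: Re: A most enlightening message\n\nGlad you liked it.\n",
]

threads = build_threads(message_from_string(raw) for raw in SAMPLE)
```

Thread roots end up under the None key, and every other key is the Message-ID of a message that has replies. Because the relationship travels in the message headers themselves, it survives a trip through any server or client that leaves those headers alone.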

Being robbed of any easy way to track the relationships between messages in your mailboxes is bad enough, but there's more. Even with a limited sort of threading in place (even one that would break at the slightest interference from outside software), the damage had already been done by software that introduced counterproductive, confusing and time consuming response practices.

For reasons that have never become entirely clear to me, the developers of Microsoft email client software decided that direct and limited quoting of text from previous messages was not a priority. So rather than build on earlier work where we would have exchanges like

From: First Correspondent <first.correspondent@onecompany.nx>
To: Second Emailer <second.emailer@otherplace.nx>
Subject: A most enlightening message

Dear Second,

Here I offer an important insight that I would like to share.
Followed with random commentary that may or may not be important.

I hope you agree this was worth sharing.

Yours,
First

where a response from Second would typically be something like this,

From: Second Emailer <second.emailer@otherplace.nx>
To: First Correspondent <first.correspondent@onecompany.nx>
Subject: Re: A most enlightening message

First Correspondent <first.correspondent@onecompany.nx> writes:

> Here I offer an important insight that I would like to share.

Thanks for sharing that! The next bit was really about something else
entirely, but is probably worth discussing over refreshments at an
appropriate time.

> I hope you agree this was worth sharing.

Oh, definitely! We'll get plenty of good out of this as time goes by.

Be seein' ya,
Second, jr

they chose a different approach entirely.

Keep in mind that other parts of the world were already used to email and related forms of communication such as Usenet news, where exchanges like these were commonplace and gave a reasonable certainty as to who said what, when.
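The mechanical part of that convention is trivial, which makes its absence all the more puzzling. A minimal Python sketch of what a client does when you hit reply could look like this; the function name quote_reply is made up, and the attribution wording simply copies the example above.

```python
def quote_reply(sender, body):
    """Return an attribution line followed by the body quoted with '> '."""
    # Blank lines get a bare '>' so the quoted text carries no trailing spaces.
    quoted = ["> " + line if line else ">" for line in body.splitlines()]
    return "\n".join([sender + " writes:", ""] + quoted) + "\n"

original = (
    "Here I offer an important insight that I would like to share.\n"
    "\n"
    "I hope you agree this was worth sharing.\n"
)
draft = quote_reply(
    "First Correspondent <first.correspondent@onecompany.nx>", original
)
print(draft)
```

The result is only the starting point for the reply. The respondent then trims the quoted text down to the parts actually being answered and writes each response directly below the relevant quote.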

What Microsoft did instead was to introduce a wholly new convention for email responses. The details vary over the various versions, but the main parts were to wrap any text information in pseudo-html formatting and place the entire previous message after the present correspondent's signature, with the cursor for the user to input text at the top.

Inline quoting like in the exchange I quoted earlier was tricky bordering on impossible, and adventurous users would resort to tricks like "my parts are the ones in magenta", only to discover that the carefully hand-painted text would fail to render correctly on any other software than their own version, down to the minutest patch level.

Thus was born the age of all-inclusive top-posting, where deciphering the true meaning of any of the paragraphs on top of the message could take more moments than you really have at your disposal, the time needed to decipher the cascade of earlier messages included. Not only would the ever-expanding, all-inclusive (but actually rather unreliable and far from tamper-proof) discussion-in-a-message convention confuse all readers involved, it also meant that the text and any file attachments would be stored multiple times, many times over for long discussions. It would take only a minimally uncharitable view of the average C?O's intellectual capacity to suggest that this was a prime mover behind the intense rush to "data deduplication" in storage marketing literature a few years ago.
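A back-of-the-envelope Python sketch shows how fast the duplication grows. The model is my own simplification: every message adds the same amount of new text and quotes the entire previous message below it, and attachments and per-recipient copies are left out altogether.

```python
def thread_storage(n_messages, new_bytes):
    """Return (bytes stored with full quoting, bytes stored without)."""
    # Message number k carries its own text plus the k - 1 earlier texts.
    full = sum(k * new_bytes for k in range(1, n_messages + 1))
    trimmed = n_messages * new_bytes
    return full, trimmed

# Illustrative numbers only: a 20-message discussion, 1000 bytes per reply.
full, trimmed = thread_storage(20, 1000)
print(full, trimmed, full / trimmed)
```

Under these assumptions the 20-message discussion stores 210,000 bytes rather than 20,000, about ten times as much, and the total grows with the square of the thread length.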

Which takes us to the next hurdle for a Microsoft email user to overcome, taken in semi-random order: storage. Outlook by default uses its own binary format for local message storage, known as PST files or Personal Storage Table files as the informative Wikipedia entry explains. In some configurations all mail is stored in a database of sorts on the Exchange server, and the user may or may not have the option to save messages to local PST files to work around space limitations on the server.

It is not uncommon for Exchange admins to turn off users' ability to save messages to PST files. One major reason is that more likely than not any saved PST file will end up on the end user's computer, with the consequence that potentially important messages may end up being backed up infrequently, if at all. Other reasons to avoid PSTs include size limits (originally 2GB but larger in newer releases), but what tends to scare people the most are the horror stories of data corruption to the point of absolute unrecoverability. As in gigabytes of your business or personal life gone, due to a scrambled PST file. There is anecdotal evidence that missing or scrambled PST files are a big headache for those who for various reasons want to look into the inner life of the Bush 43 administration.

So for record keeping involving your email, you're in a bind: Your mailbox size is likely to be limited -- every Exchange admin knows that large mailboxes will hurt performance, impacting all users of that server -- and the only way to save messages offline is a known-unsafe method. As far as I have been able to find out, there is no easy way (other than extracting messages to a separate system, say via IMAP) to export mail from the Microsoft product combo to any text or non-Microsoft mailbox format.
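The IMAP escape route can be sketched in a few lines of Python using only the standard library. This assumes the Exchange admins have enabled IMAP access at all, which is far from a given. The host name, credentials and file name in the usage note are placeholders, and a real export would want error handling and some care with very large folders.

```python
import imaplib
import mailbox

def fetch_all(host, user, password, folder="INBOX"):
    """Yield every message in an IMAP folder as raw RFC 822 bytes."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)  # leave flags on the server alone
        _status, data = conn.search(None, "ALL")
        for num in data[0].split():
            _status, parts = conn.fetch(num, "(RFC822)")
            yield parts[0][1]  # the raw message from the first response part
    finally:
        conn.logout()

def save_mbox(raw_messages, path):
    """Append raw messages to an mbox file and return how many were written."""
    box = mailbox.mbox(path)
    box.lock()
    count = 0
    try:
        for raw in raw_messages:
            box.add(raw)
            count += 1
    finally:
        box.close()  # flushes to disk and releases the lock
    return count
```

With made-up connection details, usage would be along the lines of save_mbox(fetch_all("mail.example.nx", "someuser", "secret"), "archive.mbox"). The resulting mbox file is plain text that practically any non-Microsoft mail tool can read.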

Now weigh those practical considerations against legislation that dictates all business related correspondence be kept on file for a matter of several years. The exact number of years varies by location, but unless you've purchased one of the add-on solutions for archiving, you will be struggling to keep in line with requirements.

It all comes down to the shortsightedness or intellectual shallowness of Microsoft Exchange's designers, way back then. It does make sense that your appointment calendar application should be able to send and receive email, and it kind of makes sense that your appointments are within easy reach from your email client.

Those facts do not, however, dictate that the appointments calendar and your email archive should share a common storage backend. In fact, it's likely that the decision to merge the email storage and appointments storage into one is the direct cause of many of the inefficiencies of Microsoft Exchange.

In one recent incident, a user mailbox of perhaps a couple of gigabytes consisted mostly of an estimated 1.5 million messages of about one kilobyte each (estimated, since Outlook never managed to display totals before freezing). The content was not of a nature that required long term storage, but even deleting the messages using an Outlook filtering rule literally took weeks, typically proceeding at a rate of one message per second early in the process and speeding up to somewhere between five and ten messages per second near the end. Fortunately the user in question was able to access email functionality via the Outlook web access interface while deletion proceeded, but anecdotal evidence suggests that the workload had measurable performance impact on other hosts attached to the same SAN.
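The "weeks" figure is easy to sanity check. Here is a small Python sketch of the arithmetic, using only the estimates quoted above and assuming, for simplicity, a constant deletion rate for the whole run.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_delete(messages, per_second):
    """Days needed to delete a number of messages at a constant rate."""
    return messages / per_second / SECONDS_PER_DAY

# 1.5 million messages at the rates observed during the incident.
for rate in (1, 5, 10):
    print(f"{rate:>2} msg/s: {days_to_delete(1_500_000, rate):.1f} days")
```

At one message per second the job takes about 17.4 days, at five per second about 3.5 days and at ten per second about 1.7 days. Since most of the run was spent at the slow end of that range, a total measured in weeks is what you would expect. All that for roughly 1.5 gigabytes of data.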

Even if you tackle the storage hurdle, you more than likely will be tripped up by other inanities in the software design. There are bound to be other pitfalls, but here is my personal list of things that continue to irritate me (in addition to the default "rich text" formatting), coming as I do from the outside world:

Using Outlook, it appears to be impossible to see what your From: address will be before you send the message. The effect is sometimes quite bizarre. In my case, since the site has several domains, I of course ended up signing up to several mailing lists with a wrong address, banishing my posts there to moderator queues until I was able to study the real mail headers on a non-Microsoft system.

Also, Outlook is overly helpful in filling in address fields such as To: and Cc: from common address books and Active Directory, leading in at least one case I know of to a supposed-to-be-private message being sent to every mailbox in a largish corporation. That's when you learn that after the first reply, retracting the message won't actually work.

And no rant about Exchange would be complete without mention of the largely information-free bounce messages the system generates for non-delivery. A significant portion of the spamtrap addresses I use have been fished out of bounce messages, and the Exchange ones stand out as the ones practically guaranteed to exclude any information about where the triggering message came from, or when.

Summing up, if you're an executive who feels that your organization is saddled with inefficient email processing and dubious archiving, the likely culprit is not email as such, but rather the poorly constructed application some unscrupulous sales person inserted in your network for you.

Changing to a standards compliant, preferably open source, alternative is likely to save your organization costs at all levels, including hardware and software acquisition and maintenance costs as well as significant personnel time. At the same time a move to a standards compliant, open source solution will likely leave you in a better position with respect to security, information consistency and verification. A full treatment of email as a business tool would have had at least one column of similar length to this one on each of these topics, and I may return to those in future columns. In the meantime, if inefficient emailing bothers you, you may need to realize that a large part of your problem is Microsoft Exchange.



St. Patrick's Day PF tutorial in Tokyo: Returning readers may already be aware that I will be giving a PF tutorial at AsiaBSDCon 2011. My session will be on March 17th, known in some parts of the world as St Patrick's Day. You can register for my session and others here, hope to see you there!