Sunday, April 13, 2008

Does anybody here remember Artie Eff?

That's what you thought you heard, right? Actually, this is not Uncle BSDly's nostalgia for a backstreet hoodlum he might have known way back when. It's Uncle BSDly trying to point out what happened the last time a standard was entrusted to that little Seattle company called Microsoft. R-T-F. The Rich Text Format.

So the question is,

Does anybody here remember RTF?

It's been a long, long time, so you probably do not remember. But take my word for it, way back when OS/2 was to be the Next Operating System, the Rich Text Format was generally hailed as a Very Good Thing. It was a document and data format which came with a published specification, and the plan was that this was to be the vendor-neutral data exchange format. All applications would talk RTF compatibly, and henceforth data destruction due to inane incompatibilities and fuzzy interpretations would be a thing of the past.

The original RTF 1.0 specification has been preserved at SourceForge as http://latex2rtf.sourceforge.net/RTF-Spec-1.0.txt, and our friends at Wikipedia have a short but nice article at http://en.wikipedia.org/wiki/Rich_Text_Format.

I originally started writing this piece at the end of July 2007, before the ISO voting process over Microsoft's "Office Open XML" specification had started. I was working on a book at the time, and never got very far with my planned overview with a very personal slant on that file format. The RTF format was actually useful in its time, and how it is constructed and how it was maintained can show us a lot of useful things. Now that it's clear that Microsoft succeeded in getting ISO approval for its specification, it's time to return to that subject.

What a process it has been. Microsoft has succeeded in buying itself an ISO standard. Piece by piece, in a process that included ballot-stuffing, bribes, and the administrative overruling of the relevant technical committee's decision by a national standards body (right here in Norway), Microsoft bought itself or is very close to getting its wholly-owned ISO standard. Never mind that there is at least one formal investigation by the EU into the process and that the chairman of one national body's technical committee (once again, in Norway) has lodged a formal protest over procedure with ISO.

Microsoft won, and we will all have to learn to live with the consequences.

One of the very effective tactics was to have Microsoft partners sign up as voting members of national standards bodies, with clear instructions on how to vote. This led to an influx of new voting members in various parts of the standards organization so large that other standardization efforts where Microsoft has no interest at the moment have reportedly ground to a halt. The reason is that what remains of formal procedure in ISO dictates that if enough members of a national body fail to vote on a proposed standard, the standard is not approved and fails by default. So the first victim of Microsoft's ISO takeover is no new standards. Who would want more ISO standards anyway?

And oh yes, Microsoft's main OOXML propagandist in Norway has offered up this document as proof that OOXML has been implemented and is in fact usable in OpenOffice too. It did not open at all in my OpenOffice 2.3 running on OpenBSD 4.3-current (see here and here for the two stages of the results), and informed sources tell me that the much-ballyhooed formula is not editable when the document is imported into other applications such as Microsoft's own Office 2003 plus the compatibility pack.

And as if to emphasize the fact that all you can guarantee when you call something "XML" is that the data will start with a < and end with a >, this XML document in fact handles the equation part as an embedded binary object, intimately tied to Microsoft's equation editor. I suppose this means that even at 9,000 pages and counting, the spanking new ISO standard does not cover handling equations in a satisfactory manner. All in all, just how much more weirdness this will lead to is hard to predict.

But I digress. As I said earlier, RTF was thought to be a very good thing. The specification was a lot less ambitious and a lot slimmer than the XML-based one, and since RTF was the source format for the Microsoft Windows Help file format, recommended for all apps to run on Windows, there were even usable examples documented in one of the several manuals you would get with the Microsoft programming tools.

As you would expect, word processors and other applications were rewritten to be able to read and write RTF. I worked in documentation and localization at the time, and over the years I had a fair amount of exposure to the format and the tools. Basically, any Windows application worth documenting would require an online help system and the source format would be RTF.

Just like an XML file has to start with a < and end with >, RTF files are made up of elements that are delimited by matching { and } curly braces. In principle, you could write valid RTF using any text editor, and the Microsoft documentation would show you how. That fact came in handy a few times when Word managed to mangle some other document with its 'fast save' feature or as punishment for sending the document to be edited at one machine too many during a technical review. The solution to essentially any problem, as long as the .DOC file was possible to open at all, would be to save the document as .RTF, count the number of {s and }s and see if the numbers were equal. If they were not, you would add one of the missing kind at either the top or the bottom, using any straight text editor. The magic would work, or at least put you in a position where you could extract useful data.
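For the curious, the brace-counting rescue described above can be sketched in a few lines of Python. This is only a rough illustration, not a real RTF parser: the function names count_braces and patch_braces are made up for this example, it assumes the file has already been read into a string, and it knows nothing about embedded binary data. The one refinement over plain counting is that a brace preceded by a backslash is treated as literal text and skipped. The choice of where to add the missing braces (closing ones at the bottom, opening ones at the top) is an assumption of this sketch.

```python
def count_braces(rtf_text):
    """Return (opening, closing) counts of unescaped { and } characters."""
    opening = closing = 0
    i = 0
    while i < len(rtf_text):
        ch = rtf_text[i]
        if ch == "\\":
            # A backslash starts a control word or escapes the next
            # character (\{, \}, \\), so skip both characters.
            i += 2
            continue
        if ch == "{":
            opening += 1
        elif ch == "}":
            closing += 1
        i += 1
    return opening, closing


def patch_braces(rtf_text):
    """Add the missing kind of brace at the top or the bottom (naive repair)."""
    opening, closing = count_braces(rtf_text)
    if opening > closing:
        # Unclosed groups: append closing braces at the bottom.
        return rtf_text + "}" * (opening - closing)
    if closing > opening:
        # Stray closing braces: prepend opening braces at the top.
        return "{" * (closing - opening) + rtf_text
    return rtf_text


if __name__ == "__main__":
    # A deliberately damaged fragment: the outermost group is never closed.
    damaged = r"{\rtf1\ansi {\b bold text} plain \{ literal"
    print(count_braces(damaged))
    print(patch_braces(damaged))
```

Equal counts do not prove the groups nest the way the author intended, so the result is at best a file that opens well enough to salvage the text.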

Over time you would upgrade to newer tools, and Microsoft's development discipline (amply documented in the OOXML tomes) would show up at every Word upgrade if you had not been paying attention.

It would go like this: You get the fresh Word version with all the new bells and whistles, and do another revision of the help system for an app. The help compile would break horribly, and after some hair-pulling and tabletop-thumping you would eventually remember that this had in fact happened before.

The definition of RTF had changed.

After a while you would discover, hidden somewhere deep in the new Word's online help or README, a note from Microsoft that application developers would need to install a newer version of the help compiler to get their RTF to compile. Or to put it slightly differently, the operative definition of what "RTF" is would always be "whatever Word produces when you save as RTF". To their credit, the RTF people at Microsoft would usually publish a new version of the specification some time after a new Word release.

So, we've been there before. "Whatever Microsoft {Word,Excel,PowerPoint} produces" is likely to be the operative definition of what OOXML is, even if the 9,000 pages and counting specification says something subtly different.

With a specification that lets dates be represented in radically different ways depending on context, say if you're saving from a word processor, a spreadsheet or a presentation program, we're in for a lot of entertainment, I'm sure.

The other things I'm sure about are that there will never be an implementation of OOXML as currently written (even if the OpenOffice.org people have made noises that they might try), and that a full implementation of anything vaguely similar will have to come from Microsoft.

Real gluttons for punishment could try the new hobby (or profession, if you can get a sponsor) of tracking the day-to-day development in the set of differences between OOXML-as-published and OOXML-as-implemented.

Stuff I'll be getting back to: OpenBSD 4.3 is about to hit a mirror near you and hopefully your mailbox very soon, and more spam follies.


Update 2015-04-04: The demonstration document appears to load correctly in LibreOffice 4.3 -- I just tried it, and a screenshot is available here. I have no useful data on when the document became readable for anything not produced by Microsoft. In other news, Microsoft has now announced that an upcoming release of their Office product will be able to read and save ODF files.