So I'm a naive user of technology. (No. Really I am. Ask anyone that's worked with me.) I am definitely not an expert in modern XML document standards. (I have actually hacked troff escapes in a document production chain to insert commands in the PostScript output stream that would be recognized by the PDF generator to produce a hyper-linked document, so I know a little bit about the concepts involved, but that was also 10 years ago now.)
I have been a [marvelously happy] Mac user for the past two years. That means I already have iWork 2008 loaded with the new improved Pages '08 (the Apple word processor). On the Apple web site, if I search for "office open xml" then I end up on this page (31 Aug, 2007), which tells me all about Pages '08:
Pages ‘08 supports industry-standard formats, so you can easily open documents created in other word processing applications and share documents with others. Whether they’re using a Mac or a PC.
Open for business.
Import your Microsoft Word documents into Pages ’08 with ease. Whether they’re Microsoft Office 2007 (Office Open XML) or earlier Word files, Pages will open them. Pages imports not only the text, but also the styles, tables, inline and floating objects, charts, footnotes, endnotes, bookmarks, hyperlinks, lists, sections, change tracking, and other elements of your original Word document.
COOL! I'm in! This is awesome. I want to see how well I can read interesting docx files. As it happens, ECMA International makes the Office Open XML standard available as both PDF and as docx files. Clever — it's a document format standard see, and so they've provided it in its own format. Perfect.
So I download the .docx version of ECMA-376. All 5 parts of it. And I open "Part 1 - Fundamentals" and immediately get told some warnings occurred:
I choose to review and get:
The file mostly looks good, but not quite as clean as the PDF image with the other font (Consolas?). And clicking on the first warning (about the unsupported field) gives me NO additional information to understand what/where the error might be. Now this is what we in the standards industry call "a quality of implementation issue". Clearly Apple has not done a good job. Get used to hearing this phrase a lot in the press — I'm predicting Microsoft will be forced to apply it liberally to their partners that helped them win votes and helped with the marketing message.
Then I notice the paging problem. I have no idea why, but there seems to be page drift between the PDF and .docx versions. [More on THAT little problem in a minute.] The paging problem does NOT mean there's necessarily a problem with the standard itself but rather the document production machine ECMA was using — we don't know what the definitive source and tool chain was that produced the PDF. (Serious document production is the same as serious software production, something most word processor users fortunately don't get to experience.) Oh, and there are line numbers in the PDF that don't appear in the .docx as opened by Pages '08.
Ignoring the document version skew problem, I decide to see what happens when I throw an even bigger docx file at Pages '08. So I open "Part 3 - Primer" and ...
A few more "warnings" to deal with here. More missing font problems. Things were "removed". No helpful information as to what or how.
I asked a friend with Office 2007 to download and open the two .docx files. You guessed it — no warnings. So we're now on the slippery slope. Apparently I can create files in Office 2007 that Microsoft marketing claims are "standard" Office Open XML that may (or may not) use proprietary extensions. Or maybe Apple did a really bad job. How would a government customer interested in preserving documents know? But it gets worse. The Office 2007 pagination perfectly matched the PDF version. And there are line numbers in the Office 2007 version just like the PDF version.
I'm betting the average business or government office person saving a file won't think twice about it. You see, Office 2007 gives you no way to save something as "strict" Office Open XML. It's not merely off by default; the option isn't there at all. Microsoft's definition of "Office Open XML" appears to be .docx itself.
Indeed, even Apple's Pages '08 will only EXPORT to old Microsoft Office format (.doc) and not the standard Office Open XML (.docx) format. So I appear to have no way to generate an OOXML file from Pages '08. [Yes, yes, yes. Microsoft will again point out it's a quality of implementation problem. Or they'll point out that Pages '08 is a "consumer" of OOXML only, which is allowed by the standard. I get it. It's not Microsoft's fault. I'm beginning, however, to wonder at the quality of implementation on the Novell platform. There's a business partnership under duress.]
So as an adjudicated monopoly of desktop operating systems, supplying an office productivity suite with 95+% market share, they will be able to claim instant victory for the adoption of their international standard because .docx files equal Office Open XML standard files. Oh, wait — that's what was essentially done in the IDC study published this week that was "sponsored by Microsoft".
[Now we're about to get a wee bit tedious and exact as standards wonks are prone to be. I'm going to try to explain the conformance game. It can be subtle. Apologies in advance for perhaps getting too ... well boring. If you're not interested in standards mechanics, you can safely stop reading.]
So OOXML defines a couple of types of conformance. There is Document Conformance, and Application Conformance. And conforming applications can be producers (i.e. OOXML document writers) or consumers (i.e. OOXML document readers) or both. Here's the text from the standard [Part 1, PDF edition, p. 3, lines 8-30]:
2.3 What this Standard Specifies
To address the issues listed above, this Standard constrains both syntax and semantics, but it is not intended to predefine application behavior. Therefore, it includes, among others, the following three types of information:
1. Schemas and an associated validation procedure for validating document syntax against those schemas. (The validation procedure includes un-zipping, locating files, processing the extensibility elements and attributes, and XML Schema validation.)
2. Additional syntax constraints in written form, wherever these constraints cannot feasibly be expressed in the schema language.
3. Descriptions of element semantics. The semantics of an element refers to its intended interpretation by a human being.
2.4 Document Conformance
Document conformance is purely syntactic; it involves only Items 1 and 2 in §2.3 above.
- A conforming document shall conform to the schema (Item 1) and any additional syntax constraints (Item 2).
- The document character set shall conform to the Unicode Standard and ISO/IEC 10646-1, with either the UTF-8 or UTF-16 encoding form, as required by the XML 1.0 standard.
- Any XML element or attribute not explicitly included in this Standard shall use the extensibility mechanisms described by Parts 4 and 5 of this Standard.
2.5 Application Conformance
Application conformance is purely syntactic; it also involves only Items 1 and 2 in §2.3 above.
- A conforming consumer shall not reject any conforming documents of the document type (§4) expected by that application.
- A conforming producer shall be able to produce conforming documents.
This is the traditional way things are done with programming language standards as well. The ISO/ANSI C standard defines the concept of a strictly conforming C-language program so that it can then define conformance of an actual implementation (i.e. a C-language compiler). In the OOXML standard, document conformance exists to be able to talk about implementation conformance, i.e. what readers/writers need to produce or accept if they conform to the standard.
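To make the "validation procedure" in §2.3 a little less abstract, here is a rough sketch of just its first steps: un-zipping the package, locating the files, and checking that each XML part at least parses. It is emphatically NOT a conformance checker. It skips the extensibility processing and the XML Schema validation entirely, and the function name precheck_package is my own invention, not anything from the standard.

```python
import zipfile
import xml.etree.ElementTree as ET


def precheck_package(source):
    """Return a list of problems found in the earliest validation steps.

    `source` is a path or a binary file-like object holding the package.
    Only un-zipping, locating files, and XML well-formedness are covered;
    schema validation is deliberately out of scope for this sketch.
    """
    try:
        package = zipfile.ZipFile(source)
    except zipfile.BadZipFile:
        return ["not a ZIP package"]

    problems = []
    with package:
        names = package.namelist()
        # Assumption: a usable Office Open XML document carries both the
        # content-types item and the package-relationship item.
        for required in ("[Content_Types].xml", "_rels/.rels"):
            if required not in names:
                problems.append(f"missing {required}")
        for name in names:
            if not name.endswith((".xml", ".rels")):
                continue  # images, fonts and other binary parts
            try:
                ET.fromstring(package.read(name))
            except ET.ParseError as err:
                problems.append(f"{name}: not well-formed ({err})")
    return problems
```

Calling precheck_package("Part 1 - Fundamentals.docx") (substitute whatever filename your download has) should hand back an empty list for a healthy file. An empty list says nothing about conformance, only that the file survived the cheap checks.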
For completeness' sake, the "document type" reference in §2.5 above is described in §4 as [Part 1, PDF Edition, p. 6, lines 16-26]:
document type — One of the three types of Office Open XML documents: Wordprocessing, Spreadsheet, and Presentation, defined as follows:
- A document whose package-relationship item contains a relationship to a Main Document part (§11.3.10) is a document of type Wordprocessing.
- A document whose package-relationship item contains a relationship to a Workbook part (§12.3.23) is a document of type Spreadsheet.
- A document whose package-relationship item contains a relationship to a Presentation part (§13.3.6) is a document of type Presentation.
An Office Open XML document can contain one or more embedded Office Open XML packages (§15.2.10) with each embedded package having any of the three document types. However, the presence of these embedded packages does not change the type of the document.
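For the curious, the document type definition above is mechanical enough to sketch in a few lines. The sketch below reads the package-relationship item (_rels/.rels), follows each relationship to its target part, and looks up that part's content type in [Content_Types].xml. The assumptions are mine: that the main part is declared with an Override entry, that part names match case-sensitively, and that the three content type strings in MAIN_PARTS are the ones in play (macro-enabled and template variants are ignored). The function name document_type is hypothetical too.

```python
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
TYPES_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Assumed mapping from the main part's content type to the document type.
PREFIX = "application/vnd.openxmlformats-officedocument."
MAIN_PARTS = {
    PREFIX + "wordprocessingml.document.main+xml": "Wordprocessing",
    PREFIX + "spreadsheetml.sheet.main+xml": "Spreadsheet",
    PREFIX + "presentationml.presentation.main+xml": "Presentation",
}


def document_type(source):
    """Return 'Wordprocessing', 'Spreadsheet', 'Presentation', or None."""
    with zipfile.ZipFile(source) as package:
        rels = ET.fromstring(package.read("_rels/.rels"))
        types = ET.fromstring(package.read("[Content_Types].xml"))

    # Part name -> content type, taken from the explicit Override entries.
    overrides = {
        entry.get("PartName"): entry.get("ContentType")
        for entry in types.iter(TYPES_NS + "Override")
    }

    for rel in rels.iter(RELS_NS + "Relationship"):
        if rel.get("TargetMode") == "External":
            continue  # points outside the package, so not a part
        # Package-level targets are resolved against the package root.
        part_name = "/" + rel.get("Target", "").lstrip("/")
        kind = MAIN_PARTS.get(overrides.get(part_name))
        if kind:
            return kind
    return None
```

Only the top-level package-relationship item is consulted, so an embedded spreadsheet inside a word processing document doesn't change the answer, which is what the last paragraph of the definition asks for.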
Now there is no statement of conformance to Office Open XML on the Apple web site beyond the above statement of "support". A search in Pages '08 Help for "office open xml" finds no reference at all. So Apple appears not to actually claim conformance to the OOXML standard anywhere. They simply "support" it. So they're not really guilty of not reading a conforming OOXML document.
But the Microsoft standards and marketing machines are claiming "support" for their standard with the assured tones that "support" = "conformance". Aside from the successful "adoption" claims in the aforementioned IDC report (where Office 2007 market share apparently equates to Office Open XML adoption) we have Tom Robertson (Microsoft General Manager of Standards and Interoperability) "citing support in products from Novell, Corel, Apple and others." Disingenuous at best.
Jason Matusow points out on his blog:
A real litmus test for the viability of the ISO/IEC DIS (draft international standard) 29500 (Open XML) is whether or not there are independent implementations. The answer to this question for Open XML is an unequivocal yes. There are independent Open XML implementations based on the existing specification in applications that run on Linux, Mac, Palm OS, iPhone, and Windows.
Again note the complete lack of reference to actual conformance per the definitions in the standard they have driven through the process. These are the people that are responsible for standards management and messaging at Microsoft. They are by definition the folks that should be defending the strict conformance of the standards in which they participate, and not merely suggesting that partial implementations are a "great start".
So where does this leave the government customer that thought they were buying an open document format for document exchange and interop? It is indeed finally time to roll out the certification machine — for everybody. Let the games continue.