The ISO "fast track" vote on approval of Microsoft's OOXML document specification happens next Monday (2 Sep.), and news is breaking fast and furious as various countries report out early. An interesting bit of technical experimentation was published in the past week in the shadow of the vote. It shows something more pragmatically damning than all of Rob Weir's hard work digging through faults in the OOXML specification.
The work is published as a set of experiments. The first experiment takes a trivial spreadsheet created with Microsoft Excel 2007. The next step is to unpack the Excel-generated "standard OOXML", make a trivial edit to it, and repack it. Excel then complains violently about the result (i.e. to the point of not reading the document).
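For readers who want to try the same kind of round trip, here is a minimal Python sketch of the unpack, edit, repack step. It is not the code used in the experiments; the function name `rewrite_part` is an illustrative assumption, and the sketch relies only on the fact that an OOXML package is a zip archive.

```python
import io
import zipfile


def rewrite_part(package: bytes, part_name: str, transform) -> bytes:
    """Return a copy of a zip-based package with one part passed through transform.

    `transform` is any callable taking and returning bytes (hypothetical helper).
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part_name:
                # The "trivial edit": only this one part is changed.
                data = transform(data)
            # Every other part is copied byte-for-byte, in the original order.
            dst.writestr(info.filename, data)
    return out.getvalue()
```

Calling `rewrite_part(data, "xl/worksheets/sheet1.xml", edit)` with a hypothetical `edit` function would touch only the first worksheet (assuming the package uses that part name). Whether Excel then accepts the result is exactly what the experiment tests.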
This is a catastrophic failure on two fronts for Microsoft's "standard":
- It means Microsoft Office DOESN'T ACCEPT WELL-FORMED OOXML documents not produced by Microsoft Office. This would be very bad for all those customers who believe the marketing and think they're buying a product implementing a "standard" that will [someday maybe] be supported by multiple implementations.
- It means the Microsoft OOXML specification leaves enough unsaid that implementors can't implement it. (Changing a one-character field in human-readable XML with an editor is about as basic as it gets for implementation.)
Game over. A standard that can't be implemented is WORSE THAN USELESS. It really demonstrates that this standard they rammed through ECMA is nothing more than a vendor's product specification. The rest of the experiments are equally telling in terms of the apparent use of product features outside the specification, etc.
This is not to say that Microsoft won't keep the steam roller crashing right along. From Mary Jo Foley this week we see that Microsoft is still "buying research" from IDC in a study declaring great uptake for OOXML. [I'm not linking to it because frankly it deserves to be buried. I won't even give it the small link credit this blog engenders.] While the IDC folks try to avoid complicity in the biased survey by openly declaring under the title that it was "Sponsored by Microsoft", we know the Microsoft marketing machine will happily be using the numbers and graphs in slides, attributing it all to an IDC study and failing to mention they paid for the "opinions". Too bad for the credibility of the IDC researchers when this one comes home to roost.
The incredible adoption for "OOXML" is likely a measure of market share on Office 2007 since it would be the only product claiming to implement the standard. The way the "study" keeps conflating PDF with ODF/OOXML is broken as well. So WHEN customers discover the truth of the implementation difficulties and continued lock-in of their document world, they'll respond the way customers always do when a vendor pulls a fast one on them.
Perhaps the most disheartening failure however is the way Microsoft has abused the machine that is the world's standards development organizations. Andy Updegrove has a brilliant if painful read on the damage that Microsoft has caused in its "race to win." Standards are needed and important. In the days of film cameras, you wouldn't have been able to grab a roll of 35mm ASA 400 film from any store anywhere on the planet (produced by multiple different companies with their own value added services and technologies) and have it just work in your camera without such standards from such organizations.
Microsoft is pushing organizational rules to the breaking point, ballot stuffing and buying its way to a win at ISO. They have complained from the beginning of the process that they're doing nothing different than the likes of their competitors who are "aligned against them." But this shows as much ignorance and naivete as the quality and implementability of their specification. Companies like IBM and Sun have invested deep and long and globally in their standards participation activities. They don't need to stuff the ballot box in a sudden end game -- they're invested in the long term infrastructure itself. They absolutely play to win and sometimes surprise one in their creative use of the rules and organizational structures. But they win because of a long term commitment to the machine itself, not to winning a particular battle at all costs.
Standards at Microsoft has been run by lawyers instead of technology standards practitioners for too long. It's been demonstrated time and again from the very beginning of the ODF working group formation in a succession of tactical and strategic failures by Microsoft. To those lawyers that have never actually participated in standards working groups, or the standards management process, or implemented to the specification, or designed a certification process to demonstrate conformance, this is just one more battle to win and hang the consequences. (Indeed, if we can "win" we can use it as precedent to "win" again!)
The sad part is that even if the ISO vote actually goes in Microsoft's favour, it still won't matter. It buys them a few years of market ignorance at best. This entire two-year event is one for the standards textbooks on how not to respond to a business-threatening standard. In the end, Microsoft will need to implement ODF natively. They don't know it yet, nor do they understand why, but it is just a matter of time.
Do you really think that's a valid example? He deletes a formula from the worksheet, but doesn't delete it from the calc chain, and Excel complains that the document is corrupt because there's a reference to a non-existent formula. That seems pretty forced to me.
As Miguel de Icaza has pointed out, either you think the calc chain matters (in which case you need to maintain it), or you don't (in which case you can just delete it). I met a 19 year-old in Delhi named Akshaya earlier this year, and he had built an Open XML spreadsheet editor on his own, which he gave me a copy of. That program handles this issue just fine.
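To make the calc chain point concrete, here is a rough Python sketch (an illustration only, not anything taken from Excel or from the spec) that lists calc chain entries pointing at cells that no longer hold a formula. It assumes a single worksheet, the usual SpreadsheetML namespace, and a hypothetical helper name `dangling_calc_entries`.

```python
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet and calc chain parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def dangling_calc_entries(sheet_xml: str, calc_chain_xml: str) -> list[str]:
    """Return calc chain cell references that have no formula in the sheet.

    Simplification (assumption): every chain entry refers to this one sheet,
    so the sheet index attribute on each entry is ignored.
    """
    sheet = ET.fromstring(sheet_xml)
    # Cells that still carry an <f> (formula) child element.
    formula_cells = {
        cell.get("r")
        for cell in sheet.iterfind(".//m:c", NS)
        if cell.find("m:f", NS) is not None
    }
    chain = ET.fromstring(calc_chain_xml)
    return [
        entry.get("r")
        for entry in chain.iterfind("m:c", NS)
        if entry.get("r") not in formula_cells
    ]
```

An editor that maintains the calc chain would drop every reference this returns; one that does not care about it would leave the calc chain part out altogether.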
Btw, since that page mentions me by name in another context, I'll just add that the BIFF12 documentation is available through the same means the BIFF8 documentation is available: a quick no-cost license that anyone can get. In fact, we've had a large number of people worldwide ask for that documentation recently, and they're all getting it.
Some of Stephane's comments are valid; for example, I agree that more could be done to make the spec less Anglo-centric. But the sky isn't falling, and the specific example you've used is a pretty simple case of deliberately corrupting a document. I still wish we handled the corruption better in Excel, and maybe we will eventually, but it's not a problem with the spec or the standard.
Posted by: Doug Mahugh | 30 August 2007 at 17:57
As a sidelight on Doug Mahugh's comment, here is his profile from his blog:
"My name is Doug Mahugh and I'm a technical evangelist at Microsoft specializing in the Office Open XML file formats. I am also the moderator of the OpenXmlDeveloper.org web site, where Open XML developers share tips, techniques and source code for a variety of development platforms."
In his blog Mr Mahugh posts "neutral" comments such as the following:
"I've wondered before whether all these sleazy anti-Open XML tactics are working. Are IBM and their friends succeeding in creating FUD in the marketplace?"
Posted by: Richard Soles | 30 August 2007 at 23:52
Is this the same Doug Mahugh as the guy posting on http://blogs.msdn.com/dmahugh/ who keeps repeating that the ISO vote stuffing is actually just MS versus IBM?
Scoop: the software community at large is against the proposed ISO standard.
Shall I remind you of your colleague's answer during the US INCITS V1 discussion, when he was questioned about the proprietary crap that is baked into this "open, platform-independent, vendor-neutral" stuff? Rex Shaelke: "we felt it this way".
Given how out of touch with reality you are, why should anyone take your word? You are the Microsoft Office PR machine, my friend. Everything you say is obtrusive and borders on panic and schizophrenia.
As for your points above: have you really read the article? If you had, you would have seen that, in order to play devil's advocate, what I do next is make manual changes to the calculation chain. And it fares no better.
If you knew what you were talking about, or worse, if you really wrote what you believe instead of doing what your employer requires you to do, the only genuine answer worth making would be: "We messed things up. In BIFF, the calculation chain was governed by cells, and now it's the opposite. This has the very unfortunate consequence of requiring implementers to rebuild the calculation chain or face full spreadsheet recalculations (i.e. second-class citizen), which in and of itself implies rewriting a portion of Excel. Please accept our apology; we are taking ECMA 376 back to working draft status, will fix this, and resubmit it to ECMA".
Anyway, I'm perplexed about this schizophrenia, Doug. Are you the same guy who was posting warm cheers to me just months ago, and who started ignoring my criticism right after I called BS on the ISO route? What about this:
http://blogs.msdn.com/dmahugh/archive/2006/12/01/diffopc-if-you-need-this-you-need-it-bad.aspx
and this:
http://blogs.msdn.com/dmahugh/archive/2006/08/22/712835.aspx
and this:
http://blogs.msdn.com/brian_jones/archive/2006/10/27/friday-thoughts-oct-27-2006.aspx
Anyways.
As for Miguel's pseudo-rebuttal, perhaps it's time to ask yourself two things:
1) Can you rebut real examples? I think you can rebut statements like "we are open and transparent", but I don't think you can rebut real examples.
2) Miguel works for Microsoft (he takes pride in not being officially on the MS payroll, never mind that the bulk of Novell's revenue is a direct influx from MS). But can you guess the retaliation if he said anything negative about this stuff? You have to admit it, he's got no freedom of speech in that very area, plus Microsoft is using him as a tool to break the open source community apart.
Posted by: Stephane Rodriguez | 31 August 2007 at 00:42
As for the BIFF12 documentation, there is more than meets the eye, actually. I don't offer enough context in my article, but this will be fixed in the next couple of days. In a nutshell,
1) it's been reported out there that anyone who has contacted Microsoft to get specs of the BIFFx file formats has received no response at all. Whether those specs include BIFF12 remains to be seen anyway; we have to take the word of a Microsoft evangelist (officially paid to twist the truth to Microsoft's sole advantage), which in turn does not say much at all. When requesting a copy of the file format, there is a section in the agreement which says the licensee cannot use this stuff to create a competing product. Read that again if you did not catch the irony...
2) BIFF12 was, according to Microsoft (which again does not mean much as far as truth is concerned), built only for performance reasons. This can easily be debunked since SpreadsheetML itself is reportedly designed in an extremely obtrusive way for performance reasons already (shared strings, shared formulas, indexes all over the place, ...)
The reason why Microsoft built BIFF12, a shocker, is because SpreadsheetML has plenty of passwords and connection strings in plain-text, and that can be removed or altered with just a text editor. Microsoft wanted to provide a fallback for sceptics, and that's what we end up with.
So here again, there is more than meets the eye. That's why BIFF12 is just a flavor of SpreadsheetML and should be part of the ECMA 376 documentation. Of course, if it were part of it, the immediate consequence is that the "XML" claim would be violated.
Posted by: Stephane Rodriguez | 31 August 2007 at 00:51
Hi Doug,
Ok, so it is possible to implement a simple Open XML spreadsheet editor after all. Let's all accept OOXML as an international standard. And the specification is available to everyone? Forget about the ISO process. We have a standard.
Stop.
The successful implementation of a small subset of a specification unfortunately does not validate the entire specification as a standard. Specifications are a dime a dozen. I can write one. Their quality varies wildly and quality here refers to the level of precision at which each concept and algorithm is fleshed out in the specification.
A standard is a special case of a specification. A standard is supposed to be a specification of the highest possible quality, so that it enables anyone, without a partnership with MS, to write an implementation that A) is fully conforming and B) can read/write all the documents produced by the reference implementation which in this case is MS Office 2007. So, the spec needs to answer all the questions. All of them. And believe me, programmers are inquisitive. The devil is in the details as every programmer knows.
A 6000-page heap of paper may be a specification and it may be available to anyone, but its worthiness for standardization depends heavily on these quality issues as well. According to the noises we've been hearing on the Net about numerous unaddressed technical questions about the OOXML spec, it really isn't time to talk about standardization just yet.
Fast track for standardization is a contradiction in terms. Particularly so for a specification written by a single vendor in a dominant position.
Posted by: Jani | 31 August 2007 at 00:55
The problem with the spreadsheet example is that Excel itself makes no attempt to repair the spreadsheet. Where in the OOXML spec does it say what should be done in this case? You are saying the document is noncompliant, but is it, or is this merely an ambiguity in the OOXML spec? Remember OOXML is what the proposed 6000 pages say, not what Excel happens to do. And it is quite possible that Excel is a NONCOMPLIANT implementation. But I doubt it will get fixed.
But it also points out the problem. The format wasn't well thought-out. It's still valid XML to have just ...base-64... but I doubt they would submit that as a standard.
Noting that someone else made a program that works with Excel's XML says nothing. Programs already work with the proprietary binary formats. Did your Indian colleague code just from the OOXML spec, or did he use Excel as a test oracle? Again, what is the point of having an ISO spec if everyone will have to ignore it and have a copy of Office to test how things actually work? The comment on a "No with comments" vote should be "You are going to have to buy Office anyway to have even a slim chance of interoperability, so why bother with this kabuki?".
To have a lot of separate XML chunks so interdependent that the result is a mesh that breaks if even one error occurs is atrocious design. Or an implementation that expects everything perfect is a bad implementation. Perhaps that is why the spec is 6k pages and still doesn't document everything properly.
So if you are trying to make the point that really bad designs can be implemented (and having done too much of that in my career, I would agree), fine, but bad designs ought not become any kind of standard. Alternatively, if the standard is proper, Excel is noncompliant so can't be considered where "open standards" need to be used. It needs to be fixed.
A while ago, when Microsoft was helping (snuff) OS/2, they complained about paying by k-loc (1000 lines of code). They were right. Larger code is much faultier than smaller code. ODF does everything necessary in under a kilopage (maybe a bit more adding exhaustive spreadsheet formula details). An annex for Microsoft legacy support wouldn't take more than a few hundred pages.
Instead we have this pile of attempted documentation of the observed behavior of Office, which fails to do even that properly, but shows that anything so complex is absurd. It would be bad enough if the 6000 pages had no ambiguities, no contradictions, and 100% coverage. Instead it is a mound hiding quite a few unexpected unpleasant surprises.
Oh, and accusing the open source people of "sleazy tactics" only shows that Microsoft is trying to get pornography to be sold as a kid's movie. The problem is not that Debbie does Dallas competes with Bambi, but that the former has, shall we say, flaws that make it inappropriate for kids.
Standards should be clear and concise. Even complex ones. OOXML isn't. I'd welcome them to try again with something reasonable. That would be better than trying their strongarm tactics to get approval for something this faulty.
Posted by: tz | 31 August 2007 at 05:19
You write: "So WHEN customers discover the truth of the implementation difficulties and continued lock-in of their document world, they'll respond the way customers always do when a vendor pulls a fast one on them."
What makes you think that? Remember "Expanded Memory"? It was sold as a way to address more than 640K of RAM. It was lame; a necessary workaround for the Intel 80286 architecture but completely unnecessary on the 80386. The 386 was introduced in 1987 yet Windows for Workgroups, the business class Microsoft Windows product, continued to require expanded memory. It was not until Windows 95 was introduced in 1995, 8 years after expanded memory became obsolete, that Microsoft abandoned expanded memory.
For 8 years programmers had to struggle with memory management, unnecessarily. This is hardly an isolated example. Working around design deficiencies seems to be the essence of working with Microsoft products. And back then Microsoft did not have anything like the monopoly position they have today. What makes you think it will be different this time?
People only miss something that's been taken away from them. Working with Microsoft data formats has always been a struggle, working with OOXML will be entirely unremarkable in that respect.
Posted by: Karl O Pinc | 31 August 2007 at 08:45
@Karl: Morning Karl -- thanks for the commentary. Customers do change over time. There was a point through the late 80s where DEC continued to try to lock customers to their hardware base. If you put third-party disks or memory on a VAX you voided your warranty and support agreement. It was just memory and disks -- nothing special. We did it anyway. We were tired of the lock-in, and tired of the ever-increasing support costs and forced hardware upgrades. We were the customers -- it was our money. Then we started buying UNIX systems. You have to remember, twenty years ago UNIX was not the commercial workhorse it is today. It was less scalable, robust, and secure than VAX/VMS. But as customers we weren't going to take it any more. We could get UNIX systems from any number of vendors. Some of us even liked DEC a lot. But that didn't mean we needed to give them money to support their stock price instead of solving our problems.
The cycle is repeating here. Customers that are less risk averse will "tough it out" longer. But once they start to see the migration path and that they're not exposing themselves or "going it alone" to lead, they'll start following the lead of others and start switching.
Posted by: stephe | 31 August 2007 at 09:00
"It buys them a few years of market ignorance at best."
Which translates into short-term money gains for shareholders. And that is all MS is interested in.
Posted by: Peter Rock | 31 August 2007 at 10:21
I'm always astonished by the measured, thoughtful responses to Microsoft's outrageous antics. I think people should be hopping up and down, collecting pitchforks and flaming torches, and preparing to storm Redmond. Fortunately, cooler heads prevail. (Or maybe not fortunately...) lying bully, Microsoft's spokespersons all claim that 'everyone uses dirty tactics.' No, they don't. Their scorched-earth take-no-prisoners policy is absurd and unnecessary. Or it would be, if they would devote a fraction of those sacking-the-village energies to developing excellent software and turning all of their lies into truth by removing their roadblocks (more like landmines) to interoperability and open standards. You don't have to be an OOXML expert to understand their real aims. Their tactics speak loudly and clearly. They think it's worth destroying a worldwide standards body to achieve their ends. That tells you all you need to know: they know their OOXML "standard" is a sham, and they have no intention of supporting a genuine standard. (And shame on the various committees for letting themselves be subverted.) I have no respect for Microsoft or its spokespeople. You have to leave your conscience locked up to parrot the sort of guff they're emitting. No paycheck is worth that.
Posted by: Anita Hanson | 31 August 2007 at 15:31
"lying bully" should read "Like any lying bully..." And I horked the formatting. Oh well, and thanks, Stephen, for a great blog.
Posted by: Anita Hanson | 31 August 2007 at 16:33