Making a file format standard is hard work
There has been a lot of discussion lately over Microsoft's Office Open XML (OOXML) format and how it has been going through the ISO standardization process. Now, I'm not in the business of writing productivity software, nor do I have any interest in doing so, but purely from a technical standpoint -- political issues aside -- I'd have to agree with the detractors that OOXML is not a good standard. Underspecification bothers me the most. Failing to adequately specify part of a standard creates ambiguity that can kill the utility of parts of a file format; once everyone starts making different errors when writing the same field, it can become impossible to discern the correct meaning. Having a tag such as "useWord97LineBreak" in a standard without an actual description of what Word 97 does is, of course, an egregious offense. However, I will say that trying to fully specify a file format isn't easy, and OOXML definitely wouldn't be the first ISO standard to suffer from holes.
The reason is that writing a file format specification is, well, hard.
Let's take the simple example of storing a frame rate in a file. Because common frame rates are non-integral, and we want to maintain accuracy even over long durations, we'll store it in rational form, as the ratio of two 32-bit integers:
uint32_t frameRateNumerator;
uint32_t frameRateDenominator;
(This is, in fact, how AVI stores frame rates. It is also used with Direct3D 10.)
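For concreteness, here are the exact rational forms of a few common frame rates, the kind of list a spec could include as recommended values (the constant names are mine, not taken from any standard):

static const uint32_t kFrameRateNTSC[2]     = { 30000, 1001 };  /* ~29.97 fps */
static const uint32_t kFrameRateNTSCFilm[2] = { 24000, 1001 };  /* ~23.976 fps */
static const uint32_t kFrameRateFilm[2]     = { 24, 1 };
static const uint32_t kFrameRatePAL[2]      = { 25, 1 };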
How many issues can arise with these two fields? Well:
- Are there minimums and maximums to the stored fraction? Are there certain profiles that can rely on restricted values, such as for mobile devices?
- Are there recommended values for common frame rates? (These can double as compliance tests.)
- Can the numerator be zero? This would mean a frame rate of zero.
- Can the denominator be zero? You can't divide by zero. What does it mean?
- If zero in either field is invalid, what should programs do? Should they reject it, automatically correct it to some value, or is it up to the implementation?
- What is the byte order of these fields, little-endian or big-endian?
- Must the fraction be stored in lowest terms? Is there any significance if it is not, and should an implementation reduce an unnormalized fraction? What algorithm is recommended for reducing fractions? (Finding one was a bit harder when you had to go to the library instead of doing a web search; one possible reduction routine is sketched after this list.)
- If an application approximates the fraction to a single value, what is the minimum recommended or required precision? Are there specific values that must always be represented exactly?
- Do these fields need to be consistent with other fields in the file? For instance, are there times when the same frame rate shows up multiple times in the file? If they are different, how are they reconciled?
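Whatever the spec doesn't answer, every implementer ends up answering alone. Purely as an illustration, here is one plausible reader-side policy; the decision to reject zeroes and the choice of Euclid's algorithm for reduction are my assumptions, exactly the kind of thing a real spec would have to pin down:

#include <stdbool.h>
#include <stdint.h>

/* Euclid's algorithm: greatest common divisor of two nonzero values. */
static uint32_t gcd32(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical policy: reject zero in either field, then reduce to lowest terms.
   Whether to reject, correct, or silently accept such values is a decision the
   spec should be making, not this code. */
static bool normalizeFrameRate(uint32_t *num, uint32_t *den) {
    if (*num == 0 || *den == 0)
        return false;

    uint32_t g = gcd32(*num, *den);
    *num /= g;
    *den /= g;
    return true;
}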
There are a number of bad outcomes that can arise from not answering these questions. One possibility is that applications commonly write 30/1 for NTSC and then interpret that on read as NTSC, even though NTSC actually runs at 30000/1001 (~29.97) frames per second. Another possibility is that an application writes garbage into the frame rate fields and then ignores the values on read, because it works in a medium that already has a defined frame rate, and not all programs validate or use the value on read. A third possibility is that everyone assumes the numerator and denominator are swapped, and the odd program written by the person who actually reads the spec can't read anyone else's files. And yes, I've seen all of these kinds of mischief before.
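To put a number on that first mistake, here is a back-of-the-envelope calculation (plain arithmetic, not from any spec) showing how far an assumed 30/1 drifts from the true NTSC rate:

#include <stdio.h>

int main(void) {
    double trueFps    = 30000.0 / 1001.0;   /* the actual NTSC rate, ~29.97 fps */
    double frames     = 3600.0 * trueFps;   /* frames in one hour of real time */
    double playedSecs = frames / 30.0;      /* how long those frames last at an assumed 30 fps */
    printf("drift per hour: %.1f seconds\n", 3600.0 - playedSecs);   /* prints about 3.6 */
    return 0;
}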
Good file formats are rare, but in my opinion, the Portable Network Graphics (PNG) specification is among the better ones. It uses clear language (must/must not/should/should not), it has rationales for various design decisions, and it attempts to advise what to do when dealing with non-compliance. For instance, when talking about converting samples between different bit depths, it describes the best case (linear mapping), an acceptable approximation (bit replication), and says what you should not do and why (bit shift left). That level of detail doesn't prevent all accidents, but at least it reduces them through awareness, and clarifies who is at fault when an interoperability problem occurs.
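To make that PNG example concrete, here is a sketch of the three approaches for widening a 5-bit sample to 8 bits (the function names are mine, and I picked a 5-bit source simply because it makes the results of all three actually differ):

#include <math.h>
#include <stdint.h>

/* Linear mapping: scale [0, 31] onto [0, 255] with rounding. */
uint8_t scale5to8_linear(uint8_t v) {
    return (uint8_t)floor(v * 255.0 / 31.0 + 0.5);
}

/* Bit replication: shift left, then refill the low bits with the high bits.
   Cheap, and very close to the linear result. */
uint8_t scale5to8_replicate(uint8_t v) {
    return (uint8_t)((v << 3) | (v >> 2));
}

/* Plain left shift: the discouraged option, because the maximum input (31)
   maps to 248 instead of 255, so full intensity is no longer full intensity. */
uint8_t scale5to8_shift(uint8_t v) {
    return (uint8_t)(v << 3);
}

The shift version looks harmless, which is exactly why a sentence or two in the spec saying not to use it earns its keep.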