A Spec-tacular Failure

August 4, 2006

I've written before about the dubious value of functional specifications. If you want to experience the dubious value of specifications first hand, try writing a tool to read and write ID3 tags.

ID3 tags describe the metadata for an MP3 file, such as Artist, Album, Track, and so forth. ID3 tags certainly don't look all that complicated. Newer versions appear at the beginning of the MP3 file, and are nearly human readable even in a hex editor:

ID3 tag displayed in a hex editor
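The fixed part of the format is small. Below is a minimal Python sketch that parses the 10-byte ID3v2 header at the start of a file. The header holds the ASCII marker "ID3", a major version byte, a revision byte, a flags byte, and a 4-byte syncsafe size. The class and function names are my own and do not come from UltraID3Lib or any other library. The sketch assumes the tag starts at offset 0 and ignores everything after the header. The size field excludes the 10-byte header itself. In ID3v2.4, flag bit 0x10 announces an extra 10-byte footer.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ID3v2Header:
    major: int      # 2, 3 or 4 for ID3v2.2, v2.3, v2.4
    revision: int
    flags: int
    size: int       # tag size, excluding the 10-byte header (and any v2.4 footer)

    @property
    def end_offset(self) -> int:
        """Offset of the first byte after the tag, assuming it starts at offset 0."""
        has_footer = self.major == 4 and bool(self.flags & 0x10)
        return 10 + self.size + (10 if has_footer else 0)


def decode_syncsafe(data: bytes) -> int:
    """Combine 7-bit groups; the high bit of every byte must be zero."""
    value = 0
    for b in data:
        if b & 0x80:
            raise ValueError("not a syncsafe integer")
        value = (value << 7) | b
    return value


def parse_id3v2_header(data: bytes) -> Optional[ID3v2Header]:
    """Return the header if data starts with an ID3v2 tag, else None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    return ID3v2Header(data[3], data[4], data[5], decode_syncsafe(data[6:10]))
```

For example, the header bytes ID3 04 00 00 00 00 02 01 describe a v2.4 tag whose body is 257 bytes, ending at offset 267. A reader only needs these ten bytes to know how far to skip.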

There's a set of comprehensive ID3 specifications to help us out. Unfortunately the ID3 specs are, in a word, bad.

Even with a bad spec, you can write code to parse ID3 tags. There are a number of CodeProject articles that read and write ID3 tags with varying levels of success. There's also a mature .NET ID3 library available, UltraID3Lib, but unfortunately it's closed source. It also suffers a little from "explosion at the pattern factory" design.

One of the first big warning signs is this list of ID3 "offenders" on the UltraID3Lib site. It reads like a who's who of music applications: iTunes, WinAmp, Windows Media Player. If the applications that ship with the operating system can't get ID3 tags right, clearly something is wrong.

And that something is the ID3 spec. How does it suck? Let me count the ways:

  • The spec shows how but rarely explains why. For example, frame sizes are stored as 4-byte "syncsafe integers" where the 8th bit of every byte is zeroed. Why would you store size in such an annoying, unintuitive format? Who knows; the spec doesn't explain. You just grit your teeth and do it.

  • The vast majority of the things described in the spec do not appear in any MP3 files that I can find or create. There are 70+ possible frame types, but I've only seen a dozen or so in practice. And what about encryption? Compression? CRC checks? Footers? Extended headers? Never seen 'em. And I probably never will. But I still have to parse through pages and pages of detailed text about these extremely rare features.

  • The spec has ridiculous enumerations. Check out the 147 possible values of the music genre byte. The categories seem to have been chosen completely at random: for example, "Negerpunk" (133), "Christian Rap" (61), and "Native US" (64). And evidently "Primus" (108) isn't just a band, they're a valid music genre, too. iTunes thankfully puts a stop to this madness by displaying only a fraction of these genres in its genre drop-down. And it isn't just the genre tag; one of the possible picture types for the attached picture tag "APIC" is-- and I swear I'm not making this up-- "A bright coloured fish" ($11). At some point you feel like you're wasting your time by enumerating insanity.

  • No examples are provided. Consider the comment frame. This is a relatively complex frame; it supports multiple languages and different text encodings, and a tag can hold several comment frames, each distinguished by its language and a short content description. And yet it only merits a paragraph in the frames specification, with no examples of usage whatsoever. Would it kill them to provide a couple of examples of how a comment should actually look?

  • Related items are not together. The comment frame has two lookups in its header: language and text encoding. There is absolutely no reference at all to these lookup tables in the comment frame description. You have to "just know" that the main ID3 spec defines all languages with three-character ISO-639-2 language codes, and that there are four possible text encodings, from 00 to 03, with different rules for null termination. It'd be awfully difficult to write a comment frame reader without this information, yet it's nowhere to be found in the description of the comment frame.
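Here is roughly what those scattered rules add up to for a COMM frame body, as a minimal Python sketch. The body holds one encoding byte, a three-byte language code, a null-terminated short description, and then the comment text. The encoding table and terminator rules reflect my reading of the v2.3/v2.4 documents; encodings 02 and 03 exist only in v2.4. The function names are hypothetical. The sketch assumes it is handed the frame body with the frame header already stripped.

```python
from __future__ import annotations

# Text encoding byte -> Python codec. 0 and 1 exist in ID3v2.3;
# ID3v2.4 adds 2 (UTF-16BE without BOM) and 3 (UTF-8).
ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def split_at_terminator(data: bytes, encoding: int) -> tuple[bytes, bytes]:
    """Split data at the first null terminator for the given encoding.

    Single-byte encodings end with $00; the UTF-16 encodings end with $00 00,
    which must be found on a 2-byte boundary.
    """
    if encoding in (1, 2):
        for i in range(0, len(data) - 1, 2):
            if data[i] == 0 and data[i + 1] == 0:
                return data[:i], data[i + 2:]
    else:
        i = data.find(b"\x00")
        if i >= 0:
            return data[:i], data[i + 1:]
    return data, b""  # no terminator found


def parse_comm(body: bytes) -> tuple[str, str, str]:
    """Return (language, description, text) from a COMM frame body."""
    encoding = body[0]
    codec = ENCODINGS[encoding]              # KeyError for an invalid encoding byte
    language = body[1:4].decode("latin-1")   # ISO-639-2 code such as "eng"
    description, rest = split_at_terminator(body[4:], encoding)
    # Some writers also null-terminate the comment text, so strip trailing NULs.
    text = rest.decode(codec).rstrip("\x00")
    return language, description.decode(codec), text
```

Most of the "just know" knowledge from the bullet above lives in the ENCODINGS table and in the terminator rule.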

The ID3 spec is doubly frustrating because it makes a simple topic difficult. ID3 tags are just not that complicated. The spec makes me feel like an idiot for not being able to get this stuff right. What's the matter? Can't you read the spec?

No. I can't. And evidently, neither could the developers of WinAmp, iTunes, or Windows Media Player.

Since the ID3 spec is so deficient, I've been using the behavior of popular applications as a de facto spec. In other words, I test to see how WinAmp behaves when editing ID3 tags:

WinAmp file info dialog

WinAmp isn't a model ID3 tag citizen. It ignores all comments except for the first one, and it adds garbage text as the language string for comments.

I also test to see how iTunes behaves when editing ID3 tags:

iTunes file info dialog

Although iTunes reads all versions of ID3 tags, it still writes ancient v2.2 ID3 tags to MP3 files, even in the latest version. So it's an especially poor role model for tagging.

Warts and all, the practical implementations of ID3 tags in popular applications like WinAmp and iTunes trump anything that's written in the formal ID3 spec. I finally understand what Linus Torvalds was complaining about:

A "spec" is close to useless. I have never seen a spec that was both big enough to be useful and accurate. And I have seen lots of total crap work that was based on specs. It's the single worst way to write software, because it by definition means that the software was written to match theory, not reality.

Specs, if they're well-written, can be useful. But they probably won't be. The best functional spec you'll ever have is the behavior of real applications.

Posted by Jeff Atwood

This is such a great point.

A similar situation arises with the TIFF specification for images. TIFF is extremely useful and well documented, and even has a great open-source reference library (libtiff) to illustrate implementation. But one still has to use Photoshop to verify compatibility and behavior :)

Ole Eichhorn on August 4, 2006 2:18 AM

The stupid thing is that your songs can only have one genre!

As of v2.4 this is no longer true. The frame for genre is like any other text frame and can have multiple values separated by NULs.

Michael Urman on August 4, 2006 3:23 AM

To be fair...:

frame sizes are stored as 4-byte "syncsafe integers" where the 8th bit of every byte is zeroed. Why would you store size in such an annoying, unintuitive format? Who knows; the spec doesn't explain.

So the ID3 tag can be inside an MP3 stream, and the media player can detect it's not part of the stream. I believe it does say so somewhere in the spec.
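A quick Python sketch makes that reasoning concrete. The function names are hypothetical. MPEG audio frames begin with a sync pattern of eleven set bits: a 0xFF byte followed by a byte whose top three bits are set. A syncsafe integer keeps only 7 bits per byte, so no byte of the size field can ever be 0xFF. A size of 255 would be 00 00 00 FF as a plain 32-bit integer, but it is stored as 00 00 01 7F instead. The cost is that four bytes hold only 28 bits.

```python
def encode_syncsafe(value: int) -> bytes:
    """Store a value of up to 28 bits as four bytes, 7 bits per byte."""
    if not 0 <= value < (1 << 28):
        raise ValueError("syncsafe integers hold at most 28 bits")
    # Bit 7 of every output byte is left clear, so no byte can be 0xFF.
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def decode_syncsafe(data: bytes) -> int:
    """Inverse of encode_syncsafe: concatenate the low 7 bits of each byte."""
    value = 0
    for b in data:
        value = (value << 7) | (b & 0x7F)
    return value
```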

The vast majority of the things described in the spec do not appear in any MP3 files that I can find or create.

Well... true. Just implement those on an as-needed basis.

Check out the 147 possible values of the music genre byte.

The music genre is an ID3v1 feature. ID3v2 replaces it by a genre string. Also, the list is so ridiculous because it has been made that way by Winamp (the original list from the ID3 spec was only 50 genres long or so). There you go with de facto standards...

Yes, the spec is lacking in places, but it's not as bad as you make it out to be.

RiX0R on August 4, 2006 3:35 AM

The music genre is an ID3v1 feature. ID3v2 replaces it by a genre string.

It's not replaced, technically. The genre ID is embedded in the string within parens.. ghettotastic!

The length of the original ID3v1 genre list is 79. It's less crazy, but still includes "Christian Rap" and "Native US". It's in Appendix A here:

http://www.id3.org/id3v2.4.0-frames.txt

It's really too bad that genre has become so fractured (either use a freeform string, or one of the 179 crazy-ass predefined genres), because categorizing music by genre is definitely useful. I could see myself listening to a mix of "classic rock", "funk", or "alternative" songs.
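The following is a minimal Python sketch of what a reader ends up doing with a decoded TCON string. It assumes the forms described in this thread. The first is v2.3-style "(17)" references into the ID3v1 table, optionally followed by free text. The second is v2.4-style NUL-separated values that may be bare ID3v1 numbers. The genre table is deliberately truncated to a few entries, and the names are hypothetical. Special cases such as escaped parentheses are ignored, and unknown numbers are passed through as-is.

```python
from __future__ import annotations

import re

# A handful of entries from the ID3v1 genre table, for illustration only;
# the full list is in Appendix A of the ID3v2.4 frames document.
ID3V1_GENRES = {0: "Blues", 1: "Classic Rock", 17: "Rock", 61: "Christian Rap", 64: "Native US"}

# "(17)"-style references to the ID3v1 table, as used in ID3v2.3 TCON frames.
_GENRE_REF = re.compile(r"\((\d+)\)")


def parse_tcon(text: str) -> list[str]:
    """Turn a decoded TCON (content type) string into a list of genre names."""
    genres = []
    for value in text.split("\x00"):        # ID3v2.4: NUL-separated values
        if not value:
            continue
        if value.isdecimal():                # ID3v2.4: a bare ID3v1 genre number
            genres.append(ID3V1_GENRES.get(int(value), value))
            continue
        pos = 0
        while (match := _GENRE_REF.match(value, pos)):
            number = int(match.group(1))
            genres.append(ID3V1_GENRES.get(number, match.group(0)))
            pos = match.end()
        if value[pos:]:                      # free-text genre or refinement
            genres.append(value[pos:])
    return genres
```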

Jeff Atwood on August 4, 2006 3:47 AM

The stupid thing is that your songs can only have one genre!

Anyway… You make a good point. One thing I like about languages like PHP and Perl is that they basically have only one implementation, and however it behaves is implicitly all that matters, whether or not there is a spec. Even C# counts here: although there is a spec and Mono has its own compiler, the MS implementation is the de facto standard. C++ has an amazing spec that no compiler actually implements completely; it's painful!

Chris L on August 4, 2006 4:01 AM

If you really want to parse and write ID3 tags The Right Way, you might want to have a look at Mutagen, a FLOSS Python library that does the job (pretty well, and respecting the spec). It lives at http://sacredchao.net/quodlibet/wiki/Development/Mutagen

alextp on August 4, 2006 5:16 AM

I recently had to implement some sort of ID3Tag reader, too. V1 is simple enough of course, but after browsing through the spec of V2 for a couple of hours I very quickly searched for some alternative - and found the Windows Media SDK.

You can use the WM SDK relatively simply with a few COM wrapper classes in .NET, and you're able to read and write not only ID3 metadata but every other kind of metadata supported by WM, including of course WMA/WMV.

MS even went so far as to put a ready-to-use wrapper inside the WM SDK; you can find it in:
WMSDK\WMFSDK95\samples\managed\wrapper
where WMSDK is the installation folder of the SDK.

One (minor) drawback is that when writing ID3Tags the WM SDK simply writes both versions, V1 and V2 - you can't choose.

qma on August 4, 2006 5:51 AM

Hello Jeff,

I couldn't disagree more with your statement that the best functional spec is the behavior of a running application. It sounds tempting, but don't let the idea fool you.

I have had this case on a previous project of mine: Migration and redesign of such an application from one technology to another. I can promise you, after a few years, without an up to date spec, NOBODY knows anymore what an application does or is supposed to do.
The truth is that in the real world not all development happens the way it should, and neither do all developers that worked on a system have the quality standard that they should. So over time many "features" (= special cases added quickly = mysterious if statements) creep in, and once the person who developed them leaves, so does the last bit of knowledge about them.

A good, kept up to date spec really saves the day. You have to get up to speed with this 100000 line program? Read the code. Eh no. A 200-page document providing a high-level overview of architecture, gui mockups and usecases is a million times more efficient.

Granted, for it to be effective, you have to keep it up to date. But when you have the discipline to do so, it more than pays off!

And let's not even talk about the time you save designing a new app, when changing an entire subsystem just means rewriting a paragraph versus rewriting the code you banged out so quickly (because code is the best spec, of course, hum hum).

Axel

Axel on August 4, 2006 6:16 AM

I agree with Axel, although I wonder if Linus is talking about enterprise apps vs ... well, I don't know what to call it: hardware apps? Can you imagine writing a spec for Linux? That's got to be insane compared to a Purchase Order or Banking app.

But I also agree with Linus when he says "I have never seen a spec that was both big enough to be useful and accurate." This is the bane of specs and why people hate them: writing them up front and keeping them updated is hard. It takes lots of discipline for an individual AND a company to maintain.

I build enterprise apps, though, and I totally agree that specs are worth it in the long run.

Morgan on August 4, 2006 6:53 AM

I think Linus's statement is true for larger projects, like any nontrivial software product, where it is really difficult to nail down all the dimensions and keep them updated. But for something relatively small and simple like ID3, a clear and well-written spec. shouldn't be too much to ask for.

If you think about it, whoever wrote the spec (whether that's one, or more, people) has an idea of what ID3 is and how it should work. Their responsibility is to communicate that in the clearest way possible if ID3 is to become a well-implemented open standard everywhere. This means listening to programmers and helpfully adjusting to make things simpler. With several problems now identified in the spec, here's hoping someone makes the necessary fixes.

John on August 4, 2006 7:08 AM

Absolutely the best lib for tag reading is taglib. It's open source, accurate and fast. Used by many linux players.

Max Howell on August 4, 2006 7:43 AM

"...one of the possible picture types for the attached picture tag "APIC" is-- and I swear I'm not making this up-- "A bright coloured fish" ($11)..."

Oh! Oh! Mr. Kotter! Oh! I know this one!!!

It's a Red Herring: http://en.wikipedia.org/wiki/Red_herring_(plot_device)

Reg Braithwaite on August 4, 2006 8:54 AM

Indeed the ID3 spec sucks. I have thousands of MP3s, and I decided long ago that the Genre field is absolutely useless. Exactly how do I tag the "War of the Worlds" radio play? In version 1, the only appropriate genre value is null I guess; in version 2, the best value would be "speech". The spec authors seem to assume that you would only ever make audio recordings of music and nothing else!

Brendan Kidwell on August 4, 2006 9:41 AM

I realize my comment was sorta off-topic, but it's been a pet peeve of mine that the spec has so many "genres" in it and most of them are useless to me.

Brendan Kidwell on August 4, 2006 9:49 AM

I can promise you, after a few years, without an up to date spec, NOBODY knows anymore what an application does or is supposed to do.

A good, kept up to date spec really saves the day. You have to get up to speed with this 100000 line program? Read the code. Eh no. A 200-page document providing a high-level overview of architecture, gui mockups and usecases is a million times more efficient.

Ug. Yeah, these may be true, but if you're doing something as low-level as converting an app from one language to another, you *must* look at the source code. There's no other alternative. Only the code represents the true state of affairs. Everything else is merely an abstraction.

foobar on August 4, 2006 10:09 AM

So MP3, Zip, AES, MD5, VST, TCP/IP, RAID, PostScript, ClearType, and Kerberos are all useless technical specs, right?

Judging from all the crappy, terrible, utterly hopeless implementations of these specs, yes.

I recently had the fortune to use SharpZipLib. The actual code to create and populate a zip file was easy. The amount of crap I had to do to accommodate hapless programs such as WinZip was painful.

foobar on August 4, 2006 10:34 AM

Where do "we" weigh in with specs like those maintained by W3C? The ECMAScript spec? PNG?

I think that the issues you raise with the ID3 spec are manifold. One is that the spec is indeed confusingly written. (I had the same reaction you did -- "Am I just stupid, or what?" -- although in my case that's probably more of a contributing factor than in yours, haha.) Another is that the layout of ID3 tags themselves clearly is the result of an iterative process that has resulted in a, er, less than ideal design. And then as you note, spec or not, vendors have been implementing ID3 tagging with varying levels of thoroughness, which can at least theoretically discourage someone from fully implementing the spec -- if (e.g.) iTunes won't (or can't) even use all the information in a completely correct ID3 tag, then what's the point of dotting all the i's and crossing all the t's? (I bet you a nickel that if you talked to the people who implemented ID3 tagging at those vendors, they'd say they just bashed on because a) the spec was too confusing or difficult to implement and b) dang, they had to get a product out, like, now.)

I think Torvalds's comment is either taken out of context or should be discounted as the ruminations of a guy who works in a particular way on a particular set of problems. I doubt that even he would say "... and so we should never have specs for anything," coz man, I doubt you'd get much traction with a development approach like that.

Specs serve other purposes besides simply being a reference document where someone can look something up down the road. The specification process itself is a way of thinking through a problem and hopefully heading off at least some of the dead branches in a development process. A formal spec done by, say, W3C can be useful for incorporating the thoughts of a variety of people who might need to use the spec. (Clearly the genre portion of the ID3 spec did NOT enjoy the benefits of an open-comment period.)

The process of creating and maintaining a spec, as well as the mechanical process of writing it, are as subject to the usual distribution of talent as any other enterprise. Some people and institutions are really good at it; most are ok; some really suck.

And dang, no examples? Grade: F.

mike on August 4, 2006 10:47 AM

The layout of ID3 tags themselves clearly is the result of an iterative process that has resulted in a, er, less than ideal design.

Forgot to mention that it's something of a truism in my job (writing) that if a design is difficult to document (i.e. describe), maybe, just maybe, it's not such a good design. Not always; some problems are inherently hard. But for something like ID3 tags, c'mon.


mike on August 4, 2006 10:50 AM

I agree that specs generally end up missing the mark by a few hundred miles, but I often disagree on 'why' this is.

Why do specs fail? I think that most often, it's because the spec writer is too focused on explaining 'what' something is, and not focused enough on 'how' general users can get their task completed. Missing examples and use-scenarios are very telling symptoms of this failure.

If a non-English speaker asked me how to learn to speak English and I plopped a 3000-page Webster's dictionary in his lap, I'm probably not helping that much. But this is effectively what happens to developers when they are handed the W3C spec for SOAP and told to go develop a web service, and it sounds like a similar situation with the ID3 spec.

brad on August 4, 2006 11:08 AM

Aaron, I'm glad you brought up the issue of Functional vs. Technical specifications. Here's a clarifying blurb by Joel Spolsky on that:

http://discuss.fogcreek.com/askjoel/default.asp?cmd=show&ixPost=4202&ixReplies=8
--
My way of thinking is that you just don't write "technical specs" that cover the entire functionality of an application ... that's what the functional spec is for. Once you have a complete functional spec the only thing left to document is points of internal architecture and specific algorithms that are (a) entirely invisible (under the covers) and (b) not obvious enough from the functional spec.
--

Jeff Atwood on August 4, 2006 12:25 PM

Not only is the ID3 spec bad, the information in most songs isn't terribly useful.

Eric on August 5, 2006 9:57 AM

Hurrah for The Dead Milkmen! Damn hard music to find nowadays.

deadscott on August 6, 2006 4:41 AM

Husker Du is great though.

Per on August 6, 2006 12:26 PM

As someone who has implemented ID3v1 and ID3v2 parsers for a commercial embedded mp3 player, I have to completely agree. ID3v2 is completely stupid. First of all, the sync-safe integers are useless when frame contents don't need to be sync-safe. Basically, every player needs to figure out how to skip the header. That breaks the whole point of using a sync-safe tag.

Second, was there really a problem with having the tag at the end of the file? Editing the tags (updating them, whatever), is much simpler if the tag is added at the end of the file than inserted at the beginning. Yes, there is a provision for leaving padding space when you re-write the whole file to insert the new tag. However, lots of apps don't do it. And if they do, it's a much bigger waste of space than storing 2-byte unicode for strings.

I'm sure ID3v2 was designed by someone who had good intentions. While it does work, it could have been much simpler if they had dropped the sync-safe thing and put the tags at the end of the files. Retagging thousands of files would have been a much faster operation.
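For comparison, the older ID3v1 tag does live at the end of the file, in a fixed 128-byte block. It can be read or rewritten without moving any audio. Below is a minimal Python sketch of a reader using hypothetical names. It assumes the common ID3v1.1 convention, where a zero byte at offset 125 is followed by a track number. The fixed 30-byte fields are the price of that simplicity.

```python
from __future__ import annotations

from typing import Optional


def read_id3v1(data: bytes) -> Optional[dict]:
    """Parse an ID3v1 or v1.1 tag from the last 128 bytes of a file's contents."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(field: bytes) -> str:
        # Fixed-width fields padded with NULs; some writers pad with spaces.
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip(" ")

    fields = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # index into the ID3v1 genre table
    }
    # ID3v1.1: a zero at byte 125 followed by a non-zero byte 126 holds a track number.
    if tag[125] == 0 and tag[126] != 0:
        fields["comment"] = text(tag[97:125])
        fields["track"] = tag[126]
    return fields
```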

Cryptnotic on August 6, 2006 12:43 PM

Specs are useful, but *only if there is also a reference implementation*. That way where the spec doesn't explain it very well you can check with the reference code what should be happening.

It also means that the spec writers have to keep themselves grounded in reality as they also have to implement it themselves.

Needless to say, the ID3v2 freaks never wrote one. They certainly never wrote one which had to try and parse all of the incompatible versions they trotted out.

Tom on August 7, 2006 11:24 AM

The reason for putting ID3v2 at the beginning of the MP3 is to support streaming media.

As I understood it the idea was to make the entire ID3v2 header sync safe so that players which didn't understand ID3v2 at all would just coast through it looking for the sync pattern. It took them until v2.4 to actually achieve this (I think) - before this version the block headers could look like the sync pattern.
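The unsynchronisation scheme Tom describes can be sketched in a few lines of Python. This is a minimal, hypothetical implementation based on my reading of the v2.4 rules. After any 0xFF byte that is followed by a byte of 0xE0 or more (a false sync) or by 0x00, the writer inserts a 0x00. The reader then turns every FF 00 pair back into FF. The sketch ignores edge cases at the very end of the tag.

```python
def unsynchronise(data: bytes) -> bytes:
    """Insert $00 after every $FF that is followed by %111xxxxx or by $00."""
    out = bytearray()
    for i, b in enumerate(data):
        out.append(b)
        if b == 0xFF and i + 1 < len(data):
            following = data[i + 1]
            # 0xE0 and above would look like an MPEG frame sync; $FF 00 must
            # also be escaped so the decoder can tell it from an inserted zero.
            if following >= 0xE0 or following == 0x00:
                out.append(0x00)
    return bytes(out)


def resynchronise(data: bytes) -> bytes:
    """Undo unsynchronise(): every $FF 00 pair becomes $FF."""
    return data.replace(b"\xff\x00", b"\xff")
```

Because the original FF 00 pairs are escaped as FF 00 00, every FF 00 in the encoded data is known to carry an inserted zero. That is why decoding can be a single replace.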

Tom on August 7, 2006 11:27 AM

and put the tags at the end of the files

This is one of the very few things in the ID3v2 spec that did make sense to me. Putting the tags at the beginning does make it faster to read them; you only need to read in the first few hundred bytes to know the name, artist, etc of the recording.

It's much better when you're streaming a file over a slow connection, too, but the same economies of scale apply to every device.

Writing beginning tags is of course painful, but I think the tradeoff between fast read / slow write is the correct way to go. You'll be reading 99.9% of the time anyway so why not optimize for that case?

Jeff Atwood on August 7, 2006 12:47 PM

Once upon a time specs were developed by engineers and they were complete and clear. Then specs became holy and many people started writing specs. And lo, specs became political.

Specs generally have no explanation for the why of anything, which is really what you need. With the proliferation of video we may have a solution. Set up a video camera in front of a white board and encourage developers to expound on what they have written and why they did it that way. If you have some people who don't participate, hold an interview in front of the camera. Walk through the code with them. Put all the recordings/interviews on a server. I know it may take more time to search all the videos for the info you need, but at least you will have a chance of finding it, rather than having to try to figure out why they did something from the code, which can be near impossible.

Charles Pergiel on October 7, 2006 11:30 AM

I generally think the spec is ok. I've written code to implement all the common tags, just copying any unimplemented (in my app) tags as is.

The biggest problem, though, is the difference between V2.2, V2.3, and V2.4 specs.

To call the difference between 2.2 and 2.3 a minor revision was just crazy. V2.3 should have been V3.0. Try making a base class for V2 frames that can be extended to either V2.2 or V2.3. It just isn't practical. The headers are different; the frames are different; the names are different. The only similarity is ... hmmmmm... well, I can't think of any similarities right now.

While you might call the difference between V2.3 and V2.4 a minor revision, even that is not so clear.
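To illustrate the point, here is a minimal Python sketch of a frame-header reader that has to branch on the major version. v2.2 uses 3-character frame IDs, a 3-byte size, and no flags, for 6 bytes in all. v2.3 and v2.4 use 4-character IDs, a 4-byte size, and two flag bytes, for 10 bytes. v2.4 also switches the frame size to a syncsafe integer. The names are hypothetical, and the sketch ignores unsynchronisation and compression.

```python
from __future__ import annotations

from typing import Iterator, NamedTuple


class FrameHeader(NamedTuple):
    frame_id: str
    size: int        # size of the frame body, excluding the header itself
    flags: int       # v2.2 frames have no flags, so this is 0 there
    header_len: int  # 6 bytes in v2.2, 10 bytes in v2.3 and v2.4


def parse_frame_header(data: bytes, major: int) -> FrameHeader:
    if major == 2:
        # v2.2: 3-character ID, 3-byte plain big-endian size, no flags
        return FrameHeader(data[:3].decode("latin-1"), int.from_bytes(data[3:6], "big"), 0, 6)
    if major == 3:
        # v2.3: 4-character ID, 4-byte plain big-endian size, 2 flag bytes
        size = int.from_bytes(data[4:8], "big")
    elif major == 4:
        # v2.4: same layout, but the size is now a syncsafe integer
        size = 0
        for b in data[4:8]:
            size = (size << 7) | (b & 0x7F)
    else:
        raise ValueError(f"unsupported ID3v2 major version: {major}")
    return FrameHeader(data[:4].decode("latin-1"), size, int.from_bytes(data[8:10], "big"), 10)


def iter_frames(body: bytes, major: int) -> Iterator[tuple[str, bytes]]:
    """Yield (frame_id, frame_body) pairs until the data or the padding runs out."""
    header_len = 6 if major == 2 else 10
    pos = 0
    while pos + header_len <= len(body) and body[pos] != 0:  # $00 marks padding
        header = parse_frame_header(body[pos:pos + header_len], major)
        start = pos + header.header_len
        yield header.frame_id, body[start:start + header.size]
        pos = start + header.size
```

The same four size bytes, 00 00 01 48, mean 200 in a v2.4 frame header but 328 in a v2.3 one. A reader that guesses the version wrong fails silently.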

Dale on November 5, 2006 8:38 AM

One more point about the idea of using current implementations as the spec rather than the spec as the spec.

I wrote my ID3 library and media manager because of the flaws and usability issues with Media Player and iTunes. Neither did a great job - both were good in some areas and terrible in others.

But I needed my MP3 files to play in my car and in my truck, to play in my home stereo, in my Media Center, on several PC's using iTunes or Media Player (depending on the preference of various family members), my PocketPC, and so on. The only way to accomplish that is to follow the standard. My own player follows the standard and uses all the capabilities of my ID3 library; others that don't use the standard are limited but that is their own limitation, not a limitation of my software so it doesn't cause us any grief.

Dale on November 5, 2006 8:45 AM

I completely agree. I'm writing a tag editor in C# and this spec is an abortion. The unsync scheme is one of the many poor choices made IMO. A better solution would be as follows:

If a tag is present, the first 3 bytes of the file will be 'ID3'.

The next 4 bytes represent the integer size, in bytes, of the entire tag (nSize). This makes it very simple for decoders to start scanning for sync bits at offset 'nSize'. No more bit-shifting nightmares for tag editors.

Andrew on December 22, 2006 1:34 AM

Hello. My name is Mitch Honnert and I’m the author of UltraID3Lib, the library mentioned in the blog entry. I know I’m late to the discussion, but I thought I’d add my two cents…

I consider myself to be a supporter of the ID3 format, but other than a few quibbles, I’d have to say I agree with most of your criticisms. I noticed, however, that your comments focus mostly on the *documentation* of the ID3 format rather than the format itself. While the documentation has its flaws, I think the format itself is rather good. The ID3 spec may be overkill for the vast majority of tag users, but the design allows for an incremental implementation that leaves the choice of how deep to go into the format up to the developer.

So, yes, there are some obvious problems with the specification documentation. In fact, your blog entry has inspired me to lobby for incremental updates to the existing specs. My hope is not to change the specification in any material way, but just to rewrite some of the documentation in order to avoid the problems associated with ambiguous standards.

- Mitchell S. Honnert

Mitchell S. Honnert on December 25, 2006 7:17 AM

This might sound like a little, nit-picky point, but the ID3 spec does not make it clear whether the 'length' field in the ID3 header should include the header itself. My assumption has always been that it does not, and Windows Media Player seems to agree with me, but I have just been handed some files where it does and these break my code :( So who knows? - I am still trying to figure it out. If anybody has a definitive answer I'd be pleased to hear it, but I don't think there is one. Opinions seem to vary.

But this whole sorry little tale demonstrates something about the art of writing specifications. It is important to pay attention to detail, and the inexperienced often don't.

Paul Sanders
http://www.alpinesoft.co.uk

Paul Sanders on May 11, 2008 3:46 AM

A bit late, but an interesting discussion nevertheless. I've noticed the 'length' field is under-specified as well. I've mailed the author of the spec, AFAIK without result (and certainly without any response).

But anyway, I think it's not all that bad. It takes some time to understand the spec, but at least all the info can be found on www.id3.org. I've been developing a kind of "music / tags validator", in the sense of the "W3C HTML validator", and I found out MP3s around the net are just a disaster. Tags and various rubbish are found in the middle of the file (even ID3v1), APE tags are used for MP3, last frames are truncated, and some encoders even calculate the MP3 header CRC the wrong way, so all frames in the whole file seem damaged :) It's lols... I guess it's caused by the number of MP3 encoders (both hardware and software) that have been written...

On the other hand, I love Ogg Vorbis. Tags are simple and neat... and in 200GB of music, I have not a single broken Ogg. Of course, Ogg could learn from MP3's mistakes.

swajnaut on June 29, 2009 6:00 AM

So MP3, Zip, AES, MD5, VST, TCP/IP, RAID, PostScript, ClearType, and Kerberos are all useless technical specs, right?

Let's not confuse the quality of a particular spec here with the usefulness of specs in general.

And functional specs are completely different from technical specs. Functional specs are merely "cover your ass" documents when working on an important project. Whether or not they're well-written, well-read, or help at all in making the finished product, isn't the point - they're merely documented proof that you delivered what you promised. Good business sense is requisite in any profession, software included, and having a spec is simply good business sense, even if it's nothing more than a project hangnail.

Aaron G on February 6, 2010 9:51 PM

Does this post mean you prefer the old days where browsers had to be lenient in parsing HTML because all the other browsers had bugs? Having an actual HTML spec to follow is bad?

Timwi on May 6, 2010 9:52 AM
