Are You an XML Bozo?

July 28, 2006

Here's a helpful article that documents some common pitfalls to avoid when composing XML documents. Nobody wants to be called an XML Bozo by Tim Bray, the co-editor of the XML specification, right?

There seem to be developers who think that well-formedness is awfully hard -- if not impossible -- to get right when producing XML programmatically and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following list of dos and don'ts helps developers to move from the first group to the latter.

  1. Don't think of XML as a text format
  2. Don't use text-based templates
  3. Don't print
  4. Use an isolated serializer
  5. Use a tree or a stack (or an XML parser)
  6. Don't try to manage namespace declarations manually
  7. Use unescaped Unicode strings in memory
  8. Use UTF-8 (or UTF-16) for output
  9. Use NFC
  10. Don't expect software to look inside comments
  11. Don't rely on external entities on the Web
  12. Don't bother with CDATA sections
  13. Don't bother with escaping non-ASCII
  14. Avoid adding pretty-printing white space in character data
  15. Don't use text/xml
  16. Use XML 1.0
  17. Test with astral characters
  18. Test with forbidden control characters
  19. Test with broken UTF-*
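
To make a few of these rules concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is my own illustration, not code from the article: the function name build_entry and the element names entry, title and content are made up for the example. It builds a tree from plain unescaped Unicode strings (rules 5 and 7) and leaves escaping and UTF-8 encoding to the serializer (rules 3, 4 and 8) instead of printing angle brackets by hand.

```python
import xml.etree.ElementTree as ET


def build_entry(title: str, body: str) -> bytes:
    """Build a small document as a tree and serialize it to UTF-8 bytes.

    The element names here are hypothetical, chosen only for illustration.
    """
    root = ET.Element("entry")
    # Strings go in unescaped; the serializer handles &, < and > on output.
    ET.SubElement(root, "title").text = title
    content = ET.SubElement(root, "content", {"type": "text"})
    content.text = body
    # With an explicit byte encoding, tostring() returns bytes, not str.
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    # Markup-significant characters plus one astral character (U+1D11E).
    data = build_entry("Q&A <draft>", "Astral: \U0001D11E")
    # Round-trip through a parser to check the output is well-formed.
    parsed = ET.fromstring(data)
    print(parsed.find("title").text)
```

Parsing the output back is a cheap well-formedness check, but it only covers the inputs you feed it. Rules 17 through 19 still call for deliberate tests with astral characters, forbidden control characters and broken UTF-8 against whatever serializer you actually use.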

I'm a little ambivalent about XML, largely due to what John Lam calls "The Angle Bracket Tax". I think XSLT is utterly insane for anything except the most trivial of tasks, but I do like XPath-- it's sort of like SQL with automatic, joinless parent-child relationships.

But XML is generally the least of all available evils, and if you're going to use it, you might as well follow the rules.

Posted by Jeff Atwood
12 Comments

I only have occasional need to deal with XML at present so might well be an unwitting Bozo. But many of these rules expressed as Don'ts leave questions begging. For example, #5: if you don't use an XML parser, what do you use?

Nick on July 30, 2006 12:52 PM

I on the other hand love xslt :). I've yet to run into a problem that requires an impractical solution. And with grouping, regexp and all the goodies of xpath 2.0 it's even easier to use.
I have rule nr. 1 taped on the wall behind my desk. Whenever someone comes in with an xml-related issue I simply point to the poster. This is usually all it takes :).

nonDev on July 31, 2006 3:06 AM

Talking of bozos, #17: Can we find the person who came up with the term "astral plane" and beat them to death with their own dungeons and dragons books? Please?

Anony Moose on July 31, 2006 4:03 AM

If you need that many rules to get your document format right, you might want to think about a different format.

David Avraamides on July 31, 2006 4:14 AM

XSLT haha!

When I first joined this organisation, it used an XML database called Tamino, which then used XSL files to create webpages, along with the help of some Java.

The whole system was massive, complex and bloody slow.

I re-developed the whole thing using SQL Server 2000 and asp.net pages. It is a fraction of the size, runs much faster and it's very easy to make changes, unlike the XSL system :yuck:

Peter Bridger on July 31, 2006 5:59 AM

Jeff, it should be noted that using XSLT will (pretty much) guarantee that your output will be conformant to all of those rules for generating XML.

So although on one hand you say "XSLT is insane", on the other hand this entire post seems to be an argument in favour of it.

Alastair on July 31, 2006 6:27 AM

I would like to add a question to that list - Does this problem really require XML? (Think Ant).

Vineet on July 31, 2006 12:14 PM

Here's a good list of things to consider when writing XML:

http://recycledknowledge.blogspot.com/2006/03/writing-out-xml.html

And when converting HTML to XHTML:

http://recycledknowledge.blogspot.com/2006/03/how-to-write-xhtml-even-if-you-dont.html

Jeff Atwood on August 29, 2006 10:31 AM

you obviously are not a good xslt programmer.

rx on May 24, 2008 2:20 AM

And of course he recommends... the serializer/XmlWriter! Yes, let's all write at least 3 lines of code for every element, more if there are attributes!

I don't have a problem with XML, but the notion that it's perfectly okay to expect developers to write 500 lines of code comprising 46 routines and 13 classes just to spawn a single document sounds characteristic of an Architecture Astronaut.

Maybe text-based templates aren't the answer either, but you can use a single routine to escape a full XML string without the ridiculous overhead of a "writer". IMO, in order for XML to really be productive for developers, the dev tools either have to serialize it automatically (.NET Web Services), or allow it to be written "natively" (Ruby / XLinq). Without simplified support, I'd have to ask if the same problem could be solved with plain-text/CSV or an RDBMS.

Aaron G on February 6, 2010 9:51 PM

Liquid XML Studio is a free XML editor and graphical schema editor for Windows. It provides well-formedness checking and validation against external XML Schemas. It also has an XSLT editor which can execute the transform and show the results.
http://www.liquid-technologies.com/XmlStudio/Free-Xml-Editor.aspx

Simon Sprott on February 6, 2010 9:51 PM
