Coding Horror
programming and human factors
by Jeff Atwood

Jun 23, 2008

Revisiting the XML Angle Bracket Tax

Occasionally I'll write about things that I find sort of mildly, vaguely thought provoking, and somehow that writing turns out to be ragingly controversial once posted here. Case in point, XML: The Angle Bracket Tax. I'm still encountering people online who almost literally hate my guts because I wrote that post. You'd think I kicked their dog, or made inappropriate romantic overtures toward their significant other.

Well, first of all, we are talking about XML the markup language, not XML the religion, right?

I hope so. I try not to get emotionally involved with the tools and technologies that I use, if I can avoid it. This doesn't mean I can't be enthusiastic or critical of those tools and technologies, but I'm not married to the stuff either way. Who needs all the emotional baggage?

Obviously I failed to communicate this before. I talked about this a little bit on Stack Overflow podcast #5 with Joel, where I tried to amplify and explain my position a little better.

I wasn't trying to present it as "Oh, XML is bad, let's all switch to this new markup language that all the cool guys are using". What I was trying to say is why don't we think about what we're doing? That's the general theme of a lot of the stuff in my blog. Can we just stop programming for a minute to think about what we're doing and not make a blind choice based on "Well this is what my tool does, so that's what I have to do"?

I think obviously there's pros and cons to each. I'm not saying that one is the right solution all the time. But I think, ironically, that is what is happening with XML. I think people are saying "It's always the right answer, because it can store anything, right? And all the stuff I use uses it, so it must be the right choice for everything." That bothers me a little. Maybe I'm just contrarian. Maybe I'm an iconoclast and I want to try different things and see different things, but I think actually understanding the alternatives helps you understand XML better, a little bit, too.

And I hope people reading my blog would not get the idea that it's about a knee-jerk reaction one way or the other. It's about understanding the tradeoffs and applying those tradeoffs to your particular situation. I think that is the absolute art of programming. It's understanding what you could do, and which one of those things fits your situation best. Versus what so many programmers do, which is "I've learned to use a hammer, and I'm gonna hammer everything." Ultimately, to me, it's about self-awareness.

By the way, I'd like to thank everyone who pitches in to make those Stack Overflow podcast transcriptions possible. It is because of your generously donated time that I am able to quote that audio here.

I don't post stuff to push people's buttons, I post it because I want programmers to think about their tools, their technologies, their methods.

Think IBM placards, taken at the Computer History Museum

If what I post here seems unnecessarily confrontational sometimes, a far smarter person than myself said it better than I can:

I blog to help others and also to learn. As it turns out both are aided by getting folks to actually read the stuff. Please pardon the necessary devices.

Please do pardon the necessary devices; I find that I often learn best through the smackdown learning model. That works for me. Maybe it doesn't work for you, and that's OK. There are millions of websites to choose from.

That said, I do actually have a problem with XML, or I wouldn't have written anything in the first place. I think there's a real issue here that is, for the most part, being completely ignored. XML fever may not be as debilitating as, say, Dengue fever, but it has side effects as well.

Consider Norman Walsh's Defending the Tax. Norman is an XML Standards Architect at Sun.

On the other hand, the difference between:

fruit=pear
vegetable=carrot
topping=wax

and

<doc>
<fruit>pear</fruit>
<vegetable>carrot</vegetable>
<topping>wax</topping>
</doc>

isn't really that large, is it? (Or maybe you think it is, de gustibus non est disputandum.)

The de gustibus dismissal means Norman considers it a matter of taste, but it isn't. The difference is large. There is a very real mental cost to parsing even a few short lines of XML.
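To make the comparison concrete, here is a minimal Python sketch, not anything from Norman's post. It reads both of the snippets above into the same dictionary and then counts how many characters in each are neither a name nor a value. The function names are made up for illustration, and character count is only a crude proxy for the mental cost of reading.

```python
import xml.etree.ElementTree as ET

KEY_VALUE_TEXT = """\
fruit=pear
vegetable=carrot
topping=wax
"""

XML_TEXT = """\
<doc>
<fruit>pear</fruit>
<vegetable>carrot</vegetable>
<topping>wax</topping>
</doc>
"""


def parse_key_value(text):
    """Parse 'name=value' lines into a dict, skipping blank lines."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        result[name.strip()] = value.strip()
    return result


def parse_flat_xml(text):
    """Parse a one-level XML document into a dict of tag -> text."""
    root = ET.fromstring(text)
    return {child.tag: (child.text or "").strip() for child in root}


def syntax_overhead(text, data):
    """Characters left over after counting each name and value once."""
    payload = sum(len(name) + len(value) for name, value in data.items())
    return len(text) - payload


if __name__ == "__main__":
    from_kv = parse_key_value(KEY_VALUE_TEXT)
    from_xml = parse_flat_xml(XML_TEXT)
    print(from_kv == from_xml)
    print("key=value overhead:", syntax_overhead(KEY_VALUE_TEXT, from_kv))
    print("XML overhead:", syntax_overhead(XML_TEXT, from_xml))
```

Both parsers recover the same three pairs; what differs is how much surrounding syntax a reader has to skip past to find them.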

As a Visual Studio ecosystem programmer, I find XML pervasive, in every nook and cranny of a project. Every time I look at my web.config XML file, there's a mental cost to parsing all these tags in the file. Here's this tag, which lines up with this tag. Here's this giant, verbose thing where only half of it actually matters.

Sure, it's a small effort. Insignificant, even. But what's the mental cost of that insignificant effort times the number of developers in the world, times the number of projects in the world?

I also posit that these minor headaches may be more significant than you realize. In Stumbling on Happiness, author Dan Gilbert makes a similar assertion.


His research found that people are bad at predicting their own future happiness. They tend to radically overestimate the positive or negative impact of large events in their lives -- losing your job, getting rich, getting divorced, having children. That's generally good; it means we have defense mechanisms in place to adapt and survive in our changing circumstances as human beings. But, we also tend to radically underestimate the impact of the dozens of small events in our lives throughout the day. Thus, small injustices don't trigger our defenses. The effect of that squeaky screen door, the neighbor's barking dog, the interrupting telephone call -- all of these may have far more profound cumulative impact on your day to day happiness than you realize.

It's a fascinating book, and I'm only paraphrasing the smallest part of it. I highly recommend reading it if this is at all interesting to you. It won't exactly unlock the secrets to happiness, I'm afraid, but you may gain a deeper understanding of why we tend to make the choices we do in our neverending pursuit of happiness.

I'm not trying to change the world overnight, but I wouldn't mind planting a few seeds of dissent in people's minds. This small stuff matters.

The next time you're trying to figure out an XML file, just think about it.

That's all I'm saying.

Posted by Jeff Atwood
Comments

Hi Jeff,

I presume you're using Web Forms for your views on Stack Overflow? Was this a considered decision? Did you consider any of the alternatives such as NHaml?

Cheers,

Andrew.

Andrew Peters on June 24, 2008 6:54 AM

One thing I really like about your blog is your quotes from other people. I find those quotes are usually the most reliable part of your posts, since the fact that it's not just you who agrees with the person you quote makes the information more reliable. Quotes are for highlighting points about a topic, not for placing parts of other people's text (or even your own from a previous publication) into your own. That said, I think you should have taken that part of your podcast, rewritten it, and placed it as normal text.

About the topic: I find that it mostly does not matter; even more, I think it's better this way because it's standardized. Standards are good, even if they are bad. A standard is better than each one having its own language and its own parser and so on.

Hoffmann on June 24, 2008 6:55 AM

Elaborating a bit on the Dan Gilbert book:

"The Futile Pursuit of Happiness"
http://www.wjh.harvard.edu/~dtg/Futile_Pursuit.htm

Jeff Atwood on June 24, 2008 7:15 AM

I hate to read XML, but, as Hoffmann said, it's standardized and almost any dev CAN read it. While it may benefit you to use something else, it will hurt the people who have to read that later.

I do agree that you need to be sane about what you use, but when it's only a small difference I say take a hit for the rest of the community.

The other thing XML has going for it is LINQ-to-XML and XML literals in vb.net. There isn't a much easier way to write/parse data in .net.

Eric Haskins on June 24, 2008 7:19 AM

I'm actually using this summer to teach myself more about XML and what it can actually do for me past a few simple scripts for websites. Can anyone recommend me a good book for learning XML?

Mike on June 24, 2008 7:38 AM

Hi Jeff. I've recently defended your blog to a fellow programmer who falls into your "passionate hatred" camp, and he's not the only guy I know who thinks similarly. On the other hand, I know a lot of people like myself that recognise you're just another bod. Personally what draws me here isn't the technical excellence, but the fact you can repeatably string consistent articles together that can be read by the average joe programmer (me).

I think you're overstating the mental parsing problem just slightly, and would almost dare to posit that if you can't substitute "fruit=foo" with "<fruit>foo</fruit>" after a few years of Visual Studio, then you're probably in the wrong business. ;)

Another aspect is that this could be seen as a tools problem. For the past year or so I've reached the opinion that for any formally structured data, there almost certainly exists a more efficient, "humane" representation that should be implemented in a GUI for manipulating that data structure.

While there aren't such things around (yet) for things like C or C# code, there exist quite a few XML editors that implement a number of different graphical interfaces to viewing/editing XML. The beauty in the generality of XML is that a user/programmer is free to pick from any number of different representations that he may use to manipulate the Infoset. The textual tags are just one widely used representation.

David W on June 24, 2008 7:42 AM

I'm going to share with you the first paragraph of Simon St. Laurent's 1998 "Why XML" article:

"The computing press has found a new savior for the ills that afflict computing and the web: XML. XML is new, it's exciting, and it's got to be good, because the specification for it looks indecipherable. XML's hype level has already drawn fire from some quarters, from those accusing it of 'balkanizing the web' or of increasing the load on an already strained Internet."

10 years already. It does seem indecipherable at times (especially when you're dealing with large XML content).

Here's the link: http://www.simonstl.com/articles/whyxml.htm

Elvis Montero on June 24, 2008 7:49 AM

While xml is horrible to read and isn't going to make anyone happy, using anything else is likely to make someone seriously unhappy. Have you ever tried parsing a csv or similar proprietary file which has documentation that not only was lost years ago but didn't handle the data type you are trying to add anyway?

The mental cost of reading xml is far outweighed by the benefit of being sure that you will be able to read it. Definitely a case of worse being better.

Tom Clarkson on June 24, 2008 7:50 AM

You should go one step further and prove your tax. Write a program using Visual Studio that yields the same result with each dataset. Wouldn't it be safe to say that if everyone wrote their own programs to reach the same result that the XML version would be more consistent than the non-XML version because it's based on a standard?

Tim on June 24, 2008 7:51 AM

XML is generally excise. (Doesn't About Face have an entire chapter on this?) When XML is presented to humans as the main means of modifying data or software state, you should be using something different (i.e., an actual UI). That said our content management system wouldn't exist without XML and XSLT, and I love both very much. Our users are none-the-wiser, however. XML is the pain that software developers bear so our users may lead happier, healthier lives.

Kendrick Erickson on June 24, 2008 8:00 AM

Once again, the issue is that XML isn't meant to be parsed by a human. Its intention is not to be human readable - the verbosity that is so annoying to a human brain (because we can interpret the meaning from context) is absolutely essential for software. Thus, I think the solution would be a translation layer for human viewing/editing of XML files. I'm sure that XML viewers/editors already exist (a quick Google search shows that they do). Maybe you should give one a shot? If you can get a plugin for Visual Studio, the entire problem would be solved.

Bill Gates on June 24, 2008 8:02 AM

My problem with XML isn't as much the strain of having to read it; it's more of how bloated it has become.

If I recall correctly, XML was derived from XHTML, which has its basis in HTML. So, in theory, XML is really just another text markup language. I'm not going to argue with the ability to create your own markup tags that can be parsed to mean whatever you want them to - quite the opposite, in fact. That feature is (hands-down) the most powerful aspect of XML.

Unfortunately, when you give people that much power, it inevitably goes downhill. Think of what XML was intended for (custom text markup), and now think about what it is being used for nowadays (configuration files, data transmission, data persistence, reporting, etc.) How much of the "usefulness" of XML is due to the ability to throw whatever you want into a file along with the rest of a loose collection of information, which might not even be relevant?

This doesn't even begin to take into account the extra overhead associated with parsing, reading and writing the information, as you said in your previous post. Add that into the mix, and (to me, anyways) the case against the widespread proliferation of XML grows stronger with each opening and closing tag.

So the question I pose is this: is the advent of XML as a universal data type (for lack of better wording) making us better programmers, or is it causing us to slide backwards into the olden days of placing everything having to do with anything into one place for "easier" access?

Jimmy on June 24, 2008 8:07 AM

you must love wpf ;)

brian on June 24, 2008 8:16 AM

Hi Jeff. I've recently defended your blog to a fellow programmer who falls into your "passionate hatred" camp, and he's not the only guy I know who thinks similarly.

"The dogs bark: a sign that we're riding, Sancho". (Don Quixote, via Jorge Diaz Tambley)

Once again, the issue is that XML isn't meant to be parsed by a human. Its intention is not to be human readable

I desperately wish someone would explain this to all the people writing XML files. Oh wait, we have.

Wouldn't it be safe to say that if everyone wrote their own programs to reach the same result that the XML version would be more consistent than the non-XML version because it's based on a standard?

The idea that there's only two choices: XML or "write it all yourself" is sort of.. a lie.

YAML is based on a standard, too:
http://www.yaml.org/

Jeff Atwood on June 24, 2008 8:25 AM

The power of software development is that it is one of the most efficient methods of expressing our will. Once it was people being taught a process, then it was mechanically expressed in assembly lines, after that we had hard wired chips and now it has moved into software. But, no matter how this has changed, it has always been about the best method to express our will and the backbone of that is passing information efficiently. It isn't about XML, Corba or whatever... as sure as XML is a certainty as a format to store data for the next 100 years, in 20 years we'll look back and laugh. I think a good phrase here is, "Every 1000 years, the followers of the current mainstream religion look back at the followers 1000 years ago and ridicule them". The difference for us, is that we see multiple changes like this within our own lifespan and yet, when we're stuck in the middle of the current new fad, we lose perspective and somehow forget about the last 10 technologies which were the promised golden bullet.

David from Oz on June 24, 2008 8:32 AM

How about the mental cost of learning the syntax of a bunch of new parsing languages? YAML, ini, bleh. I already know XML, why would I care to learn additional mechanisms for storing configuration/data persistence?

How about the anguish of working with immature and buggy APIs that parse these languages compared with the proven and stable apis that are built into Java/.NET? I don't need an external DLL. I don't need to unit test that piece of code. With XML, it just works right out of the box.

How about training costs? I lead a team of 5 engineers. I have not had to explain XML to a single one of them because they have either known it coming in (due to the pervasiveness of XML and .NET) or they were smart enough to look it up on the internet. Can you say that for the configuration flavors of the month you propose?

Jeff, I think your frustration comes from a lack of tackling enterprise level apps. These rants are starting to sound like Joel not liking exceptions or the need for a new language. You are so overly concerned with the little details that you miss the bigger productivity picture.

If you think about it from a Domain Driven Design (http://en.wikipedia.org/wiki/Domain_driven_design) perspective, XML is just a persistence layer. It's unimportant and you shouldn't be spending time on it. Focus on what matters - the domain.

Jim Greco on June 24, 2008 8:33 AM

The passionate hatred reminds me a little bit of some Firefox fans, for example. Don't get me wrong -- I use Firefox and I'm happy with it, but I don't get into any heated discussions about it. It's just a browser. But a quick visit to some random web forums, and you'll inevitably see people turn into raging lunatics when they talk about how much better it is than IE, and how dare anyone say anything bad about it (or abbreviate it the wrong way, for that matter).

I know the word has become trite, but fanboyism is probably the best way to describe it. Whether it's XML, or Firefox, or Ruby, or Linux, or Microsoft, or whatever. Use whatever you want--there's no reason to feel threatened when someone else prefers something different. It seems as if a great deal of people are either insecure about their tools and software; or perhaps they consider it so much a part of their own identity they feel that a criticism of their tool is a personal attack.

Whatever the reason, that kind of reaction to your original post certainly speaks volumes about a person's maturity level.

Neil (SM) on June 24, 2008 8:34 AM

I already know XML, why would I care to learn additional mechanisms for storing configuration/data persistence?

I might ask you a similar question: why learn anything beyond exactly what is required?

I'm not proposing that everyone stop and rewrite every application written in the last 5 years, merely that people understand and are aware of the alternatives.

Jeff Atwood on June 24, 2008 8:43 AM

No one's mentioned them in this comment thread, but they're inevitable so I feel I should get it out there this time: Lisp S-expressions offer all the standardization and consistency of XML with far less syntactic noise. S-expressions were also conceived as a "machine format" as opposed to a human format, but they are eminently more usable. Why they're not in wider use these days I have no idea.

Not much to say here besides that, but really - they're easier to parse and generate for both computers and people. They're lighter-weight and at least as extensible. Coincidence is not a good enough reason to maintain the use of XML over simpler, saner formats!

Isn't XML just another "bug-ridden, slow, ad-hoc implementation of half of Common Lisp" with better marketing?

I really don't mean this as a troll, I'm just so dissatisfied with XML that I react strongly when the topic comes up. Sorry!

JoeOsborn on June 24, 2008 8:50 AM

It's just a standard like anything else. It's also very expressive, easy to use, and has loads of tools support out there.

Why on fords green earth would you hand-edit Xml or Html just to write a document, a post, or a comment on a website. It's better as a structured storage format, that's coincidentally really easy to send across the wire (because it's already text). How your users input data usually doesn't have anything to do with the storage format.

Instead of the argument being 'Use Xml', perhaps the better argument should be 'Why Are You Not Using Xml'. Why are you trying to reinvent the wheel (and by reinventing it, wasting your time, the maintainer's time, and everyone else's time)?

Xml is not a UI, so Xml haters, come up with a better argument.

Maybe us 'xml fanboys' react because the alternatives are only superficially better. Maybe we don't want another markup language for the sake of it. Maybe we don't want to change existing code to suit the flavor of the month.

I have no doubt that Xml will be surpassed in time, but until there's something better out there, I'll stick with Xml. It works, so I don't have to.

Fanboy? on June 24, 2008 8:51 AM

XML took off because it was the first simple, recursive generic data format which is both machine and human parseable/editable.

That's it - there's no magic to it and it's certainly not the best solution many times but it can be fitted to near any need, thus its omnipresence in VS.NET and other enterprisey systems.

XML allows you to build DSLs really quick and simply. The user does not want to edit Java or C# code to configure their application but having them edit XML is often acceptable since its syntax is much simpler. And that's why XML is rarely ever used in a language like Ruby - Ruby can be made clean enough that the user doesn't even know they are editing Ruby. Only the quotes below are a clue you are in a programming language.

Ruby:
hostname 'http://foo.com'
port 50

XML:
<hostname>http://foo.com</hostname>
<port>50</port>

So to a great extent it's all about cleanliness and ease of data expression in the language you are using.

More here:

http://www.mikeperham.com/2008/02/09/dsls-and-xml/

I'm definitely not trying to shill for Ruby here. All of this applies to any other language with lightweight syntax requirements so the code can be made to look very close to English.

Mike on June 24, 2008 8:51 AM

There's nothing wrong with XML. If you don't find it legible, take an hour and write a program to display the data in whatever format you want. It's not going away, so deal with it! Another lame post.

Josh Stodola on June 24, 2008 8:52 AM

I might ask you a similar question: why learn anything beyond exactly what is required?

For low-hanging fruit such as data persistence and configuration you should absolutely not care about going beyond the minimum required because these frivolous things don't make your app better.

What are the benefits? A little less pain in deciphering the meaning of <foo>bar</foo> vs. foo=bar? At the cost of...

* Additional training
* Buggy/immature apis
* Unknown performance
* Mental cost of switching between XML and the flavor of the month text file format

Jim Greco on June 24, 2008 8:57 AM

You mentioned a lot of topics on this article, but didn't get to discuss any to any interesting extent. Paradoxically, it was a nice read.

Diogo on June 24, 2008 9:02 AM

@JoeOsborn ... here's a link you may like:

"XML is not S-Expressions"
http://www.prescod.net/xml/sexprs.html

Anyways, most of the time I'm just using Xml files as an easy way for end users to have complex configuration files without me having to come up with a novel way of representing them. It's so trivial to write a class X, populate it the way you like, and chuck it into a generic Xml Serializer (or load it with a generic deserializer). Job done .. next.... I never even have to write the parser or parsing code.

Fanboy? on June 24, 2008 9:04 AM

"YAML is based on a standard, too"

That brings up the question: What exactly is a standard? I always thought it was something that a large number of people have agreed upon.

With XML, it's the XML Core Working Group, part of the W3C, a multi-national consortium with a large member base, including quite a few major tech corporations. The copyright for the specification is held by the W3C.

With YAML, it's... the yaml-core mailing list. The copyright for the specification is held by three individuals.

Now, XML is a pain in the ass to deal with, but... would you really consider the YAML specification a standard?

Powerlord on June 24, 2008 9:26 AM

I think that in the case of XML, having a standard at all is much more important than having the best standard possible. The cost of getting the entire industry to convert to a less verbose standard for document markup would be far greater than just dealing with the angle bracket tax. It's a case of: just pick one and be done with it, so we can get on with our jobs!

Ben P on June 24, 2008 9:26 AM

Many people are complaining about the “mental cost” of (manually) reading XML.

Come on! If you really dislike reading XML, can’t you write (once) a simple XSL transform that will convert any XML document into some format that doesn’t burden your mind so much?

If you can’t write that transform, then the data probably couldn’t have been stored in your “light on the mind” format anyway, which means that XML was a good choice.
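A rough sketch of that idea, written in Python with the standard library's ElementTree rather than as an actual XSL transform, and with function names invented for illustration: it walks any XML document and prints one "path = value" line per leaf element and attribute.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Yield 'dotted.path = text' lines for attributes and leaf elements."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    for name, value in element.attrib.items():
        yield f"{path}@{name} = {value}"
    children = list(element)
    if not children:
        # Leaf element: its text is the value.
        yield f"{path} = {(element.text or '').strip()}"
    for child in children:
        yield from flatten(child, path)


def xml_to_lines(xml_text):
    """Convert an XML string into a list of flat 'path = value' lines."""
    return list(flatten(ET.fromstring(xml_text)))


if __name__ == "__main__":
    sample = (
        "<doc><fruit>pear</fruit>"
        '<server port="50"><host>foo.com</host></server></doc>'
    )
    for line in xml_to_lines(sample):
        print(line)
```

Note what the sketch throws away: text mixed in between child elements, and the difference between repeated siblings with the same tag. Documents that depend on either are the ones that would not have fit the lighter format in the first place.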

I never really understood the fuss about XML tags. For me, opening and closing tags are simply a concept. Sure, currently we often go on and store and transmit those tags verbatim, which *is* wasteful. But with some simple tools we could process and operate on XML and tags at a high level without those tags having to exist physically at a lower level.

Edward on June 24, 2008 9:40 AM

If anyone wants to see a classic example of angle bracket tax, try reading and writing XAML, it'll make your eyes bleed.

Ian on June 24, 2008 9:43 AM

I'm with ya Jeff. I'd go so far as to say I hate working with XML. As such, I keep my use to the bare minimum and usually will look at things like JSON, etc, before I'll settle on XML.

And I think that's the point you're making. XML all the time is just bad behavior.

Do you really need XML to define your ORM mappings, or would a compilable fluent interface make more sense?

Is there a real need in your application for XML configured Dependency Injection?

Does your webservice really need to return a complex and strongly typed XML file, or would a JSON file work just as well?

I could go on.

And don't even get me started on XSLT...

Lucas Goodwin on June 24, 2008 9:47 AM

I have no problem with using XML for relatively small buckets of data - config files, individual transactions, etc. Anyone who uses XML for large buckets of data is drinking the Kool-Aid or smoking crack.

The problem with moving or storing large amounts of data via XML is that there is no easy way to locate subsections. Anything you do requires that the XML document be parsed from the very beginning. This isn't a big deal when you are dealing with a small document. A few k of characters can be parsed pretty quickly whenever you need it. When the document gets a little bigger, you parse it once and use the DOM model. What happens, however, when you need to process a document that contains several hundred MB of data, or even several GB of data? There is no practical way to handle these volumes. And yet I see people try to do this all the time.

While record oriented techniques are clumsy in certain ways, they are far more easily scaled to large data sets. It is trivial to navigate to any particular element in a fixed field file, no matter how large. A csv file is almost as easy to handle. The data sets can be broken down into subsets quite easily. These sets can be streamed or paged easily. Arbitrary parts of the data set can be accessed without accessing every other part.
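As a minimal illustration of the fixed field case, here is a Python sketch. The 16-byte layout (an 8-byte name followed by an 8-byte quantity) and the function names are assumptions made up for the example. Reading record N is one seek and one read, no matter how many records come before it.

```python
import io

RECORD_SIZE = 16  # hypothetical layout: 8-byte name + 8-byte quantity


def write_records(stream, rows):
    """Write (name, quantity) rows as fixed-width ASCII records."""
    for name, quantity in rows:
        # Names are padded or truncated to 8 characters; quantities are
        # right-aligned and assumed to fit in 8 digits.
        stream.write(f"{name:<8.8}{quantity:>8d}".encode("ascii"))


def read_record(stream, index):
    """Jump straight to record `index` without reading earlier records."""
    stream.seek(index * RECORD_SIZE)
    raw = stream.read(RECORD_SIZE)
    if len(raw) < RECORD_SIZE:
        raise IndexError(f"no record at index {index}")
    text = raw.decode("ascii")
    return text[:8].rstrip(), int(text[8:])


if __name__ == "__main__":
    data = io.BytesIO()
    write_records(data, [("pear", 3), ("carrot", 12), ("wax", 7)])
    print(read_record(data, 2))
```

The same arithmetic works on a file of any size opened in binary mode. The price is the clumsiness mentioned above: every field needs a width chosen in advance.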

XML is perfectly fine for a lot of things, but programmers should consider scalability in the context of their projects. XML doesn't scale nearly as well as traditional record oriented approaches to data.

RevMike on June 24, 2008 9:51 AM

I have no problem paying the "angle bracket tax".

Integrity of data is a bigger concern to me than whether it's the easiest possible format on the human eyes.

What happens when you start having to handle non-standard data? E.g. things need to fit on multiple lines, or contain odd characters? Then you'll have to invent an escaping or encoding scheme.

Newsflash, XML already has this. Why reinvent the wheel? I'm sure as hell happy I don't encounter custom CSV or other arbitrary delimited file formats that much any more, since 99% of programmers don't think about the exceptional conditions, and their crappy invented file formats can't handle them.
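A small Python sketch of what that buys you, using the standard library's ElementTree (the helper names and sample values are made up for illustration): values containing newlines, angle brackets, ampersands and equals signs survive a round trip without any custom escaping code.

```python
import xml.etree.ElementTree as ET


def to_xml(settings):
    """Serialize a flat dict of strings as child elements of <doc>."""
    root = ET.Element("doc")
    for name, value in settings.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the child elements of the root back into a dict."""
    return {child.tag: child.text or "" for child in ET.fromstring(text)}


AWKWARD = {
    "note": "line one\nline two",   # value spanning two lines
    "formula": "a < b && b > c",    # characters that are markup in XML
    "pair": "key=value",            # would confuse a naive name=value split
}

if __name__ == "__main__":
    serialized = to_xml(AWKWARD)
    print(serialized)
    print(from_xml(serialized) == AWKWARD)
```

The serializer writes the `<`, `>` and `&` characters as entities and the parser turns them back, so the escaping scheme never has to be invented or documented by the application.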

nexusprime on June 24, 2008 10:29 AM

"Integrity of data is a bigger concern to me than whether it's the easiest possible format on the human eyes."

From the spec (http://www.w3.org/TR/REC-xml/#sec-guessing):

"The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use—which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways..."

So, your choice for data integrity is a spec where determining what encoding it's in is, in the words of the authors, "not entirely hopeless."

ben on June 24, 2008 11:00 AM

At my place (a research lab), people all but throw rocks at me because I use S-Expressions in place of XML for my various number crunchers. Basically, my S-Expressions are used to set up factories, which in turn build up objects that are then tinkered with by my application. A real life example:
----
embryo(
    material(
        name = 'steel'
        density = 7860.0
        max.strain = 0.1
        young.modulus = 210000000000.0 # newton/meter square
    )

    control.model(
        control.model.A(
            nb.chemicals = 1
            damping = 0.1
            nb.cell.neurons = 1
            nb.edge.neurons = 4
        )
    )

    template(
        beam.template(
            load = 6000.0 # newton
            radius = 0.001 # meters
            width = 2.0 # meters
            height = 1.0 # meters
            nb.hrz.patches = 8 nb.vrt.patches = 3
        )
    )
)
----

Imagine the same in XML: it would be less readable. Writing a parser for this? Hey, mine fits in barely 300 lines of C++ and does syntax error checking, with gentle exceptions to give error messages. It's an LL(1) grammar, so a finite state automaton and a stack and you're done... The XML grammar is a lot more demanding.

S-Expressions
* readable
* lightweight

XML
* not very readable outside of tiny files with less than 3 levels of nesting
* heavyweight in resources

Alex on June 24, 2008 11:14 AM

You say that the example you picked shows a large difference, but when you have 100s of those to send over the wire it's gigantic!

I'm writing a mapping application and guess what? I have to load 100s of points of interest to overlay on a map at once and, god forbid, the data comes in XML.... imagine parsing all that using the browser's JavaScript.

There's more to it than verboseness and parsing headaches: there's also the space, network bandwidth and CPU cycles tax and oh.... I can think of a lot more when everything is in XML.

Totally agree with you.

chakrit on June 24, 2008 11:20 AM

Just like Lucas said - why not try JSON instead? I also liked Douglas Crockford's assessment of it being pretty much XML without all the crap in it.

After many years spent dealing with XML and RSS in particular, I'm going for JSON in my future projects. It's either that, or I have to come up with an even better way.
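For a sense of what that looks like in practice, here is a minimal Python sketch using the standard library's json module. The sample document and its field names are invented for illustration; it reuses the fruit data from the post and adds a nested object and a list.

```python
import json

# Hypothetical sample document; only the first three pairs come from the post.
JSON_TEXT = """
{
  "fruit": "pear",
  "vegetable": "carrot",
  "topping": "wax",
  "server": {"hostname": "http://foo.com", "port": 50},
  "tags": ["config", "example"]
}
"""


def load_config(text):
    """Parse a JSON document into plain dicts, lists, strings and numbers."""
    return json.loads(text)


if __name__ == "__main__":
    config = load_config(JSON_TEXT)
    print(config["fruit"])
    print(config["server"]["port"] + 1)  # numbers arrive as numbers
    print(json.dumps(config, indent=2))  # and serialize back the same way
```

Nesting, lists and numeric types come through without closing tags or a schema, which is roughly the trade being described above.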

Kari Ptil on June 24, 2008 11:44 AM

I've been doing some .NET and SharePoint development the last year, and it's the best way to start hating XML: Handling it is very clunky (e.g., having to define a namespace manager even if there's no namespace defined), it's used in the most ridiculous places (CAML is just verbose SQL), it's bastardized (ASP.NET should be put to sleep) and it's used even for name=value content like web.config. If that's your exposure to XML, no wonder you're thinking twice about using it.

Victor Engmark on June 24, 2008 12:46 PM

quote. Every time I look at my web.config XML file, there's a mental cost of me having to parse all these tags in the file. end quote.

That's the bottom line. Get used to it. If you're having a problem reading XML, learn how to do it better.

Dave P on June 24, 2008 12:59 PM

If the intellectual overhead of using XML is so low as to be insignificant for ANY task, as some fanboys claim, then why are there still non-XML formats out there? Why do we not write Java or C++ or C# or Python code in XML format?

Simply put, Jeff is right, XML is not a panacea. Even if it is still your "go to" choice as a data format if it is your one and only choice in any and all circumstances then you must accept that you are limiting yourself. Think, use your judgement, weigh the pros and cons of using XML and of alternatives, then decide.

There's a marvelous book called "Conceptual Blockbusting" which talks about many aspects of creativity and problem solving and one of the most useful things I got out of that book was the term "satisficing". When problem solving there are two general categories for the methodologies used to arrive at a solution. On the one hand there are all of the methodologies which find a solution which is workable and then stop and move on to implementing that one solution. On the other hand there are all of the methodologies which continue to look for other, perhaps better, solutions even after one has been found which may be workable. The first strategy is called "satisficing", and many people do it without thinking. It's the reason why refactoring and redesigning and rebuilding things is often necessary. People working on a solution didn't stop to evaluate their design or consider alternate designs and instead just jumped ahead to implementation. This is the difference between buying a car by putting it on your credit card and buying a car by financing it using a low interest automotive loan. Assuming your credit card limit is high enough, both solutions might be "workable", but they are not of the same character and I think it's clear that one of these solutions is in almost all situations vastly preferable to the other. So the next time you've come up with a solution to a problem ask yourself whether you're cutting yourself short by satisficing, maybe you should spend some time (but not too much time) trying to come up with other solutions so that you can compare various solutions against each other and determine which one is the best, it may be your original solution, or it may not. A few minutes of forethought now can often save hours or eons of pain later.

The same applies to XML, this should be common sense.

Robin Goodfellow on June 25, 2008 2:18 AM

@leppie the problem with s-expressions is the same as the problem with JSON: because you eval it directly, you'd better hope that it just contains data. And to my eye, looking at s-expressions is no easier, nor harder, than XML.
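
The eval concern applies when the text is handed to the language's own evaluator. A dedicated parser that only understands data sidesteps it. A minimal sketch, assuming Python's standard `json` module stands in for whatever parser the target platform offers; the `hostile` string is a made-up example of input that is code rather than data:

```python
import json

payload = '{"fruit": "pear", "vegetable": "carrot"}'
data = json.loads(payload)          # parsed as data, never executed

# Hypothetical hostile input: valid code, but not valid JSON.
hostile = '__import__("os").getcwd()'
try:
    json.loads(hostile)
    rejected = False
except json.JSONDecodeError:
    rejected = True                 # the parser refuses it instead of running it

print(data, rejected)
```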

Nobody ever said Xml was anything revolutionary. It's not a silver bullet. But just because it brought together a bunch of disparate technologies hardly makes a counterpoint. (The Bugatti Veyron takes all the best know-how to make one kick-ass car, therefore it sucks- no wait that can't be right)

@RevMike - it's entirely possible that you shouldn't be querying multi-gigabyte files in pure XML. Nobody ever said XML was a replacement for a database, or other file structures. Index nodes in the file or use the XML as the basis for a cached copy if it's a measured performance bottleneck, or convert it to a format that works for its intended use.

-------

There's still no compelling technology to replace Xml. And no matter what happens, design by committee and bad coders will no doubt create monstrosities in any technology.

Problem Exists Between Keyboard And Chair. So until computers start to write their own programs, you're stuck with monkeys like the guy sitting next to you.

philx on June 25, 2008 2:27 AM

Peter Palludan,

(defparameter *summary*
..'((pros
....."everybody can do it"
....."global standard"
....."it can be used for almost everything")
...(cons
....."too verbose"
....."hard to parse"
....."another language")))

;; SAX? DOM? Pah!

(defvar *good*)
(defvar *bad*)

(dolist (yay (rest (first *summary*)))
..(push yay *good*))

(dolist (nay (rest (second *summary*)))
..(push nay *bad*))

Mikael Jansson on June 25, 2008 2:31 AM

XML is great for taking certain types of complex data and storing it in a file that's not crazy.

But it boggles my mind, and infuriates me, when someone tries to store really simple information that way. Every once in a while an open source project will take a simple config file, not much longer than the vegetable=carrot example, and convert it to some six-layer-deep XML monstrosity. What's WRONG with those developers? Are they just not thinking about what they're doing? Do they not understand the purpose of XML in the first place? The purpose of config files?


AndyL on June 25, 2008 2:41 AM

Here are three types of data for which XML is a Really Bad Idea(TM):
* Non-hierarchical data, since you'll have to deal with idrefs everywhere. Just use a DB.
* key=value data, but don't come around complaining when your neat little format turns crazy since you hacked it to contain 2-dimensional arrays and ID references.
* Enormous amounts of data - Use a DB or a custom binary format to optimize the handling.

Victor Engmark on June 25, 2008 2:44 AM

I'd say the line-by-line method for your config files is fine, if you have a spec that includes strict restrictions on what can be in this file. Strangely, the software using such 'simple' files seems to be exactly the software without such specifications.

Once you start having strings in there, you'll already have to guess the encoding and probably how to escape line endings and = signs as well. Before you know it you're trapped in this and have to actually think about what to do when writing your config file. With XML you just use a ready-made library and there you are.

For easy readability you can still use a graphical XML viewer. But I'm pretty sure that some clever syntax highlighting will get you most of the way already.

ssp on June 25, 2008 2:45 AM

For me it's not XML itself I hate but rather the developers who think that by selecting XML one actually solves any major problem. One always has the problem of deciding on good structuring, referencing and representation. If you have a simple thing to send/store, why use XML? If you have a complex thing, does XML help? How? Outside web services I seldom see it...

Quote from subversion 1.4 release notes

Working copy performance improvements (client)

The way in which the Subversion client manages your working copy has undergone radical changes. The .svn/entries file is no longer XML, and the client has become smarter about the way it manages and stores property metadata.

As a result, there are substantial performance improvements. The new working copy format allows the client to more quickly search a working copy, detect file modifications, manage property metadata, and deal with large files. The overall disk footprint is smaller as well, with fewer inodes being used. Additionally, a number of long standing bugs related to merging and copying have been fixed.

Says it all

Edward on June 25, 2008 2:48 AM

It is odd how people feel that the alternative to not using XML is to use an in-house format.
Why not have 3 or 4 standards, ranging from "simple but limited", to "comprehensive but complex". In fact, we already have these, in the form of everything from XML to a simple list of line separated entries. (Ever encountered the dreaded list of names, all surrounded by "name" tags, AND THAT'S ALL THAT'S IN THE FILE? XML was not needed there.)

Think of it like Newtonian and Einsteinian physics. Einsteinian physics provides a more accurate model, but Newtonian physics is used for most situations, because it's easier to do the maths, and on the scales it's used, there's no difference to the results.

I figure a lot of people who have declared geekhad on Jeff have encountered someone who developed an in-house format all their own, and are assuming that's what he suggested. Surely all he's saying is that you should consider other options before the knee-jerk XML route. There's probably a list of questions you could ask, a few examples being
1) How much data is being used?
2) How deep does that data go?
Anyone got any others?

Tom on June 25, 2008 2:54 AM

[Third time's a charm :-)]

Thinking about it, most of the repetition (and therefore visual crud) comes from the closing tags repeating the name in the opening tag. I'm just thinking out loud here, but how hard would it be to come up with a shorter "default" closing tag? For example, an empty closing tag ("</>") could be used to mean "close the innermost open tag". The sample XML would look like this:

<doc>
<fruit>pear</> <!-- closes "fruit" -->
<vegetable>carrot</> <!-- closes "vegetable" -->
<topping>wax</> <!-- closes "topping" -->
</> <!-- closes "doc" -->

It's just syntactic sugar but it's considerably shorter than the original (if you remove my comments, of course).
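
The "close the innermost open tag" rule is easy to prototype as a preprocessor that expands the short form back into ordinary XML before handing it to a standard parser. A minimal sketch, assuming simple tags without CDATA sections or `>` inside attribute values; `expand_short_closers` is a hypothetical name, not part of any XML library:

```python
import re

# Comments (left untouched), closing tags ("</>" or "</name>"), opening tags.
TOKEN = re.compile(r"<!--.*?-->|</([\w.-]*)>|<([\w.-]+)([^<>]*)>", re.S)

def expand_short_closers(text):
    """Rewrite every empty closing tag as the matching named closing tag."""
    open_tags = []                    # stack of currently open element names

    def fix(match):
        whole = match.group(0)
        if whole.startswith("<!--"):
            return whole
        if whole.startswith("</"):
            if not open_tags:
                raise ValueError("closing tag with nothing open")
            name = open_tags.pop()
            given = match.group(1)
            if given and given != name:
                raise ValueError(f"expected </{name}>, got </{given}>")
            return f"</{name}>"
        if not whole.endswith("/>"):  # self-closing tags open nothing
            open_tags.append(match.group(2))
        return whole

    result = TOKEN.sub(fix, text)
    if open_tags:
        raise ValueError(f"unclosed tags: {open_tags}")
    return result

short = "<doc><fruit>pear</><vegetable>carrot</><topping>wax</></>"
print(expand_short_closers(short))
```

The same stack is what a full parser keeps anyway, which is why the named closing tag carries no information the machine needs.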

Andrew Francis on June 25, 2008 2:59 AM

Two Points
1. XML is unreadable - how many times do you end up reformatting the web.config file to line the attributes up just so you have a chance of reading it?

[add name="zzzzzzz" value="vvvvvvv" /]
[add name="z"_____ value="v" /]

2. .NET needs to support other text-based serializers out of the box before we stand a chance of anything changing. I bet that list of names mentioned by a previous poster was a serialized array (zero effort on the part of the coder to persist some data to a file).

Robert on June 25, 2008 3:10 AM

The project I'm working on uses XML for all its configuration and output data. Some of these files are pleasant to work with whilst others are truly dreadful. The main difference between them is the quality of the design and the level of understanding of XML the designers had.

The well designed files are easy to read and easy to edit (especially in an editor that can parse the schema and do auto-completion).

The poorly designed files fail on many levels. For example, the system we're developing consists of a set of software components. The components to instantiate are defined in an XML file. Each component has a set of parameters for each instance. These parameters are stored as child elements of the component definition as a key/value pair list. Components reference their parameters by name, the name has to be cross referenced to the key using another XML file. The upshot of which is that hand editing the file is impossible, even with auto-completion. An example (convert to XML):

component
__id
__name
__type
__parameter
____parameter_id1
____value1
__parameter
____parameter_id2
____value2
__parameter
____parameter_id3
____value3

parameter_definition
__parameter
____parameter_name1
____parameter_id1
__parameter
____parameter_name2
____parameter_id2
__parameter
____parameter_name3
____parameter_id3

So, when the component is instantiated it attempts to get parameter 'parameter_name3' which has to be found in the parameter_definition table to get parameter_id3 (no guarantee it's there though) then use parameter_id3 in the component's local parameter table. So even though you have XML that conforms to the schema, data can be invalid or even missing.
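
The gap between "conforms to the schema" and "is actually consistent" can be closed with a small cross-reference check run alongside schema validation. A minimal sketch, assuming the outline above maps one-to-one onto element names; the sample documents, the ids `p1`, `p2`, `p9`, and the function name `dangling_ids` are all invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical documents following the outline above.
COMPONENT_XML = """
<component>
  <id>c1</id>
  <name>pump</name>
  <type>actuator</type>
  <parameter><parameter_id>p1</parameter_id><value>10</value></parameter>
  <parameter><parameter_id>p9</parameter_id><value>20</value></parameter>
</component>
"""

DEFINITION_XML = """
<parameter_definition>
  <parameter><parameter_name>speed</parameter_name><parameter_id>p1</parameter_id></parameter>
  <parameter><parameter_name>limit</parameter_name><parameter_id>p2</parameter_id></parameter>
</parameter_definition>
"""

def dangling_ids(component_xml, definition_xml):
    """Return parameter ids used by the component but never defined."""
    defined = {p.findtext("parameter_id")
               for p in ET.fromstring(definition_xml).iter("parameter")}
    used = [p.findtext("parameter_id")
            for p in ET.fromstring(component_xml).iter("parameter")]
    return [pid for pid in used if pid not in defined]

print(dangling_ids(COMPONENT_XML, DEFINITION_XML))
```

Both sample documents are well-formed and structurally plausible, yet `p9` has no definition, which is exactly the kind of error a schema alone does not catch.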

With XML and well designed schemas, the source control check-in process can be set up to validate XML files against their schemas:

on check in xml
test against checked in schema defined in XML file
pass schema check - check in file
fail schema check - display error, don't check in file

As Jeff points out, it's all about using the right tool for the job. You wouldn't use XML to transmit data across a CAN bus for example, or as configuration data on limited performance embedded systems.

Although XML is very useful, it's not a panacea for all software problems.

Skizz

Skizz on June 25, 2008 3:11 AM

When XML came out, and then exploded in popularity, I groaned on the inside, since even though everyone was parroting the lie that XML would never be read by humans, it was painfully obvious that it would.

I had someone try to rewrite my code to use XML, break it horribly, and then spend a year trying to make it work, while I carried on blithely with my original code and finished a 5 year project in about 1.5 years.

Bill on June 25, 2008 3:39 AM

Once again, the issue is that XML isn't meant to be parsed by a human. Its intention is not to be human readable

well, actually, the officially given reason for repeating the opening tag name in the closing tag (there's no real reason to) is for parsing by a human. :) heh


anyway - I just read a relevant paragraph in the O'Reilly "RESTful Web Services" which I thought I'd share (just the conclusion) -

"JSON is useful when you need to describe a data structure that doesn't fit easily into the document paradigm"

Ed on June 25, 2008 3:40 AM

Windows only, but I recommend the Liquid XML Studio freeware. It makes parsing those XSDs, and making sure that you've created documents that match them, much easier.

http://www.liquid-technologies.com/Product_XmlStudio.aspx

Haven't tried the pay for version, but the free one is OK. A tad buggy, in that sometimes the search gets lost and stuff, but for freeware really good.

Francis Fish on June 25, 2008 4:17 AM

Jeff, I disagree often and a lot with things you write in your blog; but hey, it's everyone's right to have his/her own opinion, isn't it? Still your opinion matters to me; why would I otherwise even read your blog? I read your blog, because sometimes you come up with really interesting ideas and aspects most people never even thought of. Many programmers take certain ideas as facts; "that's just the way it is". They don't even dare to question them. You question many of these and even if you may not be able to come up with a solution to all problems, your blog at least makes people aware of possible issues, kind of "See this, now see that... see the problem?" and this often causes an "Ohhh" or "Ahah" effect. People start to reconsider their facts and recognize that these are not set in stone.

Back to topic: XML is not a fact. XML is not God-given. XML is an idea. An idea that got popular. XML might be a good solution for some or even many problems, however it may be a poor solution for other problems and even if it works as a solution for some problems, there might still be better solutions than that. You seem to dislike XML and guess what, this is one of the topics I seem to agree with you.

The main problem I have with XML is: For whom is this language actually designed?

A) For human beings, so you have human readable data? Really? Well, as you pointed out before, XML is very hard to read. Easy samples like shown above are still human parsable, but I can give you a 2 MB XML file that will make you cry. XML is not for human beings, it's too verbose and too complicated once the data file grows beyond certain limits.

B) For computers, so you have a standard way to store arbitrary data? Certainly not. XML is far from being easily parsable for machines. I can think of data formats that are 100 times easier to parse if it only needs to be machine parsable.

So if XML is neither for A nor for B, what is it good for anyway? I guess it is an attempt to create a format that is at least somewhat human readable and at least somewhat easy to parse for machines. Bad choice!

Instead I would have created two equivalent formats, so that you can always convert between them in a 100% lossless way: one that is very easy for machines to read and one that is very easy for human beings to read. Sounds like a much better approach to me.

Actually Apple had such an approach. Apple has the old NextStep PLIST format, which is very easy for human eyes to read. And they have a very compact binary format. Both were replaced by XML instead. The NextStep Format is legacy and the binary is not legacy, but it's not the default format being used either.

Here's an example for the new XML PLISTs:
http://tinyurl.com/54dv52

Compare this to the old, human readable ones (much more readable):
http://tinyurl.com/5yefeb

There is no description for binary PLISTs; but be assured, these are optimized for being machine readable.
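
The "two equivalent formats with lossless conversion" idea can be tried out with the XML and binary PLIST flavors. A minimal sketch, assuming Python's standard `plistlib` module, which covers only those two formats and not the old NextStep text format; the `settings` dictionary is made-up sample data:

```python
import plistlib

# Hypothetical sample data.
settings = {"fruit": "pear", "vegetable": "carrot", "toppings": ["wax"], "count": 3}

as_xml = plistlib.dumps(settings, fmt=plistlib.FMT_XML)        # text-based form
as_binary = plistlib.dumps(settings, fmt=plistlib.FMT_BINARY)  # compact machine form

# Converting through either representation gives back the same data.
from_xml = plistlib.loads(as_xml)
from_binary = plistlib.loads(as_binary)
print(from_xml == from_binary == settings)
print(len(as_xml), len(as_binary))
```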

Mecki on June 25, 2008 4:19 AM

I think you're missing a detail: XML is meant to be read and edited by humans only as a last resort. The normal situation is to edit and consume them using tools written for that purpose.

I mean, you don't really bash the bmp format because it's verbose and hard to read and edit, do you? That's because you understand that you're not supposed to edit it with a hex editor. Same goes for wav files. And so on.

Why does XML baffle you so much? Because it looks readable by a human, albeit with some difficulty, instead of looking like line noise? That doesn't mean it's meant to be edited in its raw form.

Moreover, programming is *not* an art. It's an engineering discipline, therefore it's not about creating the perfect program; it's about delivering something good enough for the customer, on time and with the least possible cost.
Sure, JSON, S-lists and custom binary protocols would be better solutions, performance-wise. But are the tools to produce and consume these formats as cheap and pervasive and tested as those for XML? Not as far as I know. So, yeah, XML is almost always the best choice whenever you have to serialize a tree structure. Its performance may suck compared to the alternatives, but it's cheaper and allows you to deliver sooner, and that makes your customers happier than a more expensive program delivered later that runs 20% faster.

Fogbank on June 25, 2008 4:57 AM

XML is powerful and that's why it is popular. You can represent almost any data model using it.

Should you use it to represent a simple properties list? I would rather just use a Java Properties file or a Windows INI file. But for anything more complex than a simple key = value file, I would use it.

XML is text based which means it is easy to work with. It is white space neutral and EOL agnostic which many of its proposed replacements are not. I've used some of them, and when you're looking at trying to keep the seventh level of indent straight, you'll find that they can be even more of a pain to use.

That said: XML is very hard to read and write. But, there are hundreds of XML editors that can help. Use them.

And, does Microsoft over use it? Of course they do. Microsoft takes everything to the extreme.

Your company is based upon proprietary technology? Then reject ALL open source as pure evil. Make sure your stuff works with nothing open source without a lot of pain and suffering. Keep all protocols secret and keep updating them, so anything open source that tries to use them will fail. Spend hundreds of hours inventing your own way of doing everything even though there is already a readily available solution being used by everyone else. Take standards and extend them until they break!

Is XML good? Then make everything XML. Save files in XML format. Make all configurations, no matter how simple in XML. Make XMLs of XML files.

Were there too many DOS INI files in those pre-Windows days and they tended to be all over the place? Then create a massive binary repository for all settings. Heck, even create new settings that can go in there. Link everything together with GUIDs and make it so fragile that one random change can destroy everything.

Maybe they should be drinking a bit more decaf over in Redmond.

David W. on June 25, 2008 5:14 AM

XML is bad for many reasons already outlined, but YAML is not a good solution imho. JSON is better. YAML obscures the structure with white spaces. It's dangerous, if you let someone edit a YAML file who is not experienced with YAML, they can completely mess up the structure of the data.

Fabrice on June 25, 2008 5:22 AM

On the subject of XML and abuses thereof, how do you feel about XML comments in C#, Jeff? My feeling is that they are hideous and bloated, but that I can't get away from them because there's no other way to get the benefit of comments appearing in Intellisense tooltips and stuff like that. I only wish that they'd chosen a more concise format that's human-readable. Either that or a more advanced source editor that renders and edits the XML comments differently from the rest of the code.

Weeble on June 25, 2008 5:29 AM

I was under the impression that XML was arrived at so that endless discussions like these wouldn't happen!

Valentin Galea on June 25, 2008 5:34 AM

Make XMLs of XML files.

We will most certainly need XML files to keep track of our XMLs of XML files. I'm envisioning a beautiful forest of angle trees..

Jeff Atwood on June 25, 2008 5:36 AM

Intriguing comments, but surely as human readable as XML is supposed to be, no one in their right mind is going to try and parse catalogues of stuff in xml?

Use the right editor, something that reduces XML to its DOM, and expand it as you wish.

You can produce better structured tools for dealing with XML, and that's how you should use it.

Sure, parse it in your mind yourself, but bear in mind, human readability is an advantage of XML, not its purpose.

goatslayer on June 25, 2008 5:41 AM

XML is great for hierarchical data and as a non-binary storage format for information that you may wish to use in lots of different ways - XSLT is weird to use but very powerful.

It's a nightmare for storing name/value type information and also, actually, for things like the configuration files I'm working with at the moment that, for reasons best known to Beelzebub, store regular expressions as XML tag attribute values. As you can imagine this creates an escaping nightmare, as you have to do your regular string escaping, your regular expression escaping and then your XML escaping before it can be added to the document, giving you many layers of possible (and indeed probable) fail when you are editing it by hand. I'm working on a simple editor that gets around the whole problem (I only just started here) but I'm amazed that people have done this for years and never apparently questioned it.
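
When the file is written by code rather than by hand, the XML layer of that escaping can be left to the library. A minimal sketch, assuming Python's `xml.etree.ElementTree`; the `rule` element, its `match` attribute, and the pattern itself are invented for illustration and are not the commenter's actual configuration format:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern containing characters that XML must escape: < > " &
pattern = r'^<(\w+)\s+name="([^"&]*)">$'

# Writing: the library escapes the attribute value.
rule = ET.Element("rule", {"match": pattern})
serialized = ET.tostring(rule, encoding="unicode")
print(serialized)

# Reading: the library undoes the escaping, giving back the original regex.
restored = ET.fromstring(serialized).get("match")
print(restored == pattern)
print(bool(re.match(restored, '<add name="x">')))
```

The raw string literal removes the string-escaping layer as well, leaving only the regular expression's own escaping to think about.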

Breakfast on June 25, 2008 5:45 AM

Jeff,
I am an administrator (doing Linux right now, but I have also been a Windows admin in the recent past). I have been reading your blog for about 2.5 years now.
I am amazed at how many people seem to miss the point of your post -- to reiterate (again): THINK!
Please, do not make me parse XML if a simple key=value list will work. People keep commenting "just use a parser" or "syntax highlighting will make it readable". Drop the arrogance, people; the data is not just for you to read. Us poor admins need to parse many of these files also.

btw - Keep up the excellent work (I may not always agree, but I always find it interesting and thought provoking).

Douglas on June 25, 2008 5:52 AM

I use one of three methods of storing data. CSV, XML, SQL Server.

I consider this a fairly extensive toolkit for doing almost everything I need to do from the simplest to the most complex.

I do look at other alternatives, and seriously consider using them. However, I try to limit the tools that I use in order to be more efficient with them. The expression "Jack of all trades, master of none" springs to mind were I to use many more methods.

Therefore, until I come across something that I truly consider worth adding to my toolkit or replacing an existing item, it will remain as it is.

As for XML itself, if the data I need to store is fairly simple then I may well hand crank it. I personally find xml extremely clear and easy to read if I've written it myself.

Alternatively though, I will quite happily produce fairly complex XML where a system didn't quite justify or require the extent or performance of a full SQL database. These files will never be hand cranked but will be written and read entirely in code. Seeing as I mainly write in C#, it couldn't be simpler!

The XML format may cloud the actual data or, as you say, "only half of it actually matters". The main thing is, whilst human-readable might be wrong, coder-readable is certainly true. The main point of using XML as my data source is that, should push come to shove and I really need to get at or change some small item of data, I can spend 15 mins with Notepad and do just that! This is not something I do on a regular basis; it's just nice to know that it's something I can do if I really need to.

Robin Day on June 25, 2008 6:34 AM

IMHO a good example of XML abuse is requiring XML (and an XML parser) if you want to push to a server via HTTP(S), i.e. for WebDAV. WTF is XML doing here? HTTP uses a nice configuration mini-language (headers), why not use it?

Jakub Narbski on June 25, 2008 6:34 AM

Many of the comments here seem to be like "There's nothing wrong with XML. Learn to read it"

They are missing the point, XML is designed to be read by computers not by humans, if you are forced to read it or write it then your program is broken, if you are often having to write it your interface is broken

If you want to use it to modify settings/parameters etc ... then use a human readable format... or use an interface, you should never force people to read/edit raw XML

Jaster on June 25, 2008 6:35 AM

Yeah, XML has a standard so we should just stick with it.

"That's how we've always done it."

MattH on June 25, 2008 6:44 AM

I am currently developing an application that has a project file. The requirements of the project file are:

* User may want to edit project files directly, i.e. it should be editable by a general purpose editor.
* Edited project files should be consistent; meaningless content is not welcome.
* Project files content should be easily mapped into objects.
* In case of a version (content) change of the project file, previous files should be migrated.

And the choice for the project files' format is XML. For first three tasks I only write an XSD schema and use JAXB to generate classes that map to xml files. For the migration task I use either XSLT or DOM.

I don't think any other technology (such as YAML) or in-house built code would be as mature as XML for my requirements.

Bahri Gencsoy on June 25, 2008 6:51 AM

Thinking is key.... You should always be thinking about using the right tool for the solution. XML is pretty cool stuff, but it *does* bear a tax. For you PHP folks out there, think of what the php.ini file would look like and bloat to if it were XML. Think of how much additional annoyance and frustration you would have while walking through the file to adjust your settings. Even the simplest of tasks, commenting a line, for example, suddenly becomes an ordeal of angles and dashes.

Dan on June 25, 2008 6:51 AM

@Weeble:

Yep, the XML comments in C# are a pain to use. I think they were a very poor choice. They feel incredibly bloated, particularly if you want to add anything useful like referencing other methods/classes etc.

For those that are unfamiliar, consider this simple comment block:

/// <summary>
/// Closes a Thing instance, previously opened by <see cref="OpenThing"/>
/// </summary>
/// <remarks>
/// Will fail if <paramref name="toClose"/> is already closed.
/// </remarks>
/// <param name="toClose">The instance of Thing to be closed.</param>
/// <returns>True if <paramref name="toClose"/> was successfully closed, False otherwise.</returns>
public bool CloseThing(Thing toClose)...

Compared to a slightly saner (but still parsable) human-readable syntax:

/// Summary:
///   Closes a Thing instance, previously opened by [OpenThing]
/// Remarks:
///   Will fail if {toClose} is already closed.
/// Params:
///   toClose - The instance of Thing to be closed.
/// Returns:
///   True if {toClose} was successfully closed, False otherwise.
public bool CloseThing(Thing toClose)...

In the XML example, there were 161 characters for markup and 194 characters of "real" content - so around 45% of the comment was 'noise'.

Graham Stewart on June 25, 2008 6:57 AM

I get the feeling this article, however many times it's repeated, is doomed to wage a holy war within 1 comment. However, some comments of my own:

"Integrity of data is a bigger concern to me than whether its the easiest possible format on the human eyes."

XML does absolutely nothing to ensure integrity of data. It ensures integrity of *syntax*. In some cases, the greater simplicity of - say - "name=value" is worth it, especially given that the syntax check is trivial.
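
To show how small the "trivial" syntax check for name=value can be, here is a minimal sketch. The rules are assumptions, since the format has no standard: names are identifiers that may contain dots, `#` starts a comment line, and whitespace around `=` is ignored. `check_config` is a hypothetical helper name.

```python
import re

# name = value, where name is an identifier (dots allowed) and value is free text.
LINE = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=\s*(.*?)\s*$")

def check_config(text):
    """Return (settings, errors) for a name=value file."""
    settings, errors = {}, []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                  # blank lines and comments are fine
        match = LINE.match(line)
        if match is None:
            errors.append(f"line {number}: expected name=value, got {stripped!r}")
        else:
            settings[match.group(1)] = match.group(2)
    return settings, errors

settings, errors = check_config("fruit=pear\nvegetable = carrot\n# a note\noops\n")
print(settings)
print(errors)
```

As with XML, this checks syntax only; whether `carrot` is an acceptable vegetable is still the application's problem.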

"... if everyone wrote their own programs to reach the same result that the XML version would be more consistent then the non-XML version because it's based on a standard?"

CSV is a 'standard', in that it's a well-recognised format with defined rules, shared by countless systems around the world. HTML and CSS are 'standards', yet almost every web browser out there implements them slightly differently. Saying something is a 'standard' is really saying very little; the important thing is that the data format is open, rather than proprietary.

"Once again, the issue is that XML isn't meant to be parsed by a human."

Really?? In that case, why not optimise the hell out of all of your DTDs - no more "Student" element when you can make do with "S". Providing a GUI layer in between the human and the data is all well and good, except when that layer goes wrong, or you don't have access to it. I guess, by your argument, XHTML isn't meant to be parsed by a human, and we should all be authoring our web sites in Frontpage. The benefits of readable textual data are *massive* - try reading 'The Art of Unix Programming' if you're not sure why.

bobby on June 25, 2008 7:02 AM

Should I ever get to work on any of your code, Jeff, you better remember this: http://www.codinghorror.com/blog/archives/001137.html

XML is wicked easy to pick up, and as soon as you have, you're reading, writing, editing all sorts of files written in XML. It doesn't mean that you SHOULD but you CAN. So drop the nonsense about learning a million new formats and standards just because "you should learn the minimally required": well, if I know XML, isn't THAT the minimally required? Any OTHER language/format/standard just requires more.

Apart from that, it's a truism that you should use the right tool for the given job. XML is bad for some things and great for others. Leave it at that.

Regards
Fake

Fake51 on June 25, 2008 7:06 AM

I'm just going to go back to reading INI files

Private Declare Function GetPrivateProfileString Lib "KERNEL32" Alias "GetPrivateProfileStringA" (ByVal lpApplicationName As String, ByVal lpKeyName As Any, ByVal lpDefault As String, ByVal lpReturnedString As String, ByVal nSize As Long, ByVal lpFileName As String) As Long
Private Declare Function WritePrivateProfileString Lib "KERNEL32" Alias "WritePrivateProfileStringA" (ByVal lpApplicationName As String, ByVal lpKeyName As Any, ByVal lpString As Any, ByVal lpFileName As String) As Long
Public Sub WriteINI(wiSection As String, wiKey As String, wiValue As String, wiFile As String)
    WritePrivateProfileString wiSection, wiKey, wiValue, App.Path & "\" & wiFile
End Sub
Public Function ReadINI(riSection As String, riKey As String, riFile As String, riDefault As String)
    Dim sRiBuffer As String
    Dim sRiValue As String
    Dim sRiLong As String
    Dim INIFile As String
    INIFile = App.Path & "\" & riFile
    If Dir(INIFile) <> "" Then
        sRiBuffer = String(255, vbNull)
        sRiLong = GetPrivateProfileString(riSection, riKey, Chr(1), sRiBuffer, 255, INIFile)
        If Left$(sRiBuffer, 1) <> Chr(1) Then
            sRiValue = Left$(sRiBuffer, sRiLong)
            If sRiValue <> "" Then
                ReadINI = sRiValue
            Else
                ReadINI = riDefault
            End If
        Else
            ReadINI = riDefault
        End If
    Else
        ReadINI = riDefault
    End If
End Function

How do I call an API in .NET?
:)

Jay on June 25, 2008 7:24 AM

@Fake51: "XML is wicked easy to pick up"

Actually I think that is a pretty common misconception.

Most XML-abuse situations I've seen have come from developers who found XML "easy to pick up" and converted their existing file format by randomly adding a few angle brackets to produce a structureless blob that is impossible to validate.

In reality there are quite a few aspects to XML to master before you have really picked it up (XML Schema (XSD) or DTD, XSLT, validation, XPath, XQuery, XPointer, UTF-encoding) not to mention the more philosophical issues involved in designing a good schema.

I'd say "Bad XML practices are wicked easy to pick up. Good XML takes time and practice."

Graham Stewart on June 25, 2008 7:26 AM

"not XML the religion"

That's the point, exactly!
Seems some people need religion but fail to empty
their religious cache in church ("What church?").
So religion pops up at places where it doesn't belong.
XML, LDAP, Agile Development, ...

I'm really p*ssed when things like "Agile Development" suddenly
get this religious momentum and people start to use
new toys because they believe in them instead of relying
on scientific data.

Nice reads:
Terry Pratchett's "Small Gods" was an eye-opener for me
when it comes to church and religion.
[ The "Science of Discworld" books are nice too ].

I also recommend Alfie Kohn's "Punished by Rewards".
It will crash everything you believe in about the school
system, performance payments, stick and carrot by providing
scientific data.

I have a strong mathematical background so I will never understand
why people replace a properly working (mathematically beautiful) relational database system with "crap" like XML or LDAP.
The objects in a database are tables. When you operate on a database
you get back a table, so you can use this output as input again.
Try that with XML or LDAP.
This isn't religion, this is fundamental mathematics (-Algebra)!


Erik on June 25, 2008 7:27 AM
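
The "table in, table out" property described above can be sketched with Python's built-in sqlite3 module. The table name, columns, and rows are made up for illustration; the only point is that the result of one SELECT is itself a table that another SELECT can consume.

```python
import sqlite3

# In-memory database with a small hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE food (name TEXT, kind TEXT, price REAL)")
conn.executemany(
    "INSERT INTO food VALUES (?, ?, ?)",
    [("pear", "fruit", 0.5), ("orange", "fruit", 0.8), ("carrot", "vegetable", 0.2)],
)

# The inner SELECT yields a table; the outer SELECT uses that table as its input.
rows = conn.execute(
    """
    SELECT name
    FROM (SELECT name, price FROM food WHERE kind = 'fruit') AS fruit
    WHERE price > 0.6
    ORDER BY name
    """
).fetchall()

print(rows)
conn.close()
```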

In praise of sloth - my never-ending rant but no one reads this far down the comments so it doesn't matter:

Other than the fascination factor, the reason I began coding was that it saved me time in math classes. I could solve the same equation over and over or I could write a program once and never have to solve the equation again. Seemed like a no-brainer to my lazier proclivities.

A primary reason I continue to be interested in most technologies has largely to do with convenience. To that end, I typically opt for the one with the most support and best features. I used IE until Firefox had a really huge base, even though Firefox had almost always been better. All the mp3 players in my house are iPods because accessories and support are abundant. And I use XML over all other alternatives because there are libraries everywhere for it. My programming languages of choice have never included the new sleek ones that are so hot at the time and whose whole support base consists of a few thousand fickle fanboys, little to no real documentation, and no professional experience.

Thank you for raising awareness and getting us to think about WHY we use what we do. In my case, the above is why I still choose XML. If I woke up tomorrow and I was tripping over YAML blogs, how-to articles, support, plug-ins, libraries, and billions of man-hours of experience -- I'd switch.

Dinah on June 25, 2008 7:29 AM

It's an ongoing battle and a good topic for conversation. My only qualm is with the readers/commenters who are writing this off as a trivial matter or 'It's here to stay. Deal with it.' You are the reason we're having the problems. XML isn't here to stay any more than Fortran was here to stay (read: stay=in the mainstream).

All technology is subject to change and as programmers, designers, and salesmen of the technology we love...you have to ask yourself every once in a while if it is the right thing. To reject reflection on the toolsets and the future is the killer of innovation and discovery, not to mention bordering on the "stay the course" attitude that we've all come to love in our political representatives.

There are serious problems with XML, but it's among the most impressively empowering tools available to the modern programmer. Readable? Efficient? Effective? Enabling? Perhaps! Realize two things: 1) XML is approaching ubiquity in mainstream applications. 2) We disagree on some properties of XML. These two statements result in a non-trivial argument.

Raymond on June 25, 2008 7:35 AM

I've only recently started reading your missives, but on the subject of XML, I completely agree with you.

I fail to see how (say) a simple text configuration file is improved by wrapping up all the baggage of XML tags, whose only purpose seems to be to make life harder when it comes to parsing the file.

Yes, there are "standard" XML parsers for pretty much all mainstream languages these days, but that doesn't make it right. Why use a big parsing library for something that could be done in a few lines of [insert your language here] if only the text file was simpler? It doesn't make sense.

Rich on June 25, 2008 7:35 AM

Yeah, let's stick with punch cards. They are standard and everyone can read them.

The main point of the article is: THINK, don't do stupid stuff. It is rather funny that most of the comments amount to "don't make me think, let me have my crummy old XML."

stonemetal on June 25, 2008 7:36 AM

xml, eh? don't get me started.

if only the old win 3.1 developers would have just kept their .ini files in the right places... :) now we have windows registries and xml files everywhere...

that's the thing with stuff like xml: once it gets used to solve a problem which stemmed from bad design/implementation to begin with... it is automatically being used for the wrong reasons.

my opinion anyway...

personally i'm all for raw binary data in a specified format... it's most efficient to store and read, and unless the developer makes a mistake you won't have any problems. in practice though using an xml library is faster, even if it produces an inferior quality result for the end user...

Jheriko on June 25, 2008 7:40 AM

I once worked on a project which was a questionnaire program. The answers were stored in a comma-separated text file, which was then imported into MS Word and used in a mail merge to create a 200-page document. Over time, new questions were added. The answers were stored anywhere that had room in the CSV file to prevent breaking the mail merge macro. This caused a lot of problems in the long run because the questions and answers were out of order. It was very difficult to diagnose errors when they occurred.

One day I suggested we change the data format to XML. This would allow us to reorganize the questions and answers without breaking anything because the names of the nodes would stay the same forever. It would also make the data file more human-readable to diagnose entry problems. Though I don't know for certain because I haven't tried, I would suspect that parsing fruit=orange would be more difficult than <fruit>orange</fruit>. Particularly if the word fruit was given as a value somewhere else in the file. (eg. food=fruit)

Scott on June 25, 2008 7:42 AM
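
The fruit=orange versus <fruit>orange</fruit> question can be tried out directly. Below is a rough Python sketch, not a benchmark: the <doc> wrapper element, the food element, and the helper names parse_flat and parse_xml are assumptions added for the example. It suggests that a value that happens to equal another key (food=fruit) is not a problem for either format, as long as the flat parser splits each line on the first "=" only.

```python
import xml.etree.ElementTree as ET

# The same two settings in both formats (sample data assumed for this sketch)
FLAT = "fruit=orange\nfood=fruit\n"
XML = "<doc><fruit>orange</fruit><food>fruit</food></doc>"


def parse_flat(text):
    """Split each non-empty line on the first '=' into a key and a value."""
    pairs = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs


def parse_xml(text):
    """Map each child element's tag to its text content."""
    return {child.tag: child.text for child in ET.fromstring(text)}


flat = parse_flat(FLAT)
tree = parse_xml(XML)
print(flat["fruit"], tree["fruit"])
print(flat["food"], tree["food"])
```

Where the two approaches start to differ is nesting and validation, which the flat parser here does not attempt.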

Hmmm,

this is also XML and might be easier on the eyes :

<doc>
<fruit is='pear' />
<vegetable is='carrot' />
<topping is='wax' />
</doc>

This tends to drive XML believers nuts for some reason ;)

T.

Konijn on June 25, 2008 7:42 AM

I think we all have to be honest with ourselves, and realize that most of our time is spent fixing things when they *don't* work.

The data format (XML) doesn't matter when things work. It's when your transmission fails, and you have to go trudging through the raw data to find the error, that it really matters.

I find them all difficult to read in different circumstances. JSON and YAML when represented without line spaces (say, in your ajax debugger) are nearly impossible to read, and XML does ok. Custom formats actually do better! (Say with, something crazy, meant for joining text together—like pipes?)

However, a raw dump of a lot of data in YAML is easy to read, not so much in XML.

JSON excels when you have to parse it. It's already in your array! Just use it. Beautiful. It works like that on both ends.

Anyway, I think what we have here is that XML is the right concept, wrong implementation. Theoretically speaking, XML is awesome. It's standardized, easy to learn. Practically speaking, it's a beast. Those angle brackets are terrible. Terrible things! Use something else!

I don't think YAML is the solution, but I think it's a step in the right direction.

I think to come up with a solution, we need to interface with someone who specialized in how to make text easy to read, and that would be somebody who is NOT a programmer. Come on great universities of the world, THINK!

Jeff Davis on June 25, 2008 7:46 AM
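
The "it's already in your array" point, and the complaint about JSON without line breaks, can both be seen in a few lines of Python with the standard json module. The payload is invented for this sketch, reusing the fruit and vegetable names from the thread.

```python
import json

# Hypothetical payload, as it might arrive over the wire
payload = '{"fruit": "pear", "vegetable": "carrot", "toppings": ["wax"]}'

# One call turns the text into ordinary dicts and lists
data = json.loads(payload)
print(data["fruit"])
print(data["toppings"][0])

# The same data serialized two ways: compact for transmission, indented for reading
compact = json.dumps(data, separators=(",", ":"))
pretty = json.dumps(data, indent=2)
print(compact)
print(pretty)
```

The compact form is what tends to show up in a debugger; the indented form is the one a person would rather read.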

This just in, Jeff STILL hates XML, apple pie, America, and your mom.

Oh wait...that's not what he said. Disregard. Or maybe it's the smackdown learning model ;)

It all comes down to convention for me. Once the first person writes some configurable project data in XML, that's it, you're locked in. Or you can maintain 15 types of text files on a single project. I think it's the same for most team choices: the project was started in Java, so you can either continue writing in Java to keep the maintenance down, or you can buck the trend and write in VB. And you're all-star enough to make that happen, because it's what you carefully analyzed, considered, and decided was best for your situation. Then poor Joe, who has to add one friggin field to a report, is cursing you to high heaven because he has to load Java in his brain for the server side and VB for the client. Similarly, I don't want to look at 5 kinds of markup for different aspects of the app. Don't put it all in one monster file by any means, but please don't roll out a different superior solution because of a braindead choice we made early on. Especially for text, it's not worth it.

SteveJ on June 25, 2008 7:47 AM

Jeff, you really should change the (no HTML) remark by the comment window to read (no HTML, but please remember to encode < as &lt;, > as &gt; and & as &amp; or the blog will eat most of your post)

Graham Stewart on June 25, 2008 7:51 AM
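
The encoding described above can be done mechanically. A small Python sketch using the standard html module follows; the sample string is made up, and quote=False is chosen here so that only <, > and & are touched.

```python
from html import escape, unescape

# Text a commenter might want to post literally (sample assumed for this sketch)
raw = "<fruit>orange</fruit> & friends"

# escape() replaces & first, then < and >, so the result survives HTML rendering
safe = escape(raw, quote=False)
print(safe)

# unescape() reverses the encoding
print(unescape(safe))
```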

@Konijn: yep, see what I said above about bad XML being easy.
In my experience evil XML like that is all too common.

Graham Stewart on June 25, 2008 7:56 AM

@Graham Stewart

What? You mean you don't read the entire page as raw source code?

I thought XML was easy to read!

Just having fun. ;)

Jeff Davis on June 25, 2008 7:58 AM

There is a very real mental cost to parsing even a few short lines of XML.

I would suggest that the mental cost of parsing proprietary data files with no markup at all is much worse.

How many obscure configuration files did you scratch your head at in the pre-XML world? It usually went something like this for me:

"Hmm, how is this data laid out? What the heck does a colon mean as opposed to two periods?? Ok, I think I've got it. Now, applying this potential layout I think I understand, where might the data I'm actually looking for be? No, I think I misunderstood the two periods after all. Wait, there's the data!"

XML gives readers a hint about the format, because at least you know the UNIVERSE of organization you're working in.

I'm not arguing that XML solves all problems, but I don't buy into the "XML is hard to read" camp. If anything, it makes it easier simply because more folks are familiar with it and it's documented.

Vance Vagell on June 25, 2008 7:59 AM

I'd personally like to see you provide a comparison of a web.config in XML and YAML side by side.

I agree that there is too much pain in dealing with that file, but I completely disagree that this is a result of it being in XML. The fact that half of the file is boilerplate that I never need to touch, and that almost all of it is poorly structured, does that all on its own.

Jess Sightler on June 25, 2008 8:30 AM

@Andrew:
They AREN'T using ASP.NET WebForms, they ARE using ASP.NET MVC. So it certainly does look like alternatives were considered.

@Mike:
You're actually going to use the summer to learn more about what XML can do for you? Might I suggest the other 59 days be spent learning O/R mapping, or Python, or Lua.

Karl on June 25, 2008 8:30 AM

Funny, if XML's style and visual parsing 'expense' is a matter of taste, then I wonder why programmers seem to be quite happy writing code like:

int j = 1;

rather than
<variable name="j">1</variable>

or
<variable name="j" value="1"></variable>

or
<variable>j<value>1</value></variable>

Tools and technologies should be used when they provide a net benefit, and not solely because they are a 'standard' or 'fashionable' (unless of course, standards and fashion are the sole criteria on which you measure gain.)

XML is useful. So are other formats. Use what makes sense for the particular need...

Michael Curry on June 25, 2008 8:36 AM

"Thats the bottom line. Get used to it. If you're having a problem
reading xml, learn how to do it better."

I agree totally. People were created to make things easier for computers, not the other way around.

Heck, let's bag XML and move back to using binaries! Those can be read by a human just fine with a hex editor, a calculator, and enough time. It may be a PITA at first, but you'll get better at it. Quit being a bunch of lazy whiners.

T.E.D. on June 25, 2008 8:36 AM

"I try not to get emotionally involved with the tools and technologies that I use, if I can avoid it."

Umm... go back and re-read like almost all of your blog posts that relate to .NET, Microsoft, or Windows Vista and see if you still feel the same way.

Sorry, but you are incredibly emotionally involved with the tools you use.

Cecil on June 25, 2008 8:36 AM

"As a Visual Studio ecosystem programmer, XML is pervasive, in every nook and cranny of a project."

If you hate bracket syntax enough to post twice about it, and you want to encourage use of other tools, then maybe it's time to try something else?

Mattkins on June 25, 2008 8:39 AM

"That brings up the question: What exactly is a standard? I always
...
With YAML, it's... the yaml-core mailing list. The copyright for
the specification is held by three individuals."

That's roughly how all the RFCs are done, and the Internet is pretty much built on those standards.

T.E.D. on June 25, 2008 8:41 AM

<?xml version="1.0" encoding="UTF-8"?>
<post>
<name>Nick</name>
<website></website>
<captcha>orange</captcha>
<message>
<![CDATA[
I agree.
]]>
</message>
</post>

Nick on June 25, 2008 8:47 AM

"How many obscure configuration files did you scratch your head at in the pre-XML world?"

If the config files in /etc were all in XML, I'd go on a killing spree. Thank god most of 'em aren't.

Talisker on June 25, 2008 8:53 AM

XML is just another example of creeping verbosity being palmed off as "better."

In some cases, XML is demonstrably better. If you have to represent a tree structure with text, I really do prefer XML.

.NET, JavaScript, et al. all have the same problem. Consider the number of lines and characters necessary to write the code to read in a file. I've lost track of the number of languages in which I've had to rewrite that darned "ReadFileIntoString(strFilePathName)" function. Why do this? So I don't have to remember the human factors nightmares that most programming languages impose for consistency.

*More*, whether it's XML or dot notation, isn't always better. Simple things should be simple. Complex things should be possible.

ThatGuyInTheBack on June 25, 2008 8:59 AM
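
As one data point on "simple things should be simple", here is what a ReadFileIntoString helper might look like in Python. The function name is borrowed from the comment above; the UTF-8 default and the temporary sample file are assumptions made so the sketch runs on its own.

```python
import tempfile
from pathlib import Path


def read_file_into_string(path, encoding="utf-8"):
    """Return the entire contents of the file at path as one string."""
    return Path(path).read_text(encoding=encoding)


# Write a throwaway file so the helper has something to read
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("fruit=orange\n", encoding="utf-8")
    contents = read_file_into_string(sample)

print(contents)
```

The helper itself is a single line; most of the sketch is scaffolding to create a file to read.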

Whoops! Jeff already talked about XML comments a few years ago:

http://www.codinghorror.com/blog/archives/000130.html

Personally, I don't think he was nearly harsh enough. I think the angle-bracket tax makes even the most clear and concise comments difficult to read and extremely tedious to maintain. And then there's the fact that you can't write things like "0 <= x < limit", but instead have to write "0 &lt;= x &lt; limit", or the strange-looking "limit > x >= 0".

Weeble on June 25, 2008 9:02 AM

And what would be *appropriate* "romantic overtures toward their significant other?"

Tim C on June 25, 2008 9:03 AM

The free Liquid XML Studio is great for working with XML. It will even generate some C# sample code for you. On the other hand, I recently wrote some code to parse Google's gdata XML in PHP and that was ridiculously painful.

Robert S. Robbins on June 25, 2008 9:08 AM

Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.