Revisiting the XML Angle Bracket Tax

June 23, 2008

Occasionally I'll write about things that I find sort of mildly, vaguely thought provoking, and somehow that writing turns out to be ragingly controversial once posted here. Case in point, XML: The Angle Bracket Tax. I'm still encountering people online who almost literally hate my guts because I wrote that post. You'd think I kicked their dog, or made inappropriate romantic overtures toward their significant other.

Well, first of all, we are talking about XML the markup language, not XML the religion, right?

I hope so. I try not to get emotionally involved with the tools and technologies that I use, if I can avoid it. This doesn't mean I can't be enthusiastic or critical of those tools and technologies, but I'm not married to the stuff either way. Who needs all the emotional baggage?

Obviously I failed to communicate this before. I talked about this a little bit on Stack Overflow podcast #5 with Joel, where I tried to amplify and explain my position a little better.

I wasn't trying to present it as "Oh, XML is bad, let's all switch to this new markup language that all the cool guys are using". What I was trying to say is why don't we think about what we're doing? That's the general theme of a lot of the stuff in my blog. Can we just stop programming for a minute to think about what we're doing and not make a blind choice based on "Well this is what my tool does, so that's what I have to do"?

I think obviously there's pros and cons to each. I'm not saying that one is the right solution all the time. But I think, ironically, that is what is happening with XML. I think people are saying "It's always the right answer, because it can store anything, right? And all the stuff I use uses it, so it must be the right choice for everything." That bothers me a little. Maybe I'm just contrarian. Maybe I'm an iconoclast and I want to try different things and see different things, but I think actually understanding the alternatives helps you understand XML better, a little bit, too.

And I hope people reading my blog would not get the idea that it's about a knee-jerk reaction one way or the other. It's about understanding the tradeoffs and applying those tradeoffs to your particular situation. I think that is the absolute art of programming. It's understanding what you could do, and which one of those things fits your situation best. Versus what so many programmers do, which is "I've learned to use a hammer, and I'm gonna hammer everything." Ultimately, to me, it's about self-awareness.

By the way, I'd like to thank everyone who pitches in to make those Stack Overflow podcast transcriptions possible. It is because of your generously donated time that I am able to quote that audio here.

I don't post stuff to push people's buttons, I post it because I want programmers to think about their tools, their technologies, their methods.

Think IBM placards, taken at the Computer History Museum

If what I post here seems unnecessarily confrontational sometimes, a far smarter person than myself said it better than I can:

I blog to help others and also to learn. As it turns out both are aided by getting folks to actually read the stuff. Please pardon the necessary devices.

Please do pardon the necessary devices; I find that I often learn best through the smackdown learning model. That works for me. Maybe it doesn't work for you, and that's OK. There are millions of websites to choose from.

That said, I do actually have a problem with XML, or I wouldn't have written anything in the first place. I think there's a real issue here that is, for the most part, being completely ignored. XML fever may not be as debilitating as, say, Dengue fever, but it has side effects as well.

Consider Norman Walsh's Defending the Tax. Norman is an XML Standards Architect at Sun.

On the other hand, the difference between:

fruit=pear
vegetable=carrot
topping=wax

and

<doc>
<fruit>pear</fruit>
<vegetable>carrot</vegetable>
<topping>wax</topping>
</doc>

isn't really that large, is it? (Or maybe you think it is, de gustibus non est disputandum.)

The de gustibus dismissal means Norman considers it a matter of taste, but it isn't. The difference is large. There is a very real mental cost to parsing even a few short lines of XML.
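To make the comparison concrete, here is a minimal Python sketch (mine, not Norman's) that reads both versions into the same dictionary. The function names are made up for illustration, and it assumes nothing beyond the standard library.

```python
import xml.etree.ElementTree as ET

KEY_VALUE_TEXT = """\
fruit=pear
vegetable=carrot
topping=wax
"""

XML_TEXT = """\
<doc>
<fruit>pear</fruit>
<vegetable>carrot</vegetable>
<topping>wax</topping>
</doc>
"""


def parse_key_value(text):
    """Turn name=value lines into a dict, skipping blank lines."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def parse_flat_xml(text):
    """Turn the children of the root element into a dict of tag -> text."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    # Both formats carry the same three pairs.
    print(parse_key_value(KEY_VALUE_TEXT) == parse_flat_xml(XML_TEXT))
```

The machine ends up with the same three pairs either way. The cost I'm talking about is the one paid by the human reading the file.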

As a programmer in the Visual Studio ecosystem, I find XML pervasive, in every nook and cranny of a project. Every time I look at my web.config XML file, there's a mental cost to parsing all the tags in the file. Here's this tag, which lines up with this tag. Here's this giant, verbose thing where only half of it actually matters.

Sure, it's a small effort. Insignificant, even. But what's the mental cost of that insignificant effort times the number of developers in the world, times the number of projects in the world?

I also posit that these minor headaches may be more significant than you realize. In Stumbling on Happiness, author Dan Gilbert makes a similar assertion.

His research found that people are bad at predicting their own future happiness. They tend to radically overestimate the positive or negative impact of large events in their lives – losing your job, getting rich, getting divorced, having children. That's generally good; it means we have defense mechanisms in place to adapt and survive in our changing circumstances as human beings. But, we also tend to radically underestimate the impact of the dozens of small events in our lives throughout the day. Thus, small injustices don't trigger our defenses. The effect of that squeaky screen door, the neighbor's barking dog, the interrupting telephone call – all of these may have far more profound cumulative impact on your day to day happiness than you realize.

It's a fascinating book, and I'm only paraphrasing the smallest part of it. I highly recommend reading it if this is at all interesting to you. It won't exactly unlock the secrets to happiness, I'm afraid, but you may gain a deeper understanding of why we tend to make the choices we do in our neverending pursuit of happiness.

I'm not trying to change the world overnight, but I wouldn't mind planting a few seeds of dissent in people's minds. This small stuff matters.

The next time you're trying to figure out an XML file, just think about it.

That's all I'm saying.

Posted by Jeff Atwood
189 Comments

Yes, I totally agree with you, and I have been facing this problem ever since the EPA decided that all data submissions would be in XML form. Even the point-to-point ones inside the business.

The operative words here are:
NO ADDED VALUE

John A. Davis on June 25, 2008 9:13 AM

Couldn't agree more, what is so hard about picking the right tool for the job in IT? Sometimes XML is the best choice, sometimes it isn't. You wouldn't (probably) use a screwdriver to hammer in a nail, so why do that when you code? 'Religion' in IT is ridiculous...

David Hayes on June 25, 2008 9:14 AM

Do programmers really like any markup languages? I don't. The tags annoy me. The syntax looks ugly and crowded to me.

Ivan on June 25, 2008 9:17 AM

I really don't get why you are focusing on the mental cost of processing XML. It's not meant to be human-readable. You are assuming that this is the PRIMARY purpose for the structure of XML. It is not. Interoperability is the primary reason for the structure, human readability is further down the line.

Nicholas Paldino [.NET/C# MVP] on June 25, 2008 9:22 AM

"I think the angle-bracket tax makes even the most clear and concise comments difficult to read and extremely tedious to maintain."

If you need comments in a file format that is mainly designed to be read and written by machines, you're doing something wrong. You're not one of those stupid people who decided it would be a good idea to replace good old .ini-files with some XML-counterparts?

BTW, has anyone shot the Ant-Developer yet? Makefiles as XML. Now that was a *real* stupid idea.

Vinzent Hoefler on June 25, 2008 9:27 AM

maybe if you kick their dog they will forget about the whole xml post?

Darren Kopp on June 25, 2008 9:34 AM

Verbose or not, I'll take XML over comma-delimited files and fixed-length structures any day.

There may be instances where you can get by with either, but I hate having to account for encoding of special characters in my strings to serialize.

I don't find XML that un-readable... certainly more readable than fixed length or comma delimited if you ask me. But then again, why do I even want to read XML? There should be some app consuming it and all I should care about is that that app can read it and let the app present it to me in a readable format.

Kris on June 25, 2008 9:38 AM

These last two posts have probably been the best I've ever read on your blog yet. Keep it up!

John on June 25, 2008 9:40 AM

@Aaron G

Some counterpoints.

- Well formedness. All this means is that the XML document conforms to XML's syntax rules. If it doesn't, the XML parser will fail, just like with an incorrectly formed JSON or YAML document.

- Schema and Metadata, you don't get this unless you use a DTD, which is built in to XML, or XSD, which is a whole different standard. Right now I am working on a project that is using XSD and I have to say, I hate it. THAT is hard to read, and since we are still working on the document it just gets in the way.

- There are tools out there that allow you to create schemas for YAML and JSON, Kwalify being one of them. (http://www.kuwata-lab.com/kwalify/)

Personally I think that defining schemas and metadata would be better handled in some sort of Domain Specific Language that is not tied to any one technology. Human and machine readable, but focused solely on defining the entities of a system and their relationships to one another. It could be used to generate UML, XSD, SQL DDL etc. In the design stage of a project that would help greatly.

Finally, I have seen lots of places where GB of XML data are used for data exchange. Yes, JSON and YAML would be large too, but not as large, and for large datasets that can make a big difference. And I would drop Unicode if it is not required. As to 4 digit years, I wouldn't bother; unless you have a data file of nothing but years, they don't add nearly as much overhead as XML.

Andrew on June 25, 2008 9:40 AM

In the end I think that XML has its place. It is great for creating structured documents. XHTML is a great application of XML. I am sure there are others that run along similar lines.

But XML falls down when used for raw data exchange and config files. There are many other formats that can handle that better, and most of them have as many tools to work with as XML.

It feels like the people that want to use XML for everything are similar to the people that want to use one programming language for everything.

Andrew on June 25, 2008 9:46 AM

Don't apologize.

Doug on June 25, 2008 9:58 AM

Standards ARE tax. It's the price you pay for interoperability. That doesn't mean you have to use it. You can decide if you want to pay the tax or not.

With that said, I wasn't very fond of XML in the beginning, and I'm still not. If people want to use it, fine, but it's really nothing special. The most "special" thing about it is that magically everyone has agreed on something. It's more about timing than anything.

Angelo on June 25, 2008 9:59 AM

@Vinzent Hoefler: "If you need comments in a file format that is mainly designed to be read and written by machines, you're doing something wrong."

Weeble was talking about using XML comments in C# source code, which is the "correct" thing to do, but is a pain to use.
Take a look at http://msdn.microsoft.com/en-us/library/b2s063f7.aspx

"anyone shot the Ant-Developer yet? Makefiles as XML. Now that was a *real* stupid idea"

Actually I quite like Ant.
Makefiles are full of magical archaic syntax (e.g. the automatic variables like $%, $^ or $(?D)) which can be a real pain to mentally parse if editing a makefile is something that you very rarely do.

Graham Stewart on June 25, 2008 10:01 AM

well, here's a smackdown for you: how is this a revisiting? This is a reiteration of your original point with very little evidence. You might as well have titled this "shitstorm part ii". What's next? "Another 50 reasons PHP sucks" article? Seriously, the discussions I have about coding horror these days with other programmers are now along the lines of "Hey, coding horror seriously started going bad once Jeff went pro blogger. Jumped the shark, completely".

I think we all realize that your employment is now tied to your blog and you now need to regularly make posts that get traffic, but you're alienating your readers with this constant linkbait. Don't tell us what "sucks" -- I think we all have opinions about what sucks or what is cool. Just tell us what is cool. You like YAML? Great, write an article about YAML. You don't have to simultaneously make the point "Oh, XML sucks". If you truly believe in using the right tool, XML *DOESN'T* suck, it's just sometimes the wrong tool -- and YAML isn't good because XML sucks, it's good because sometimes it's the right tool. YAML is good for situation X, and XML is good for situation Y. Tell us about both of those situations. Without the linkbait language.

anonymous coward on June 25, 2008 10:01 AM

I'll agree that XML can be a bit "wordy" at times, but the given example is simplistic. For a simple standalone list, XML probably wouldn't be my first choice either.

But consider a more complex example, perhaps the inventory of a car dealership.

<inventory>
  <car>
    <manufacturer>Chevrolet</manufacturer>
    <model>
      <name>Cavalier</name>
      <year>2006</year>
    </model>
    <color>Blue</color>
    <powertrain>
      <engine>
        <cylinders>4</cylinders>
        <horsepower>400</horsepower>
      </engine>
      <transmission>automatic</transmission>
      <four_wheel_drive>no</four_wheel_drive>
    </powertrain>
  </car>
</inventory>

Now that can absolutely be made less verbose by using XML attributes, but XML is very helpful for representing the *structure* in a human-readable form.

I wouldn't recommend storing an actual car inventory in XML (that's what databases are for), but it's very definitely useful for structured data, such as a configuration file, that a human might need to read. (As opposed to some of the old binary files stored via the MFC serialization mechanism.)

Is it the solution for everything? Absolutely not! Using XML (or any other new technology) everywhere is phase one of adopting a new technology. XML isn't intrinsically bad, but just like any other technology, it can be misused.


The three phases of technology adoption are:

1. Refactor the entire system to heavily overuse the new technology, especially in manners where it was never intended to be used and/or is completely ill-suited.

2. Refactor the system again in attempt to remedy the problems caused by the previous refactoring.

3. Refactor the system with the next hot technology.


http://thatblairguy.wordpress.com/2008/03/10/technology-phases/

Blair on June 25, 2008 10:01 AM

@anonymous coward
"You like YAML? Great, write an article about YAML. YAML is good for situation X, and XML is good for situation Y. Tell us about both of those situations."
That was a good point atrociously expressed. I would like to learn more about YAML. How about it, Jeff?

Tom on June 25, 2008 10:06 AM

I hate XML. It just ain't pretty.

Preeti Edul on June 25, 2008 10:08 AM

This just seems silly. If it bothers you, do something about it.

The simplest thing I could imagine in 10 seconds would be to write your files in YAML and convert it to XML. A quick search of the nets returns this utility:

http://search.cpan.org/~ingy/YAML-0.35/bin/xyx.PL

which may or may not work, but if it doesn't just write one--it'll take all of 2 hours. As the first step of compiling, place this in your makefiles. Problem solved.

If this is too much, find or write an editor plugin that presents XML as YAML, you'll never see another

YAML was made to be hand typed--XML was mostly made to be machine to machine--you're right to feel uncomfortable editing XML directly--why would anyone do that?

Hey, as long as we're here--can I suggest a topic for thought? Those of us who are programmers have the ability to make our computers do anything (at very little cost). What is with programmers who don't feel it's proper to make their own tools?

Sometimes you have to make a little parser to change 30 paragraphs of repeated code to 30 lines of data. JUST DO IT! Move everything you possibly can from your code into data, then write something to input the data into your program.

And by data, I don't necessarily mean XML--data could be something as simple as a large string defined at the top of the file that you write a parser for. In Java, array initialization happens to have a nice, short syntax--use that to get the data into your program. You can even put method names in your data and use reflection to link them at runtime--giving Java elegance similar to that of any dynamic language if you like.

Any code (no matter how large) you have to write just once beats the hell out of any code you have to write/modify any time you have additional data.

And if you have to use XML as part of your process--write a damn tool that takes the simplest input possible and outputs XML. I doubt the initial designers of XML ever thought it would be edited by hand except in emergency situations. Before XML, it was just about always binary streams or binary files. Is that what you'd prefer to be using?
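As a rough idea of how small such a tool can be, here is a Python sketch that turns name=value lines into a flat XML document. The function name, the default root tag and the "#" comment convention are all made up for illustration, and it assumes every name is already a legal XML element name, since ElementTree does not check that.

```python
import xml.etree.ElementTree as ET


def key_values_to_xml(text, root_tag="doc"):
    """Convert name=value lines into a flat XML document string."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got: {line!r}")
        child = ET.SubElement(root, key.strip())
        child.text = value.strip()  # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(key_values_to_xml("fruit=pear\ntopping=wax & polish\n"))
    # prints <doc><fruit>pear</fruit><topping>wax &amp; polish</topping></doc>
```

Run it as a build step and nobody has to type an angle bracket by hand.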

Bill on June 25, 2008 10:16 AM

I use XML, and don't consider it too hard to parse. If you use a decent editor with proper formatting, it can be quite readable.

I never used to use XML, just doing name-value pairs, until I started to notice a pattern of problems I would run into, writing parsing and validation routines to ensure correctness of a file.

In Jeff's example how do you denote multiple fruits in the same file? You have to create rules in your parser to say 'Start parsing a new fruit each time you see fruit as a name'.

How do you deal with a fruit having multiple toppings defined? Is the first one correct, is the last one? The default behaviour (depending on code) is to use the last value so you will survive if you don't think of this one. But this brings up another problem..

In Jeff's example how do you deal with having multiple 'bugs' on a fruit. Do you number them?
fruit=pear
vegetable=carrot
topping=wax
bug1=fly
bug2=aphid

If you number them you have a maintenance problem. If you just leave the name as bug, then you have to write a specific rule saying 'if the name is bug, create a new bug on my fruit', otherwise you will only store the last bug.

XML can solve these problems by using a Schema and letting your language xml library code parse and validate the input.
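Here is a rough Python sketch of the repeated-name problem. The sample data, the function names and the XML layout are made up for illustration. Note that ElementTree only parses here; it does not validate against a schema, so that part is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

KEY_VALUE_TEXT = """\
fruit=pear
bug=fly
bug=aphid
"""

XML_TEXT = """\
<fruit name="pear">
  <bug>fly</bug>
  <bug>aphid</bug>
</fruit>
"""


def parse_last_wins(text):
    """Naive parser: a repeated name silently overwrites the earlier value."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def parse_multi(text):
    """Parser that keeps every value for a repeated name, in file order."""
    result = defaultdict(list)
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()].append(value.strip())
    return dict(result)


def bugs_from_xml(text):
    """Repeated child elements come back as a list with no special rule."""
    root = ET.fromstring(text)
    return [bug.text for bug in root.findall("bug")]


if __name__ == "__main__":
    print(parse_last_wins(KEY_VALUE_TEXT))  # the fly is lost
    print(parse_multi(KEY_VALUE_TEXT))      # every value becomes a list
    print(bugs_from_xml(XML_TEXT))
```

The second parser is the "specific rule" I mean: once any name can repeat, every value has to be treated as a list, or you need a per-name rule.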

But like Jeff said in the podcast, use each tool for the proper job.

Craig on June 25, 2008 10:21 AM

Sorry, but there is no sacred cow that has as much fervent devotion as SQL. Try to convince a group of DBAs that SQL is a bad language (which it is) and watch for unidentified flying objects.

Which should make tonight interesting for me. I'm going to a SQL Server usergroup meeting. :)

Chris Brandsma on June 25, 2008 10:52 AM

@Chris Brandsma

Great comparison! And similarly to XML, hundreds of programmers have decided that SQL is teh suck and have decided to wrap it behind a construct they like. In the end it's the same thing, some people want to work with tool x, so they do, others want to avoid x at all costs so they avoid it or wrap it or find a wrapper to make it "safer/easier/better" to deal with. Those that have embraced tool x look at those who use the wrappers as lazy (or terrorists), while the anti-x crowd sees the other group as stodgy troglodytes.

SteveJ on June 25, 2008 11:00 AM

I like XML for documents. You know, those things with lots of words and a little markup. Periodically I see someone make the groundbreaking rediscovery that XML can be just as well represented as S-expressions, and then instantly try to apply that to HTML. Which is silly, because if there's one place I *want* those big ugly redundant tags, it's in the middle of a document where it's otherwise easy for them to get lost in the noise. If I opened a tag five hundred words ago I don't want to have to flip back to the beginning just to see what tag this paren or close-bracket or whatever is closing.

On the other hand, using XML for key-value pairs is equally silly. Especially when it's a file that doesn't need any kind of well-defined i18n story.

Avdi on June 25, 2008 11:27 AM

Do you honestly think that this issue (naming of tags) *wasn't* discussed when XML was being standardized?

Rather than guessing what the XML standard bodies think, or simply assuming that they're just *not thinking* because they did something you find odd (how could they be so stupid as to not agree with you, after all), it might do to go figure out what the rationale behind the design decision actually was.

I do know this much: without named closing tags, it makes it much harder to verify the structure of a document until you reach the end of the document, and the parsing error is almost always going to show up later than the *actual* error.
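A small Python sketch of what I mean, using a made-up fragment in which the model element is never closed. The function name is hypothetical; the position attribute on ParseError is part of the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# The <model> element is never closed; the named </car> tag exposes that.
BROKEN_XML = """\
<inventory>
  <car>
    <model>
      <name>Cavalier</name>
  </car>
</inventory>
"""


def first_error_position(text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None


if __name__ == "__main__":
    print(first_error_position(BROKEN_XML))
```

The parser complains on line 5, at the closing car tag, one line after the mistake. With anonymous closers it could not know anything was wrong until it ran out of input.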

Therac-25 on June 25, 2008 11:47 AM

Let me toss my cents into it.

First, XML wasn't made for humans or computers. XML is derived from SGML, which was designed to be a "Standard Generalized Markup Language". Ie, it was supposed to "markup" (add metadata to data), and to be sufficiently generic to be able to handle anything whatsoever.

XML was derived the following way: ok, we have this very flexible thingy. It's too flexible and complex to use. Let's get a subset which will be able to handle hierarchical data, which will solve a specific subset of problems we have.

For some reason, no one is allowed to do that anymore. Ie, say it's too flexible and complex.

Frankly, if your answer is "XML is here to stay, deal with it", then go do anatomically impossible things to yourself. Not everyone is happy with XML and XML will not pervade everything. Deal with it.

If your answer is "if you think this is too complex, add more complexity (eg, specialized tools, XSLT) to hide the complexity", then... ah, hell, I'll wait for understanding to dawn on you. Or not. Basic concepts can't be explained, and KISS is a basic concept.

If I have a table with many, many rows and a limited number of columns, I'll use CSV. If I have hierarchical data I'll use XML. I might use JSON or YAML too, but I'll probably settle on XML. XML can't handle non-hierarchical data, so I'd have to go for YAML or something else.

And if the data has to be accessed many times in many different ways, or be constantly updated, then I'll use a database.

I'll bet people who advocate "XML or death" probably won't see the possibility of using a language other than any of the mainstream languages (for their own definition of mainstream). It's a mindset, emphasis on set.

Daniel on June 25, 2008 11:50 AM

Hi Jeff,

I'm not sure, but why are you trying to look at XML code? Why not open it in a web browser and have all the markup magically disappear, leaving only values? And if you supply some CSS, you may get a decent formatting for your data at little cost, e.g. output them in different colors.

XML may be overkill for a simple task like storing a list of twenty keys and values, because it can do more, much more. It's really a powerful tool to keep documents that are both human- and computer-readable. E.g. one can have an invoice that can be (with some help from CSS) printed and read by a human and at the same time be precisely understood by a database application. And then it may be modified by some workflow app that augments the invoice with its own markers *without breaking the invoice*: it will remain printable and the database app will be able to read it just fine.

And then it may be transformed with some formula and the formula happens to be also an XML document. This gives us a nice closure that is not possible in, say, SQL: in SQL you can write a formula to take any tables and make any derivative table from them, but the formula itself won't be a table. So you cannot generate SQL code with SQL. With XML, that is XSLT, you can. That is an order of magnitude more powerful than SQL, and it works with XML documents, which are also much more complex structures than mere tables.

All other standards you mention may do a great job about keys and values, but they don't even come close to the full power of XML. And since XML can do all key-and-value stuff at little cost, there's no reason not to use it for this too :) In most cases this is simpler and more compatible because there are ready-to-use tools on nearly every platform.

Mikhail Edoshin on June 25, 2008 12:20 PM

Man there's a lot of morons in these comments. I guess that's what the smackdown model is good for: people lack basic reading comprehension so there's no point in trying to bring across _any_ nuance. Just where exactly did Jeff say XML sucks? Oh humanity, how I weep for you...

wds on June 25, 2008 12:35 PM

Ack! The never-ending argument! End the war! Peace. Love. and non-standard conforming markup!

Who knew that XML could arouse so much... sentiment...

Jeff Davis on June 25, 2008 12:36 PM

@Graham Stewart:
XML is wicked easy to pick up at a medium level. Spend a day on w3schools (yes, that level is enough to start on) and with some books, and you will be at medium level. Try creating some files, try thinking about it and quickly you won't be making massive beginner mistakes. Because there aren't that many beginner mistakes you can make in xml - the language is too simple for that.
But of course you're right: people can and will make horrible abominations named .xml because they don't think about what they do. That is no different from any other language. And I'd say it happens no more often than in any other language - so I definitely wouldn't hold it against xml.

As for all the following technologies: no, you do not need to know xsd, xslt, xpath, xquery or any other language to make use of xml. Sure, they can help a lot and really add a lot of power to xml. But it's not necessary to know any of them before getting to know xml.

Regards
Fake

Fake51 on June 25, 2008 12:44 PM

"There is a very real mental cost to parsing even a few short lines of XML"

For you maybe, definitely not for me. MSBuild files, though, they really make my head asplode...

"the mental cost of that insignificant effort times the number of developers in the world, times the number of projects in the world?"

So, what are you going to do with that time once you saved it? How is this metric useful?

Mike on June 25, 2008 1:14 PM

@Fanboy?:

That author lists a bunch of stuff that XML has, eg XPath, XSL.

That's great, but if he had ever learnt to use Lisp/Scheme, he would know that those extras are already part of Lisp/Scheme (syntax, macros, etc).

IMO XML is simply sexpr's and the rest of the XML technologies are simply a ripoff of what exists at the core of sexpr-based languages.

I read somewhere: + marketing = ()

I think that should be: () + marketing bs =

Cheers

leppie

leppie on June 25, 2008 1:16 PM

I recently had a case where someone in my office needed to store a list of customer IDs to disk. Their instant thought was to just serialize the collection in XML!

If we think about it, this turns what could be a simple CSV into a giant file containing hundreds of <String>12324</String> elements (not to mention all the data at the top of the XML). But most people don't think about it and jump for the quickest tool.
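To put a rough number on it, here is a Python sketch that writes the same made-up IDs both ways and compares the sizes. The root element name and the function names are assumptions for illustration, not what any particular serializer actually emits.

```python
import xml.etree.ElementTree as ET


def ids_as_csv(ids):
    """One comma-separated line of IDs."""
    return ",".join(ids)


def ids_as_xml(ids):
    """Each ID wrapped in its own <String> element under one root."""
    root = ET.Element("ArrayOfString")  # hypothetical root element name
    for customer_id in ids:
        ET.SubElement(root, "String").text = customer_id
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    ids = [str(10000 + n) for n in range(1000)]  # made-up five-digit IDs
    print(len(ids_as_csv(ids)), len(ids_as_xml(ids)))
```

For a thousand five-digit IDs that comes to 5,999 characters as CSV against 22,031 as XML, and that is before any declaration or namespace lines at the top of the file.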

I believe most technologies are there for a reason, and care should be taken in each case to choose the correct technology. I think XML has its place but it should not be the default choice.

John on June 25, 2008 1:23 PM

What I am hearing from Jeff here is not: "Replace XML with this YNM which is always better". It's more like: "When you decide to use XML, make sure you know WHY you are using it, and please be aware that there are alternatives".
All the criticism about XML having feature X and standard Y and handling everything - that is valid and is a reason why XML sometimes is a good solution to a problem. It is also the reason why it is sometimes a BAD solution. Know the difference. Think, then decide.

OJ on June 25, 2008 1:28 PM

I'm amazed at the fuss some people make over readability. If you think XML is unreadable, try using something other than Notepad to read it.

I know XML is not ideal, but at least it means you don't have to worry about (a) parsing, (b) encoding all possible characters, (c) representing strongly-typed values. Pick any other format and you have to implement some of those yourself.
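A quick Python sketch of point (b), using the standard library's ElementTree. The function name and the sample string are made up for illustration. Point (c) needs a schema and is not shown here.

```python
import xml.etree.ElementTree as ET


def round_trip(value):
    """Serialize a string into an XML element and parse it back out."""
    element = ET.Element("value")
    element.text = value
    serialized = ET.tostring(element, encoding="unicode")
    return serialized, ET.fromstring(serialized).text


if __name__ == "__main__":
    tricky = 'a < b && c > "d", naïve=yes'
    serialized, recovered = round_trip(tricky)
    print(serialized)
    print(recovered == tricky)
```

The commas, quotes, equals signs and angle brackets all come back unchanged, and I never wrote an escaping rule myself. Try that with a hand-rolled comma-delimited format.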

Chris on June 25, 2008 1:33 PM

For god's sake. Jeff, fix the comment form before making another article about XML. I keep seeing people comparing bar with foo=bar; my guess is that they intended to compare <foo>bar</foo> with foo=bar.

And I'm not even sure if this comment will show up correctly.

Bleh on June 25, 2008 1:35 PM

And don't even get me started on XSLT...

I personally can't stand editing/reading XML, but I dearly love XSLT. It's a brilliantly designed language. I once heard it described as "the wonderful language with the horrible syntax".

I have actually written a DSL embedded in Python so that I may write XSL transformations without having to write XML, and I love it. I actually prefer it over nearly any template language, now that the XML pain is removed. Well-formedness guarantees are a wonderful thing!

Kyle S on June 25, 2008 1:41 PM

bleh -- fixing that now. It is really annoying, annoying enough to make me edit Perl code. That's how bad it is.

Jeff Atwood on June 25, 2008 1:48 PM

@Aaron G: Want to see a 2GB XML file?

http://setiathome.berkeley.edu/stats/

There are dozens of stats websites downloading the files in the above URL *daily*. I've been told from the developer of one of them that, during the daily updates, the XML parsing uses more CPU than inserting the parsed data into the SQL database. (although *querying* the database is the bottleneck the rest of the time; after daily updates are done)
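For files that size, loading the whole tree is not an option, so you would stream it. Here is a minimal Python sketch using iterparse from the standard library. The element names are made up and are not the actual format of the files at that URL.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, record_tag):
    """Yield a dict per <record_tag> element without loading the whole file."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == record_tag:
            yield {child.tag: child.text for child in element}
            element.clear()  # drop the children we no longer need


if __name__ == "__main__":
    # Tiny in-memory stand-in for a very large file.
    sample = io.BytesIO(
        b"<users>"
        b"<user><id>1</id><credit>10.5</credit></user>"
        b"<user><id>2</id><credit>3.25</credit></user>"
        b"</users>"
    )
    for record in stream_records(sample, "user"):
        print(record)
```

This keeps memory down, but the parser still has to chew through every tag, which is where the CPU time goes.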

Nicolas on June 25, 2008 1:49 PM

[?xml version="-1.0" encoding="UNICEF"?]
[procondocument name="my take on the good and the bad stuff with xml"]
.[list type="pro"]
..[arg]everybody can do it[/arg]
..[arg]global standard[/arg]
..[arg]it can be used for almost everything[/arg]
.[/list]
.[list type="con"]
..[arg]just because it can do everything, it does not mean it should[/arg]
..[arg]DRY, with XML you repeat yourself over and over[/arg]
..[arg]terrible to look at[/arg]
.[/list]
[/procondocument]

Peter Palludan on June 25, 2008 1:50 PM

<message mood="Yay!">I have finally escaped HTML entities in comments</message>

Let the XML-ization begin.

I'm sorry I didn't do this years ago. My bad.

Jeff Atwood on June 25, 2008 1:54 PM

I agree that XML is out of control. And I think it is amusing that we all spent years fighting with CONFIG.SYS and AUTOEXEC.BAT files that grew larger and larger (not to mention .INI files) until they became the monsters that OS/2 Warp used or uses.

So then we start getting smarter and using things like the registry and databases for settings...and then as you say in your post, the pervasiveness of XML in Visual Studio puts us back in the world of having to read through huge, confusing, hard to visually parse text files.

I'd rather have an .INI file without all the damn brackets!

Sam Schutte on June 26, 2008 2:09 AM

@Atario
Well, it wouldn't be proprietary, you'd use a *standard* binary encoding.

Tom on June 26, 2008 2:11 AM

Essentially :
I need a complicated editor to easily parse XML config files.
I need XML files because they work well with complicated editors.
Lovely.

Still, thank goodness for standards. In the olden days I had to do this :

width = 1024
height = 768

Now I can set the pixel width of my terminal window to a complex tree containing an annotated human genome and an embedded jpeg of a cat.

A reduction in ease of use is a small price to pay for that kind of flexibility.


AndyL on June 26, 2008 2:21 AM
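For readers who want to see how little machinery the `width = 1024` style needs, here is a minimal sketch of a reader for it. The function name `parse_key_values` and the `#` comment convention are assumptions made for illustration; nothing in the thread specifies them.

```python
def parse_key_values(text):
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    settings = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        # Assumed convention: everything after '#' is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"line {line_number}: expected 'key = value'")
        settings[key.strip()] = value.strip()
    return settings


config = parse_key_values("width = 1024\nheight = 768\n")
print(config)
```

The sketch also shows where the simplicity ends: a value that contains `#` or a newline cannot be expressed without inventing an escaping rule.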

Seriously, I wonder if it's mainly Windows programmers leading the XML charge?

In my experience Windows-only programmers are more likely to automatically assume that you need a special-purpose GUI for every little thing you might someday want to do.

AndyL on June 26, 2008 3:06 AM

@Sam Schutte: True, I don't use XML everywhere. For example simple ini-files are fine enough with key-value-pairs. I don't want to see huge amounts of config data in XML-files, I would rather put the data into a database, where data belongs in the first place.

Also simple data transfer files are just fine with comma separated values. But XML is good in general for structured data markup when you have proper tools for handling the files and when you want some more robustness into data transfer or such.

Silvercode on June 26, 2008 3:09 AM

The way I see it, is that it helps to SEE how the COMPUTER will read it.

You can see the structure of the data, not how each element is displayed.

Jonny on June 26, 2008 3:13 AM

@Alex

Well, your format seems reasonable, but I wouldn't call them s-exps.

Moreover, what's with the closing parens on separate lines? :)

(control.model
..(control.model.A
....nb.chemicals = 1
....damping = 0.1
....nb.cell.neurons = 1
....nb.edge.neurons = 4))

(template
..(beam.template
....load = 6000.0 # newton
....radius = 0.001 # meters
....width = 2.0 # meters
....height = 1.0 # meters
....nb.hrz.patches = 8
....nb.vrt.patches = 3)))

Mikael Jansson on June 26, 2008 3:37 AM

The next time you're choosing NOT an XML file, just think about it.
That's all I'm saying.

titrat on June 26, 2008 5:29 AM

As far as I know, all the coders are too busy with DHTML now. Why are you still talking about HTML?

Startlogic Review on June 26, 2008 6:17 AM

If you have a problem with an XML file, write an XSLT and turn it into whatever you want. Or use whatever API you wish to get the result you want. You can use any number of methods, and any programmer can come along and can easily manipulate what you created.

You have a problem with a text file, you can spend who knows how long cutting and pasting it into whatever you wish.

Fact is, Jeff, you're plain RONG. XML isn't difficult to mentally parse, you're too stubborn to conform to XML's syntax. Sure, it isn't elegant, but it's familiar, accepted and widespread. Having more people work in XML means that less people write their own arcane format that may or may not be the same as JSON or whatever easier format you're espousing.

This is a non-issue, you're making it worse by encouraging the regular joe-programmer types to go off on their own and make up their easier to read types that others will have a hard time understanding.

David on June 26, 2008 7:17 AM

The example is a toy, you really need a good example of XML gone bad:

<exec program="${localPsExec}" verbose="${verbose}">
  <arg path="\\${target.name}" />
  <arg value="-accepteula" />
  <arg value="-w" />
  <arg path="c:\windows\system32\inetsrv\" />
  <arg value="appcmd" />
  <arg value="start" />
  <arg value="site" />
  <arg value="site.name:${site.name}" />
</exec>

vs

${localPsExec} \\${target.name} -accepteula -w c:\windows\system32\inetsrv\ appcmd start site site.name:${site.name}

I love ant and nant but XML was a painful choice for a build tool. There has to be a better way.

Gareth Farrington on June 26, 2008 7:38 AM

XML is OK if you have the bandwidth to spare: every line of code I don't write, don't foul up and don't debug makes me happy. Let me rephrase that: each line of code is the enemy, out to eat my guts.

When I want to read XML, there is Visual Studio and IE that can collapse pieces of the document. VS2008 is also very helpful if you have the XSD on hand.

I never type XML. I can't even memorize the <?XML ...?> declaration. Or was it <!XML ...!>? Doctor Who cares.

Just check out /etc in any Linux distro ... Feel the Force of a unified syntax. XML is just one tiny step beyond the UNIX model of lines of text. It can be done in any syntax, XML won, and that's nice. I like exchanging data between many different systems effortlessly. I'm lazy that way.

Anonymous on June 26, 2008 8:55 AM

<bug>Brackets in names should be escaped too...</bug>

joost on June 26, 2008 8:55 AM

A lot of people are saying that the point of XML is for it to be only rarely read by humans. Fine.

But the point of all the anti-XML sentiment, is people taking perfectly fine human-readable files, and converting them to XML for no good reason.

The Cargo-Cult that XML attracts is what gives XML its bad name.

AndyL on June 26, 2008 9:08 AM

How do I call an API in .NET?

http://msdn.microsoft.com

When in doubt refer to the /manufacturer's/ documentation.

It never fails... unless the product is genuinely unusable or unsupported.

:)

Jheriko on June 26, 2008 9:13 AM

I just got told that XML config files are better because they're mainstream and have better support.
And of course I'm developing a small (2-3 data pairs) config file. When I asked what the company standard is, they said XML. And it's not even a Java/.NET shop.

rcphq on June 26, 2008 9:23 AM

Nice article, and what a response. I especially like "just write some XSL", a perfect response which, if you've messed about with XSL, does rather emphasize your point about mental effort. It has its place, sure, but it isn't for everything.

I don't consider myself religious about this stuff, but the way you laid the argument out is one of those emperor's new clothes moments, where someone is pointing out common sense.

Nice work.

ian on June 26, 2008 9:36 AM

A lot of commenters said basically what I said in one of the first comments. Standards are good, even if they are bad. Your job is to improve the standard so it can become better.

Hoffmann on June 26, 2008 10:06 AM

I prefer <thing>car</thing> to thing=car, because there might be more stuff inside the tag than just one word. And there might be = marks in the data too. And showing the example in plain text is just laziness; why not color it or use some tool and then show how easy it is to read XML after all?

Silvercode on June 26, 2008 10:26 AM

The Think picture looks like it was printed on an old, worn out inkjet then scanned in.

Is it XML in disguise?

John Baughman on June 26, 2008 10:59 AM

This small stuff matters.

Yes, the small stuff matters:
* How do you include or escape the whitespace, newlines, and the lexical entities of your own markup (e.g., the equals sign)?
* Can it handle accented characters from across Europe, or the pathnames on an Asian localization of the OS?
* Is it possible to reference or embed one document/dataset/config inside another?
* How much overhead will the language incur to the next author or developer?
* What tools are available to the authors and developers (e.g., editors, validators, and APIs)?


I use XML everywhere BECAUSE of the small things. This is stuff that can be difficult to re-engineer down the road.

However, I do admit there is a huge usability need for better widely available XML editors. Short-term, I have no problem with people using simplified markup to XML conversion programs. This means in the cases that the simplified markup is unable to handle a situation, the XML layer still exists as an option. To the developer, the input language and APIs remain the same, as flexible as ever.

Long term, I'd like to see an XML editor that offers WYSIWYG editing and error highlighting, using a schema and a limited, text-only stylesheet. Ideally, it should be as freely available as a text editor is today. With this in hand, you could make the most common uses of XML look and act exactly like the simplified markup languages you suggest. This should complement the range of editors for specific domains.

In the end, Jeff, I'm so surprised you keep beating on the angle bracket tax, while you accept HTML as the better general solution for Humane Markup Languages. In my mind, the issues are very similar: familiarity of language is a huge usability win over having tens of different languages.

Anm on June 26, 2008 11:24 AM
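The "small stuff" in the list above (escaping, newlines, accented characters) can be checked with a short round trip. This is only a sketch using Python's standard `xml.etree.ElementTree`; the element names `config` and `entry` and the sample value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A value with an equals sign, markup characters, quotes, accents and a newline.
tricky_value = 'path = C:\\Café\\données <draft> & "notes"\nsecond line'

root = ET.Element("config")
entry = ET.SubElement(root, "entry", name="note")
entry.text = tricky_value

# The serializer escapes '<', '>' and '&' in text content on the way out.
serialized = ET.tostring(root, encoding="unicode")

# Parsing the serialized form should give back exactly the same string.
round_tripped = ET.fromstring(serialized).find("entry").text

print(serialized)
print(round_tripped == tricky_value)
```

The point of the sketch is that none of the escaping rules had to be written by hand, which is the trade being argued about in this thread.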

"Its intention is not to be human readable"

Actually, I thought that was entirely the point. Otherwise you would just make up your own proprietary (mostly binary) format (like old-school Excel and Word files) and have done with it.

Atario on June 26, 2008 1:10 PM

First of all, I think YAML is too complicated. I like JSON better ;-)

But I also do think you can see XML as a step forward. When in the graphical Dataset Designer in Visual Studio 2008, I'm glad there's a textual representation in XML, so I can make changes in my favorite text editor. A lot of these tools were using some binary format in the past, so it's progress. (but that's not what this post is about...)

Doekman on June 26, 2008 1:19 PM

Jeff

XML is not beautiful. It may, however, be practical, but for some who value intrinsic Quality above practicality alone, which I'm guessing is you, it will never become beautiful no matter how much one may accomplish with it.

This entire post-plus-thread discussion reminds me of that book by Pirsig. Both sides of the debate are arguing from different value systems, in the sense of one side asking, "Is it elegant?" and the other replying, puzzled, "Well, it fulfills all my specifications..."

fwiw, I'm with you on this.

Caleb

Caleb on June 27, 2008 4:13 AM

@Atario
Well, it wouldn't be proprietary, you'd use a *standard* binary encoding.

Like perhaps ASN.1?

Suddenly I'm starting to see virtues in this XML thingy. :-)

T.E.D. on June 27, 2008 7:43 AM

I think you're missing a detail: XML is meant to be read and edited
by humans only as a last resort. The normal situation is to edit
and consume them using tools written for that purpose.

So in addition to the more complicated files I have to create, I also have to write, debug, distribute, and maintain a bunch of tools so mere humans can comprehend them? Sounds like a lot of work to me.

T.E.D. on June 27, 2008 7:53 AM

The funny thing is that my most recent experiment with configuration files went totally the opposite way of XML. OK, it was almost 10 years ago now, prior to the XML brouhaha, but I don't exactly create config file formats from scratch every day.

Instead of creating text files that you really need special tools to maintain, I tried to make the syntax proper english. Here's an excerpt:
-----------
###########################
## Simulation Clock Rate ##
###########################
Run the clock at 240 hz

#############
## IO Band ##
#############
Schedule IO enabled at 60 hz plus 3 cycles

Run the JPATS_IO.scheduler enabled in IO

################
## 60 Hz Band ##
################
Schedule 60hz0 at 60 hz

Run the JPATS_auto_test.scheduler in 60hz0

Run the JPATS_ios_pilot.scheduler in 60hz0

Run the JPATS_DCLS.scheduler in 60hz0

Run the JPATS_Aircraft_Body.scheduler in 60hz0
-----------
I'm not saying this is perfect. It has its drawbacks. But at least it's something the wrench-turners who maintain the system can fiddle with using the editor of their choice and without having to call in a software person.

T.E.D. on June 27, 2008 8:08 AM

I have become very fond of XML, but not because I try to read it myself.

I mainly like it because of the amazing ease of serializing and deserializing complex and hierarchical classes in .NET. To me, this means that I can model quite complex real-world entities as classes, then serialize them to disc (or package them into an MSMQ message) in a couple of lines of code, and deserialize them just as easily. Having to write code to PARSE any other representation of such complex objects would be a nightmare.

That said, I guess I don't really care whether .NET uses XML, YAML, JSON or Transylvanian to represent these classes. Who cares, so long as it works? If I have to read this stuff myself, why do I have a computer?

David on June 28, 2008 4:54 AM
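The serialize-then-deserialize round trip described above can be sketched in a few lines. This is not .NET's serializer: it is a hand-written Python analogue using the standard `xml.etree.ElementTree`, and the `Car` and `Wheel` classes, their fields, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Wheel:
    position: str
    pressure_psi: float


@dataclass
class Car:
    model: str
    wheels: list = field(default_factory=list)


def car_to_xml(car):
    """Write a Car and its nested Wheels as one XML string."""
    root = ET.Element("car", model=car.model)
    for wheel in car.wheels:
        ET.SubElement(root, "wheel", position=wheel.position,
                      pressure_psi=str(wheel.pressure_psi))
    return ET.tostring(root, encoding="unicode")


def car_from_xml(text):
    """Rebuild the Car hierarchy from the XML produced by car_to_xml."""
    root = ET.fromstring(text)
    wheels = [Wheel(w.get("position"), float(w.get("pressure_psi")))
              for w in root.findall("wheel")]
    return Car(root.get("model"), wheels)


original = Car("roadster", [Wheel("front-left", 32.0), Wheel("front-right", 31.5)])
restored = car_from_xml(car_to_xml(original))
print(restored == original)
```

Here the mapping between classes and elements is spelled out by hand; a reflection-based serializer derives it from the class definitions, which is where the "couple of lines" convenience comes from.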

The web is doomed by technologies like XML and CSS built into HTML, among others. It can be very slow on older PCs or ultra-mobile devices because of the time it takes to parse through the code to render a website. No wonder web browsers take up so much memory and CPU time to display something that has simple output. The structure is getting complex. I'm not saying there's no need for the technology, just that coders have gotten lazy and recycle too much code. Create a few more templates, and do less parsing of that code out of some complex XML and CSS structure. It's like having an application fully unload itself on launch instead of creating and destroying stuff as you look at it. If you use it, use it right. I have not seen this yet with most programmers who use it.

Ken on June 29, 2008 1:35 PM

I just learnt: you *don't need* XML, JSON or YAML for configuration. You can do it in code:

http://dojo.ninject.org/wiki/display/NINJECT/Modules+and+the+Kernel

Doekman on June 30, 2008 3:33 AM

I love XML for the tools and standards surrounding it. I use XML schemas to validate my data on load. I use XPath to write concise code that navigates complex data. I use XSLT to transform my data into other formats or pretty HTML. I use XML signatures (http://en.wikipedia.org/wiki/XML_Signature) to authenticate my data.

Sometimes, the problem really is just a nail. Use your hammer.

wcoenen

wcoenen on June 30, 2008 4:12 AM
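To give a feel for the "concise code that navigates complex data" claim, here is a small sketch using the limited XPath subset in Python's standard `xml.etree.ElementTree`. The `servers` document and its attribute names are invented for the example; full XPath, XSLT and schema validation need a third-party library and are not shown.

```python
import xml.etree.ElementTree as ET

document = """
<servers>
  <server name="alpha" role="web"><port>80</port></server>
  <server name="beta" role="db"><port>5432</port></server>
  <server name="gamma" role="web"><port>8080</port></server>
</servers>
"""

root = ET.fromstring(document)

# One path expression selects the port of every server whose role is "web".
web_ports = [int(port.text) for port in root.findall("./server[@role='web']/port")]
print(web_ports)  # [80, 8080]
```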

I hate it when XML is used unnecessarily.

Google adwords for instance, I ask for a csv of a few months of analysis data. Google (Google I say!) then say to me during the export 'The file you have asked for will be quite large, here is the file as xml'....wtf! The file is going to be 3-4x as large. You'd think Google, of all companies, would understand the overhead.

gary on June 30, 2008 5:38 AM
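The size overhead mentioned above is easy to measure for yourself. The sketch below writes the same synthetic rows as CSV and as element-per-field XML and compares lengths; the field names and values are made up, and the ratio will vary with real data and with compression.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["date", "clicks", "cost"]

# Thirty synthetic report rows (hypothetical data, for illustration only).
rows = [
    {"date": f"2008-06-{day:02d}", "clicks": str(day * 7), "cost": f"{day * 1.5:.2f}"}
    for day in range(1, 31)
]

# CSV: one header line, then one short line per row.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# XML: every field name is repeated twice per row, as an open and a close tag.
report = ET.Element("report")
for row in rows:
    row_element = ET.SubElement(report, "row")
    for name in FIELDS:
        ET.SubElement(row_element, name).text = row[name]
xml_text = ET.tostring(report, encoding="unicode")

ratio = len(xml_text) / len(csv_text)
print(f"CSV: {len(csv_text)} chars, XML: {len(xml_text)} chars, ratio {ratio:.1f}x")
```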

Guys, I think we need to rethink fire.

Anonymous on July 1, 2008 11:15 AM

Anyone who disagrees with this post should have to read a 1000 line page of ColdFusion. Then maybe you'll rethink the "wow, we should use (angle bracket based language) for this!!".

cfstupid on July 2, 2008 10:05 AM

An alternate and often better application file format is the SQLite database. This database frees you from worrying about how to store your file, and changes to the file are made consistently in page-sized chunks. SQLite is more robust than XML when power failures occur because it has rollback journals. SQLite has a query language called SQL, where you can define your data on your terms and not worry about angle brackets. You can store anything from integers to strings to blobs. You can program constraints and triggers. SQLite is a library with no server. You link in the DLL, add a header file or language-specific database driver, and off you go. The DLL is only 300 kbytes, and the data tax is about 2 kbytes plus double your data. Most file formats would be hard pressed to produce those kinds of numbers. The API has primitives to open and close the database, and primitives to prepare and execute queries and to retrieve results. SQLite is also relatively independent of language and OS because it runs on multiple operating systems and has drivers for most popular scripting languages. Check out http://www.squidoo.com/sqlitehammer#module6512724 for some of them.

I have a page at http://www.squidoo.com/sqlitehammer#module7336484 that shows how to use SQLite as an application file format.

Jay Godse on July 4, 2008 7:24 AM
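As a rough illustration of the "SQLite as an application file format" idea, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, the `.appdoc` file name, and the `save_document` / `load_document` helpers are assumptions for the example, not taken from the linked pages.

```python
import os
import sqlite3
import tempfile


def save_document(path, title, shapes):
    """Store a document as a single SQLite file."""
    connection = sqlite3.connect(path)
    with connection:  # commits on success, rolls back if anything raises
        connection.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
        connection.execute(
            "CREATE TABLE IF NOT EXISTS shape ("
            "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
            "width REAL CHECK (width > 0))")
        connection.execute(
            "INSERT OR REPLACE INTO meta VALUES ('title', ?)", (title,))
        connection.execute("DELETE FROM shape")
        connection.executemany(
            "INSERT INTO shape (kind, width) VALUES (?, ?)", shapes)
    connection.close()


def load_document(path):
    """Read the document back with plain SQL queries."""
    connection = sqlite3.connect(path)
    title = connection.execute(
        "SELECT value FROM meta WHERE key = 'title'").fetchone()[0]
    shapes = connection.execute(
        "SELECT kind, width FROM shape ORDER BY id").fetchall()
    connection.close()
    return title, shapes


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "drawing.appdoc")
    save_document(path, "Floor plan", [("circle", 2.0), ("square", 1.5)])
    title, shapes = load_document(path)

print(title, shapes)
```

The `CHECK` constraint and the transaction block stand in for the constraints and rollback behavior the comment mentions; a real format would also need a versioning scheme.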

Guys, the U.S. dollar is too hard to read, we should each come up with our own currencies.

Anonymous on July 5, 2008 11:17 AM

At the time when baud rates were 9600, memory was small and expensive, and assembler language was still an option (or a must!), many programmers, if not all, felt the need to save resources. That need leads you to know what you are doing at the very deepest bit level. It's the same as living after a war: you cannot waste a bit of anything. Coding on a Spectrum or an XT required packing a lot of information into little space.

Now you have plenty of everything, you can waste if you want. But who can say that thinking about how the things work is bad? It's a good sport thinking of how much of a XML file is redundant. Why not? Even if you love it, even if it's your religion, even if it's the only way you know to store data, why not think of it? I agree with Jeff.

oscar on July 6, 2008 5:58 AM

Jeff, you might have already read this in today's news:

http://www.pcworld.com/article/148054/google_opensources_data_exchange_language.html

harpooner on July 8, 2008 11:38 AM

Jeff, I noticed that a lot of junk got added to the web.config in version 3.5 of the .NET Framework to support AJAX.NET and LINQ. Frankly, it's a complete mess. It seems like 90% of it is related to AJAX.NET. I don't use AJAX.NET and I don't think you do (for StackOverflow). Are you planning on removing all the additional junk?

I actually don't understand why Microsoft didn't come out with a new real version of the .NET Framework that includes AJAX.NET and LINQ. I suppose that there are reasons, but it seems like it would be easier to have the new version and a nice clean web.config.

Ben Mills on July 24, 2008 2:04 AM

Reading raw XML is as pleasurable as picking apples recursively.

Paul Goddard on April 15, 2009 11:50 AM

Jimmy - other way around. XML doesn't come from XHTML - XHTML is a derivation of HTML conforming to strict XML rules.

Simon on February 6, 2010 10:37 PM

Again with the YAML and JSON. Yes, fine, these are technically standards, but they're incomplete ones. You can't validate them. They don't have well-formedness. They don't have schema. They don't have metadata. Maybe you don't need any of these things, but in cases where you do, YAML and JSON are not "alternatives" at all.

To the guy talking about XML files spanning hundreds of MB or even GB, where have you seen this? I don't think I ever have. And in those cases, just how much bigger is it than, say, JSON, and is there any difference when you compress it, which you should be doing with gigantic data streams anyway?

If this is being used as an actual storage mechanism then of course you should be using a relational database, but I don't think there are many people arguing for XML files as a replacement to SQL Server.

Yes, there are alternatives to XML. There are also alternatives to .NET. And there are alternatives to Microsoft Windows. I choose not to use those alternatives because the de facto standard is a lot more convenient for me. Minimalists, this is the 21st century; if angle brackets are too much for you then maybe you should also be dropping Unicode and 4-digit years.

Aaron G on February 6, 2010 10:37 PM

I am not sure if this discussion is really necessary.

You don't like XML for the reason that you find it hard to read. So fine, try to avoid it where possible. But to argue that it is hard to read in _general_ and for everyone is probably not the wisest thing to do.

I for one find it really nice to have a closing tag repeat the tag name because it makes finding matching pairs easier. And there is probably a reason why a lot of programmers write comments repeating the function (or whatever else) name at the end of a block.

And as a lot of people already said: XML covers a lot of special cases and tries to give a solid solution to almost any problems someone could encounter while creating some kind of document. It provides means to do very simple and very complex things with it, which, in my opinion, makes it really suitable for many use cases.

Of course there are cases (like the popular key-value lists), where a simple format is better. But on the other hand I have seen enough configuration files, where some weird syntax was introduced, because they need "just a bit more than that". If you can keep it plain and simple, do it. If you don't know, better use something that is extensible in a sane fashion.

@Someone who talked about hating XML but loving XSLT: Do you realize that XSLT is an XML application? XML isn't something in itself, it only provides a standard for creating specific markup languages that fit a problem domain.

I think most of the people who say "I hate XML" actually mean "I hate a certain kind of XML".

In the end it is actually a matter of taste, just like programming languages are. Or who does _really_ always use "the best" tool for the job? (Which can also mean the tool you are more used to)

Simon on February 6, 2010 10:37 PM

One comment I'll make on the subject (anti-xml) is depending on the requirements and scope of your project, you may not be able to use those nifty XML parser and writer libraries that are available to you. As a Game Development Student, we created a toolchain which created content for a game engine in XML, this was nice and easy in C# however, in the actual game engine (C++), we were forbidden from using any third party API or Library, so we had to manually parse these XML files, which completely negated the key advantages of the XML format. Namely, if we had a fixed-format that we had to parse manually line by line, character by character, plain text would have been simpler and would have resulted in much smaller files.

Brandon K on February 6, 2010 10:37 PM

I think Jeff said it best (to summarize):
This is not a religious debate until you encounter the comments, Jeff is only pointing a finger at the elephant in the room and saying: "Check it out people!"
Not "PEOPLE WATCH OUT YOU'RE GOING TO BE TRAMPLED BY IT"

So go look. Spit at it if you will, but shouldn't we all understand the childhood saying "Don't judge a book by its cover"?

I like Fanboy?'s final comment:
"I'll stick with Xml. It works, so I don't have to."

And I will extend the statement to include:
"I'll stick with an open mind. It expands, so I don't have to worry about missing something great in exchange for contentment"
!Kaizen people! http://en.wikipedia.org/wiki/Kaizen

PersistenceOfVision on February 6, 2010 10:37 PM

"Why don't we think about what we're doing?"
Because there is a very real mental cost to thinking about it.

"But, we also tend to radically underestimate the impact of the dozens of small events in our lives throughout the day." Like missing a small detail in the requirements doc (we'd be so lucky!) that emulates the flap of a butterfly's wing?

"I highly recommend reading it..." Nah, mentally pricey!

"That's all I'm saying." Glad to hear it.:-)

XSD rocks!

Simon Parmenter on February 6, 2010 10:37 PM

LOL. I can't help but laugh and pity at all these religious ducklings.

They all think you're attacking their mother, Jeff!

Jon Limjap on February 6, 2010 10:37 PM

Some people don't use XML correctly, but that's no reason to throw the baby out with the bathwater.

If you will never have a reason to edit your information without a tool and performance is important, don't use XML. But as a nexus point where something is both human-readable and machine-readable, with all the edge cases like embedded CRLFs and equals signs considered, XML isn't bad.

Misusing any technology provides a case against it. I'd be curious to see someone argue against a best-case use of XML.

Jason on February 6, 2010 10:37 PM

What I don't like about JSON is that it does not even support comments. I won't consider it for any kind of configuration files which requires annotations. Does YAML support comments?

anonymous2010 on February 13, 2010 10:36 PM
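On the question above: YAML does support comments, introduced with `#`, while standard JSON has no comment syntax. A quick sketch with Python's standard `json` module shows a strict parser rejecting an annotated file; the sample settings and the `_comment` key workaround are illustrative assumptions.

```python
import json

annotated = """
{
    // preferred terminal size
    "width": 1024,
    "height": 768
}
"""

try:
    json.loads(annotated)
    comments_accepted = True
except json.JSONDecodeError as error:
    comments_accepted = False
    print(f"rejected: {error}")

# A common workaround: carry the annotation as ordinary data.
workaround = json.loads(
    '{"_comment": "preferred terminal size", "width": 1024, "height": 768}')
print(comments_accepted, workaround["width"])
```

The workaround keeps the file valid JSON, at the cost of mixing annotations into the data that every consumer then has to ignore.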

The comments to this entry are closed.