Occasionally I'll write about things that I find sort of mildly, vaguely thought provoking, and somehow that writing turns out to be ragingly controversial once posted here. Case in point, XML: The Angle Bracket Tax. I'm still encountering people online who almost literally hate my guts because I wrote that post. You'd think I kicked their dog, or made inappropriate romantic overtures toward their significant other.
Well, first of all, we are talking about XML the markup language, not XML the religion, right?
I hope so. I try not to get emotionally involved with the tools and technologies that I use, if I can avoid it. This doesn't mean I can't be enthusiastic or critical of those tools and technologies, but I'm not married to the stuff either way. Who needs all the emotional baggage?
Obviously I failed to communicate this before. I talked about this a little bit on Stack Overflow podcast #5 with Joel, where I tried to amplify and explain my position a little better.
I wasn't trying to present it as "Oh, XML is bad, let's all switch to this new markup language that all the cool guys are using". What I was trying to say is why don't we think about what we're doing? That's the general theme of a lot of the stuff in my blog. Can we just stop programming for a minute to think about what we're doing and not make a blind choice based on "Well this is what my tool does, so that's what I have to do"?
I think obviously there's pros and cons to each. I'm not saying that one is the right solution all the time. But I think, ironically, that is what is happening with XML. I think people are saying "It's always the right answer, because it can store anything, right? And all the stuff I use uses it, so it must be the right choice for everything." That bothers me a little. Maybe I'm just contrarian. Maybe I'm an iconoclast and I want to try different things and see different things, but I think actually understanding the alternatives helps you understand XML better, a little bit, too.
And I hope people reading my blog would not get the idea that it's about a knee-jerk reaction one way or the other. It's about understanding the tradeoffs and applying those tradeoffs to your particular situation. I think that is the absolute art of programming. It's understanding what you could do, and which one of those things fits your situation best. Versus what so many programmers do, which is "I've learned to use a hammer, and I'm gonna hammer everything." Ultimately, to me, it's about self-awareness.
By the way, I'd like to thank everyone who pitches in to make those Stack Overflow podcast transcriptions possible. It is because of your generously donated time that I am able to quote that audio here.
I don't post stuff to push people's buttons, I post it because I want programmers to think about their tools, their technologies, their methods.
If what I post here seems unnecessarily confrontational sometimes, a far smarter person than myself said it better than I can:
I blog to help others and also to learn. As it turns out both are aided by getting folks to actually read the stuff. Please pardon the necessary devices.
Please do pardon the necessary devices; I find that I often learn best through the smackdown learning model. That works for me. Maybe it doesn't work for you, and that's OK. There are millions of websites to choose from.
That said, I do actually have a problem with XML, or I wouldn't have written anything in the first place. I think there's a real issue here that is, for the most part, being completely ignored. XML fever may not be as debilitating as, say, Dengue fever, but it has side effects as well.
Consider Norman Walsh's Defending the Tax. Norman is an XML Standards Architect at Sun.
On the other hand, the difference between:
fruit=pear
vegetable=carrot
topping=wax
and
<doc>
  <fruit>pear</fruit>
  <vegetable>carrot</vegetable>
  <topping>wax</topping>
</doc>
isn't really that large, is it? (Or maybe you think it is, de gustibus non est disputandum.)
The de gustibus dismissal means Norman considers it a matter of taste, but it isn't. The difference is large. There is a very real mental cost to parsing even a few short lines of XML.
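To make the comparison concrete, here is a minimal Python sketch that reads both of Norman's snippets into the same dictionary. The function names parse_key_value and parse_xml are hypothetical, and the key=value reader assumes one pair per line with no escaping or nesting, so treat it as an illustration of the two shapes rather than a fair benchmark of either format.

```python
import xml.etree.ElementTree as ET

# The two snippets from Norman's post, one pair or element per line.
KEY_VALUE_TEXT = "fruit=pear\nvegetable=carrot\ntopping=wax\n"
XML_TEXT = """<doc>
  <fruit>pear</fruit>
  <vegetable>carrot</vegetable>
  <topping>wax</topping>
</doc>"""


def parse_key_value(text):
    """Read one key=value pair per line (assumed: no escaping, no nesting)."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def parse_xml(text):
    """Read the direct children of the root element into a dict."""
    root = ET.fromstring(text)
    return {child.tag: (child.text or "") for child in root}


# Both representations carry the same three pairs.
print(parse_key_value(KEY_VALUE_TEXT) == parse_xml(XML_TEXT))
```

The machine does not care which one it gets. The person scanning the file is the one who has to skip over the tags.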
As a programmer in the Visual Studio ecosystem, I find XML pervasive, in every nook and cranny of a project. Every time I look at my web.config XML file, there's a mental cost to parsing all the tags in the file. Here's this tag, which lines up with this tag. Here's this giant, verbose thing where only half of it actually matters.
Sure, it's a small effort. Insignificant, even. But what's the mental cost of that insignificant effort times the number of developers in the world, times the number of projects in the world?
I also posit that these minor headaches may be more significant than you realize. In Stumbling on Happiness, author Dan Gilbert makes a similar assertion.
His research found that people are bad at predicting their own future happiness. They tend to radically overestimate the positive or negative impact of large events in their lives -- losing your job, getting rich, getting divorced, having children. That's generally good; it means we have defense mechanisms in place to adapt and survive in our changing circumstances as human beings. But, we also tend to radically underestimate the impact of the dozens of small events in our lives throughout the day. Thus, small injustices don't trigger our defenses. The effect of that squeaky screen door, the neighbor's barking dog, the interrupting telephone call -- all of these may have far more profound cumulative impact on your day to day happiness than you realize.
It's a fascinating book, and I'm only paraphrasing the smallest part of it. I highly recommend reading it if this is at all interesting to you. It won't exactly unlock the secrets to happiness, I'm afraid, but you may gain a deeper understanding of why we tend to make the choices we do in our neverending pursuit of happiness.
I'm not trying to change the world overnight, but I wouldn't mind planting a few seeds of dissent in people's minds. This small stuff matters.
The next time you're trying to figure out an XML file, just think about it.
That's all I'm saying.
Posted by Jeff Atwood
Hi Jeff,
I presume you're using Web Forms for your views on Stack Overflow? Was this a considered decision? Did you consider any of the alternatives such as NHaml?
Cheers,
Andrew.
Andrew Peters on June 24, 2008 05:54 PM
One thing I really like about your blog is your quotes from other people. I find those quotes are usually the most reliable part of your posts, since the fact that someone other than you agrees makes the information more reliable. Quotes are for highlighting points about a topic, not for placing parts of other people's text (or even your own from a previous publication) into your own. That said, I think you should have taken that part of your podcast, rewritten it, and placed it as normal text.
About the topic, I find that it mostly does not matter. Even more, I think it's better this way because it's standardized. Standards are good, even if they are bad. A standard is better than each one having its own language and its own parser and so on.
Hoffmann on June 24, 2008 05:55 PM
Elaborating a bit on the Dan Gilbert book:
"The Futile Pursuit of Happiness"
http://www.wjh.harvard.edu/~dtg/Futile_Pursuit.htm
I hate to read XML, but, as Hoffmann said, it's standardized and almost any dev CAN read it. While it may benefit you to use something else, it will hurt the people who have to read that later.
I do agree that you need to be sane about what you use, but when it's only a small difference I say take a hit for the rest of the community.
The other thing XML has going for it is LINQ-to-XML and XML literals in vb.net. There isn't a much easier way to write/parse data in .net.
I'm actually using this summer to teach myself more about XML and what it can actually do for me past a few simple scripts for websites. Can anyone recommend me a good book for learning XML?
Mike on June 24, 2008 06:38 PM
Hi Jeff. I've recently defended your blog to a fellow programmer who falls into your "passionate hatred" camp, and he's not the only guy I know who thinks similarly. On the other hand, I know a lot of people like myself that recognise you're just another bod. Personally what draws me here isn't the technical excellence, but the fact you can repeatably string consistent articles together that can be read by the average joe programmer (me).
I think you're overstating the mental parsing problem just slightly, and would almost dare to posit that if you can't substitute "fruit=foo" with "<fruit>foo</fruit>" after a few years of Visual Studio, then you're probably in the wrong business. ;)
Another aspect is that this could be seen as a tools problem. For the past year or so I've reached the opinion that for any formally structured data, there almost certainly exists a more efficient, "humane" representation that should be implemented in a GUI for manipulating that data structure.
While there aren't such things around (yet) for things like C or C# code, there exist quite a few XML editors that implement a number of different graphical interfaces to viewing/editing XML. The beauty in the generality of XML is that a user/programmer is free to pick from any number of different representations that he may use to manipulate the Infoset. The textual tags are just one widely used representation.
David W on June 24, 2008 06:42 PM
I'm going to share with you the first paragraph of Simon St. Laurent's 1998 "Why XML" article:
"The computing press has found a new savior for the ills that afflict computing and the web: XML. XML is new, it's exciting, and it's got to be good, because the specification for it looks indecipherable. XML's hype level has already drawn fire from some quarters, from those accusing it of 'balkanizing the web' or of increasing the load on an already strained Internet."
10 years already. It does seem indecipherable at times (especially when you're dealing with large XML content).
Here's the link: http://www.simonstl.com/articles/whyxml.htm
Elvis Montero on June 24, 2008 06:49 PM
While xml is horrible to read and isn't going to make anyone happy, using anything else is likely to make someone seriously unhappy. Have you ever tried parsing a csv or similar proprietary file which has documentation that not only was lost years ago but didn't handle the data type you are trying to add anyway?
The mental cost of reading xml is far outweighed by the benefit of being sure that you will be able to read it. Definitely a case of worse being better.
Tom Clarkson on June 24, 2008 06:50 PM
You should go one step further and prove your tax. Write a program using Visual Studio that yields the same result with each dataset. Wouldn't it be safe to say that if everyone wrote their own programs to reach the same result that the XML version would be more consistent than the non-XML version because it's based on a standard?
Tim on June 24, 2008 06:51 PM
XML is generally excise. (Doesn't About Face have an entire chapter on this?) When XML is presented to humans as the main means of modifying data or software state, you should be using something different (i.e., an actual UI). That said our content management system wouldn't exist without XML and XSLT, and I love both very much. Our users are none-the-wiser, however. XML is the pain that software developers bear so our users may lead happier, healthier lives.
Kendrick Erickson on June 24, 2008 07:00 PM
Once again, the issue is that XML isn't meant to be parsed by a human. Its intention is not to be human readable - the verbosity that is so annoying to a human brain (because we can interpret the meaning from context) is absolutely essential for software. Thus, I think the solution would be a translation layer for human viewing/editing of XML files. I'm sure that XML viewers/editors already exist (a quick Google search shows that they do). Maybe you should give one a shot? If you can get a plugin for Visual Studio, the entire problem would be solved.
Bill Gates on June 24, 2008 07:02 PM
My problem with XML isn't as much the strain of having to read it; it's more of how bloated it has become.
If I recall correctly, XML was derived from XHTML, which has its basis in HTML. So, in theory, XML is really just another text markup language. I'm not going to argue with the ability to create your own markup tags that can be parsed to mean whatever you want them to - quite the opposite, in fact. That feature is (hands-down) the most powerful aspect of XML.
Unfortunately, when you give people that much power, it inevitably goes downhill. Think of what XML was intended for (custom text markup), and now think about what it is being used for nowadays (configuration files, data transmission, data persistence, reporting, etc.) How much of the "usefulness" of XML is due to the ability to throw whatever you want into a file along with the rest of a loose collection of information, which might not even be relevant?
This doesn't even begin to take into account the extra overhead associated with parsing, reading and writing the information, as you said in your previous post. Add that into the mix, and (to me, anyways) the case against the widespread proliferation of XML grows stronger with each opening and closing tag.
So the question I pose is this: is the advent of XML as a universal data type (for lack of better wording) making us better programmers, or is it causing us to slide backwards into the olden days of placing everything having to do with anything into one place for "easier" access?
Jimmy on June 24, 2008 07:07 PM
you must love wpf ;)
brian on June 24, 2008 07:16 PM
Jimmy - other way around. XML doesn't come from XHTML - XHTML is a derivation of HTML conforming to strict XML rules.
Simon on June 24, 2008 07:25 PM
> Hi Jeff. I've recently defended your blog to a fellow programmer who falls into your "passionate hatred" camp, and he's not the only guy I know who thinks similarly.
"The dogs bark: a sign that we're riding, Sancho". (Don Quixote, via Jorge Diaz Tambley)
> Once again, the issue is that XML isn't meant to be parsed by a human. It's intention is not to be human readable
I desperately wish someone would explain this to all the people writing XML files. Oh wait, we have.
> Wouldn't it be safe to say that if everyone wrote their own programs to reach the same result that the XML version would be more consistent then the non-XML version because it's based on a standard?
The idea that there's only two choices: XML or "write it all yourself" is sort of.. a lie.
YAML is based on a standard, too:
http://www.yaml.org/
The power of software development is that it is one of the most efficient methods of expressing our will. Once it was people being taught a process, then it was mechanically expressed in assembly lines, after that we had hard wired chips and now it has moved into software. But, no matter how this has changed, it has always been about the best method to express our will and the backbone of that is passing information efficiently. It isn't about XML, Corba or whatever... as sure as XML is a certainty as a format to store data for the next 100 years, in 20 years we'll look back and laugh. I think a good phrase here is, "Every 1000 years, the followers of the current mainstream religion look back at the followers 1000 years ago and ridicule them". The difference for us, is that we see multiple changes like this within our own lifespan and yet, when we're stuck in the middle of the current new fad, we lose perspective and somehow forget about the last 10 technologies which were the promised golden bullet.
David from Oz on June 24, 2008 07:32 PM
How about the mental cost of learning the syntax of a bunch of new parsing languages? YAML, ini, bleh. I already know XML, why would I care to learn additional mechanisms for storing configuration/data persistence?
How about the anguish of working with immature and buggy APIs that parse these languages compared with the proven and stable apis that are built into Java/.NET? I don't need an external DLL. I don't need to unit test that piece of code. With XML, it just works right out of the box.
How about training costs? I lead a team of 5 engineers. I have not had to explain XML to a single one of them because they have either known it coming in (due to the pervasiveness of XML and .NET) or they were smart enough to look it up on the internet. Can you say that for the configuration flavors of the month you propose?
Jeff, I think your frustration comes from a lack of tackling enterprise level apps. These rants are starting to sound like Joel not liking exceptions or the need for a new language. You are so overly concerned with the little details that you miss the bigger productivity picture.
If you think about it from a Domain Driven Design (http://en.wikipedia.org/wiki/Domain_driven_design) perspective, XML is just a persistence layer. It's unimportant and you shouldn't be spending time on it. Focus on what matters - the domain.
Jim Greco on June 24, 2008 07:33 PM
The passionate hatred reminds me a little bit of some Firefox fans, for example. Don't get me wrong -- I use Firefox and I'm happy with it, but I don't get into any heated discussions about it. It's just a browser. But a quick visit to some random web forums, and you'll inevitably see people turn into raging lunatics when they talk about how much better it is than IE, and how dare anyone say anything bad about it (or abbreviate it the wrong way, for that matter).
I know the word has become trite, but fanboyism is probably the best way to describe it. Whether it's XML, or Firefox, or Ruby, or Linux, or Microsoft, or whatever. Use whatever you want--there's no reason to feel threatened when someone else prefers something different. It seems as if a great deal of people are either insecure about their tools and software; or perhaps they consider it so much a part of their own identity they feel that a criticism of their tool is a personal attack.
Whatever the reason, that kind of reaction to your original post certainly speaks volumes about a person's maturity level.
Neil (SM) on June 24, 2008 07:34 PM
> I already know XML, why would I care to learn additional mechanisms for storing configuration/data persistence?
I might ask you a similar question: why learn anything beyond exactly what is required?
I'm not proposing that everyone stop and rewrite every application written in the last 5 years, merely that people understand and are aware of the alternatives.
Jeff Atwood on June 24, 2008 07:43 PM
No one's mentioned them in this comment thread, but they're inevitable so I feel I should get it out there this time: Lisp S-expressions offer all the standardization and consistency of XML with far less syntactic noise. S-expressions were also conceived as a "machine format" as opposed to a human format, but they are eminently more usable. Why they're not in wider use these days I have no idea.
Not much to say here besides that, but really - they're easier to parse and generate for both computers and people. They're lighter-weight and at least as extensible. Coincidence is not a good enough reason to maintain the use of XML over simpler, saner formats!
Isn't XML just another "bug-ridden, slow, ad-hoc implementation of half of Common Lisp" with better marketing?
I really don't mean this as a troll, I'm just so dissatisfied with XML that I react strongly when the topic comes up. Sorry!
JoeOsborn on June 24, 2008 07:50 PM
It's just a standard like anything else. It's also very expressive, easy to use, and has loads of tools support out there.
Why on fords green earth would you hand-edit Xml or Html just to write a document, a post, or a comment on a website? It's better as a structured storage format, that's coincidentally really easy to send across the wire (because it's already text). How your users input data usually doesn't have anything to do with the storage format.
Instead of the argument being 'Use Xml', perhaps the better argument should be 'Why Are You Not Using Xml'. Why are you trying to reinvent the wheel (and by reinventing it, wasting your time, the maintainers' time, and everyone else's time)?
Xml is not a UI, so Xml haters, come up with a better argument.
Maybe us 'xml fanboys' react because the alternatives are only superficially better. Maybe we don't want another markup language for the sake of it. Maybe we don't want to change existing code to suit the flavor of the month.
I have no doubt that Xml will be surpassed in time, but until there's something better out there, I'll stick with Xml. It works, so I don't have to.
Fanboy? on June 24, 2008 07:51 PM
XML took off because it was the first simple, recursive generic data format which is both machine and human parseable/editable.
That's it - there's no magic to it and it's certainly not the best solution many times but it can be fitted to near any need, thus its omnipresence in VS.NET and other enterprisey systems.
XML allows you to build DSLs really quick and simply. The user does not want to edit Java or C# code to configure their application but having them edit XML is often acceptable since its syntax is much simpler. And that's why XML is rarely ever used in a language like Ruby - Ruby can be made clean enough that the user doesn't even know they are editing Ruby. Only the quotes below are a clue you are in a programming language.
Ruby:
hostname 'http://foo.com'
port 50
XML:
<hostname>http://foo.com</hostname>
<port>50</port>
So to a great extent it's all about cleanliness and ease of data expression in the language you are using.
More here:
http://www.mikeperham.com/2008/02/09/dsls-and-xml/
I'm definitely not trying to shill for Ruby here. All of this applies to any other language with lightweight syntax requirements so the code can be made to look very close to English.
Mike on June 24, 2008 07:51 PM
There's nothing wrong with XML. If you don't find it legible, take an hour and write a program to display the data in whatever format you want. It's not going away, so deal with it! Another lame post.
Josh Stodola on June 24, 2008 07:52 PM
> I might ask you a similar question: why learn anything beyond exactly what is required?
For low-hanging fruit such as data persistence and configuration you should absolutely not care about going beyond the minimum required because these frivolous things don't make your app better.
What are the benefits? A little less pain in deciphering the meaning of <foo>bar</foo> vs. foo=bar? At the cost of...
* Additional training
* Buggy/immature apis
* Unknown performance
* Mental cost of switching between XML and the flavor of the month text file format
You mentioned a lot of topics on this article, but didn't get to discuss any to any interesting extent. Paradoxically, it was a nice read.
Diogo on June 24, 2008 08:02 PM
@JoeOsborn ... here's a link you may like:
"XML is not S-Expressions"
http://www.prescod.net/xml/sexprs.html
Anyways, most of the time I'm just using Xml files as an easy way for end users to have complex configuration files without me having to come up with a novel way of representing them. It's so trivial to write a class X, populate it the way you like, and chuck it into a generic Xml Serializer (or load it with a generic deserializer). Job done .. next.... I never even have to write the parser or parsing code.
Fanboy? on June 24, 2008 08:04 PM
"YAML is based on a standard, too"
That brings up the question: What exactly is a standard? I always thought it was something that a large number of people have agreed upon.
With XML, it's the XML Core Working Group, part of the W3C, a multi-national consortium with a large member base, including quite a few major tech corporations. The copyright for the specification is held by the W3C.
With YAML, it's... the yaml-core mailing list. The copyright for the specification is held by three individuals.
Now, XML is a pain in the ass to deal with, but... would you really consider the YAML specification a standard?
Powerlord on June 24, 2008 08:26 PM
I think that in the case of XML, having a standard at all is much more important than having the best standard possible. The cost of getting the entire industry to convert to a less verbose standard for document markup would be far greater than just dealing with the angle bracket tax. It's a case of: just pick one and be done with it, so we can get on with our jobs!
Ben P on June 24, 2008 08:26 PM
Many people are complaining about the “mental cost” of (manually) reading XML.
Come on! If you really dislike reading XML, can’t you write (once) a simple XSL transform that will convert any XML document into some format that doesn’t burden your mind so much?
If you can’t write that transform, then the data probably couldn’t have been stored in your “light on the mind” format anyway, which means that XML was a good choice.
I never really understood the fuss about XML tags. For me, opening and closing tags are simply a concept. Sure, currently we often go on and store and transmit those tags verbatim, which *is* wasteful. But with some simple tools we could process and operate on XML and tags at a high level without those tags having to exist physically at a lower level.
Edward on June 24, 2008 08:40 PM
If anyone wants to see a classic example of angle bracket tax, try reading and writing XAML, it'll make your eyes bleed.
Ian on June 24, 2008 08:43 PM
I'm with ya Jeff. I'd go so far as to say I hate working with XML. As such, I keep my use to the bare minimum and usually will look at things like JSON, etc, before I'll settle on XML.
And I think that's the point you're making. XML all the time is just bad behavior.
Do you really need XML to define your ORM mappings, or would a compilable fluent interface make more sense?
Is there a real need in your application for XML configured Dependency Injection?
Does your webservice really need to return a complex and strongly typed XML file, or would a JSON file work just as well?
I could go on.
And don't even get me started on XSLT...
Lucas Goodwin on June 24, 2008 08:47 PM
I have no problem with using XML for relatively small buckets of data - config files, individual transactions, etc. Anyone who uses XML for large buckets of data is drinking the Kool-Aid or smoking crack.
The problem with moving or storing large amounts of data via XML is that there is no easy way to locate subsections. Anything you do requires that the XML document be parsed from the very beginning. This isn't a big deal when you are dealing with a small document. A few k of characters can be parsed pretty quickly whenever you need it. When the document gets a little bigger, you parse it once and use the DOM model. What happens, however, when you need to process a document that contains several hundred MB of data, or even several GB of data? There is no practical way to handle these volumes. And yet I see people try to do this all the time.
While record oriented techniques are clumsy in certain ways, they are far more easily scaled to large data sets. It is trivial to navigate to any particular element in a fixed field file, no matter how large. A csv file is almost as easy to handle. The data sets can be broken down into subsets quite easily. These sets can be streamed or paged easily. Arbitrary parts of the data set can be accessed without accessing every other part.
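As a rough illustration of the fixed-field point, here is a minimal Python sketch of random access into a fixed-width record file. RECORD_SIZE, write_records and read_record are hypothetical names, and the 16-byte ASCII layout is an assumption made purely for the example. Reaching record N is one seek to N times the record size, with nothing before it parsed.

```python
import io

# Hypothetical layout: every record is exactly 16 bytes of space-padded ASCII.
RECORD_SIZE = 16


def write_records(stream, values):
    """Write each value as one fixed-width record."""
    for value in values:
        encoded = value.encode("ascii")
        if len(encoded) > RECORD_SIZE:
            raise ValueError("value does not fit in one record: %r" % value)
        stream.write(encoded.ljust(RECORD_SIZE, b" "))


def read_record(stream, index):
    """Jump straight to record `index` without reading earlier records."""
    stream.seek(index * RECORD_SIZE)
    raw = stream.read(RECORD_SIZE)
    if len(raw) < RECORD_SIZE:
        raise IndexError("no record at index %d" % index)
    return raw.decode("ascii").rstrip(" ")


# An in-memory buffer stands in for a large file on disk.
data_file = io.BytesIO()
write_records(data_file, ["alpha", "beta", "gamma"])
print(read_record(data_file, 2))
```

The same call works whether the file holds three records or three hundred million, which is the scaling property being described. The price is that every value has to fit the fixed width.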
XML is perfectly fine for a lot of things, but programmers should consider scalability in the context of their projects. XML doesn't scale nearly as well as traditional record oriented approaches to data.
RevMike on June 24, 2008 08:51 PM
I have no problem paying the "angle bracket tax".
Integrity of data is a bigger concern to me than whether it's the easiest possible format on the human eyes.
What happens when you start having to handle non-standard data? E.g. things need to fit on multiple lines, or contain odd characters? Then you'll have to invent an escaping or encoding scheme.
Newsflash, XML already has this. Why reinvent the wheel? I'm sure as hell happy I don't encounter custom CSV or other arbitrary delimited file formats that much any more, since 99% of programmers don't think about the exceptional conditions, and their crappy invented file formats can't handle them.
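A minimal Python sketch of the escaping problem, using only the standard library. AWKWARD_VALUE and the helper names are made up for illustration. It shows a hand-rolled comma-joined format mangling a value that contains a newline, a comma, angle brackets and quotes, while XML escaping round-trips it. Note that the standard csv module also round-trips it, so the failure belongs to the ad-hoc format, not to delimited files as such.

```python
import csv
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A value with a newline, a comma, angle brackets, an ampersand and quotes.
AWKWARD_VALUE = 'line one\nline two, with <angle> & "quotes"'


def naive_join(fields):
    """An ad-hoc delimited format with no escaping rules at all."""
    return ",".join(fields)


def xml_round_trip(value):
    """Escape the value into an element, then parse it back out."""
    document = "<note>" + escape(value) + "</note>"
    return ET.fromstring(document).text


def csv_round_trip(fields):
    """Write one row with the csv module's quoting, then read it back."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(fields)
    rows = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
    return rows[0]


# Two fields go in; the naive format hands back three.
print(len(naive_join([AWKWARD_VALUE, "x"]).split(",")))
print(xml_round_trip(AWKWARD_VALUE) == AWKWARD_VALUE)
print(csv_round_trip([AWKWARD_VALUE, "x"]) == [AWKWARD_VALUE, "x"])
```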
nexusprime on June 24, 2008 09:29 PM
> Integrity of data is a bigger concern to me than whether its the easiest possible format on the human eyes.
From the spec (http://www.w3.org/TR/REC-xml/#sec-guessing):
"The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use—which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways..."
So, your choice for data integrity is a spec where determining what encoding it's in is, in the words of the authors, "not entirely hopeless."
ben on June 24, 2008 10:00 PM
At my place (a research lab), people all but throw rocks at me because I use S-Expressions in place of XML for my various number crunchers. Basically, my S-Expressions are used to set up factories, which in turn build up objects that are then tinkered with by my application. A real life example:
----
embryo(
material(
name = 'steel'
density = 7860.0
max.strain = 0.1
young.modulus = 210000000000.0 # newton/meter square
)
control.model(
control.model.A(
nb.chemicals = 1
damping = 0.1
nb.cell.neurons = 1
nb.edge.neurons = 4
)
)
template(
beam.template(
load = 6000.0 # newton
radius = 0.001 # meters
width = 2.0 # meters
height = 1.0 # meters
nb.hrz.patches = 8 nb.vrt.patches = 3
)
)
)
----
Imagine the same in XML: it would be less readable. Writing a parser for this? Hey, mine fits in barely 300 lines of C++ and does syntax error checking, with gentle exceptions to give error messages. It's an LL(1) grammar, so a finite state automaton and a stack and you're done... XML's grammar is a lot more demanding.
S-Expressions
* readable
* lightweight
XML
* not very readable outside of tiny files with less than 3 levels of nesting
* heavyweight in resources
You say that the example you picked has a large difference, but when you have hundreds of those to send over the wire it's gigantic!
I'm writing a mapping application and guess what? I have to load hundreds of points of interest to overlay on a map at once and, god forbid, the data comes in XML.... imagine parsing all that using the browser's JavaScript.
There's more to it than verbosity and parsing headaches; there's also the space, network bandwidth and CPU cycles tax and oh.... I can think of a lot more when everything is in XML.
Totally agree with you.
Just like Lucas said - why not try JSON instead? I also liked Douglas Crockford's assessment of it being pretty much XML without all the crap in it.
After many years spent dealing with XML and RSS in particular, I'm going for JSON in my future projects. It's either that, or having to come up with an even better way.
Kari Pätilä on June 24, 2008 10:44 PM
I've been doing some .NET and SharePoint development the last year, and it's the best way to start hating XML: Handling it is very clunky (e.g., having to define a namespace manager even if there's no namespace defined), it's used in the most ridiculous places (CAML is just verbose SQL), it's bastardized (ASP.NET should be put to sleep) and it's used even for name=value content like web.config. If that's your exposure to XML, no wonder you're thinking twice about using it.
Victor Engmark on June 24, 2008 11:46 PM
"Every time I look at my web.config XML file, there's a mental cost of me having to parse all these tags in the file."
That's the bottom line. Get used to it. If you're having a problem reading XML, learn how to do it better.
Dave P on June 24, 2008 11:59 PM
"There is a very real mental cost to parsing even a few short lines of XML"
For you maybe, definitely not for me. MSBuild files, though, they really make my head asplode...
"the mental cost of that insignificant effort times the number of developers in the world, times the number of projects in the world?"
So, what are you going to do with that time once you saved it? How is this metric useful?
Mike on June 25, 2008 12:14 AM
@Fanboy?:
That author lists a bunch of stuff that XML has, e.g. XPath, XSL.
That's great, but if he had ever learnt to use Lisp/Scheme, he would know that those extras are already part of Lisp/Scheme (syntax, macros, etc).
IMO XML is simply sexpr's and the rest of the XML technologies are simply a ripoff of what exists at the core of sexpr-based languages.
I read somewhere: <> + marketing = ()
I think that should be: () + marketing bs = <>
Cheers
leppie
leppie on June 25, 2008 12:16 AM
I recently had a case where someone in my office needed to store a list of customer IDs to disk. Their instant thought was to just serialize the collection in XML!
If we think about it, it turns what could be a simple CSV into a giant file containing hundreds of <String>12324</String> (not to mention all the data at the top of the XML). But the fact is most people don't, and jump for the quickest tool.
I believe most technologies are there for a reason and each case should be considered on its merits to choose the correct technology. I think XML has its place but it should not be the default choice.
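For a flat list of IDs, the plain-text route really is only a few lines. This is a minimal sketch using Python's standard `csv` module; the function names and the sample IDs are made up for illustration.

```python
import csv
import io

def save_ids(ids, stream):
    """Write one customer ID per row to any text stream."""
    writer = csv.writer(stream)
    for customer_id in ids:
        writer.writerow([customer_id])

def load_ids(stream):
    """Read the IDs back, skipping any blank rows."""
    return [row[0] for row in csv.reader(stream) if row]

# Round trip through an in-memory buffer (a real program would open a file).
buffer = io.StringIO()
save_ids(["12324", "12325", "99871"], buffer)
buffer.seek(0)
restored = load_ids(buffer)
```

The stored text contains nothing but the IDs and line endings, so there is no wrapper markup to read past.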
John on June 25, 2008 12:23 AM
What I am hearing from Jeff here is not: "Replace XML with this YNM which is always better". It's more like: "When you decide to use XML, make sure you know WHY you are using it, and please be aware that there are alternatives".
All the criticism about XML having feature X and standard Y and handling everything is valid, and is a reason why XML sometimes is a good solution to a problem. It is also the reason why it is sometimes a BAD solution. Know the difference. Think, then decide.
I'm amazed at the fuss some people make over readability. If you think XML is unreadable, try using something other than Notepad to read it.
I know XML is not ideal, but at least it means you don't have to worry about (a) parsing, (b) encoding all possible characters, (c) representing strongly-typed values. Pick any other format and you have to implement some of those yourself.
Chris on June 25, 2008 12:33 AM
> And don't even get me started on XSLT...
I personally can't stand editing/reading XML, but I dearly love XSLT. It's a brilliantly designed language. I once heard it described as "the wonderful language with the horrible syntax".
I have actually written a DSL embedded in Python so that I may write XSL transformations without having to write XML, and I love it. I actually prefer it over nearly any template language, now that the XML pain is removed. Well-formedness guarantees are a wonderful thing!
Kyle S on June 25, 2008 12:41 AM
[?xml version="-1.0" encoding="UNICEF"?]
[procondocument name="my take on the good and the bad stuff with xml"]
.[list type="pro"]
..[arg]everybody can do it[/arg]
..[arg]global standard[/arg]
..[arg]it can be used for almost everything[/arg]
.[/list]
.[list type="con"]
..[arg]just because it can do everything does not mean it should[/arg]
..[arg]DRY, with XML you repeat yourself over and over[/arg]
..[arg]terrible to look at[/arg]
.[/list]
[/procondocument]
If the intellectual overhead of using XML is so low as to be insignificant for ANY task, as some fanboys claim, then why are there still non-XML formats out there? Why do we not write Java or C++ or C# or Python code in XML format?
Simply put, Jeff is right: XML is not a panacea. Even if it is still your "go to" choice as a data format, if it is your one and only choice in any and all circumstances then you must accept that you are limiting yourself. Think, use your judgement, weigh the pros and cons of using XML and of alternatives, then decide.
There's a marvelous book called "Conceptual Blockbusting" which talks about many aspects of creativity and problem solving and one of the most useful things I got out of that book was the term "satisficing". When problem solving there are two general categories for the methodologies used to arrive at a solution. On the one hand there are all of the methodologies which find a solution which is workable and then stop and move on to implementing that one solution. On the other hand there are all of the methodologies which continue to look for other, perhaps better, solutions even after one has been found which may be workable. The first strategy is called "satisficing", and many people do it without thinking. It's the reason why refactoring and redesigning and rebuilding things is often necessary. People working on a solution didn't stop to evaluate their design or consider alternate designs and instead just jumped ahead to implementation. This is the difference between buying a car by putting it on your credit card and buying a car by financing it using a low interest automotive loan. Assuming your credit card limit is high enough, both solutions might be "workable", but they are not of the same character and I think it's clear that one of these solutions is in almost all situations vastly preferable to the other. So the next time you've come up with a solution to a problem ask yourself whether you're cutting yourself short by satisficing, maybe you should spend some time (but not too much time) trying to come up with other solutions so that you can compare various solutions against each other and determine which one is the best, it may be your original solution, or it may not. A few minutes of forethought now can often save hours or eons of pain later.
The same applies to XML, this should be common sense.
Robin Goodfellow on June 25, 2008 01:18 AM
@leppie the problem with s-expressions is the same as the problem with JSON: because you eval it directly, you'd better hope that it contains only data. And to my eye, looking at s-expressions is no easier, nor harder, than XML.
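The eval risk comes from how the text is loaded, not from the format itself. As a hedged illustration in Python (the comment above is presumably about `eval` in JavaScript or Lisp), a dedicated parser such as the standard `json` module treats input strictly as data; `load_untrusted` is a hypothetical helper name.

```python
import json

def load_untrusted(text):
    """Parse JSON text as data only; nothing in it is ever executed."""
    return json.loads(text)

# Well-formed data parses into plain dicts, lists, strings and numbers.
safe = load_untrusted('{"fruit": "pear", "count": 3}')

# Something that would run code under eval() is simply a syntax error here.
try:
    load_untrusted('__import__("os").getcwd()')
    rejected = False
except json.JSONDecodeError:
    rejected = True
```

The same separation is available for S-expressions: reading them with a reader that builds lists is a different operation from evaluating them.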
Nobody ever said Xml was anything revolutionary. It's not a silver bullet. But just because it brought together a bunch of disparate technologies hardly makes a counterpoint. (The Bugatti Veyron takes all the best know-how to make one kick-ass car, therefore it sucks- no wait that can't be right)
@RevMike - it's entirely possible that you shouldn't be querying multi-gigabyte files in pure xml. Nobody ever said Xml was a replacement for a database, or other file structures. Index nodes in the file or use the xml as the basis for a cached copy if it's a measured performance bottleneck, or convert it to a format that works for its intended use.
-------
There's still no compelling technology to replace Xml. And no matter what happens, design by committee and bad coders will no doubt create monstrosities in any technology.
Problem Exists Between Keyboard And Chair. So until computers start to write their own programs, you're stuck with monkeys like the guy sitting next to you.
philx on June 25, 2008 01:27 AM
Peter Palludan,
(defparameter *summary*
..'((pros
....."everybody can do it"
....."global standard"
....."it can be used for almost everything")
...(cons
....."too verbose"
....."hard to parse"
....."another language")))
;; SAX? DOM? Pah!
(defvar *good* '()) ; start as empty lists so PUSH has a value to extend
(defvar *bad* '())
(dolist (yay (rest (first *summary*)))
..(push yay *good*))
(dolist (nay (rest (second *summary*)))
..(push nay *bad*))
Here are three types of data for which XML is a Really Bad Idea(TM):
* Non-hierarchical data, since you'll have to deal with idrefs everywhere. Just use a DB.
* key=value data, but don't come around complaining when your neat little format turns crazy since you hacked it to contain 2-dimensional arrays and ID references.
* Enormous amounts of data - Use a DB or a custom binary format to optimize the handling.
I'd say the line-by-line method for your config files is fine. If you have a spec that includes strict restrictions on what can be in this file. Strangely the software using such 'simple' files seems to be exactly the software without such specifications.
Once you start having strings in there, you'll already have to guess the encoding and probably about how to escape line endings and = signs as well. Before you know it you're trapped in this and have to actually think about what to do when writing your config file. With XML you just use a readymade library and there you are.
For easy readability you can still use a graphical XML viewer. But I'm pretty sure that some clever syntax highlighting will get you most of the way already.
ssp on June 25, 2008 01:45 AM
It is odd how people feel that the alternative to not using XML is to use an in-house format.
Why not have 3 or 4 standards, ranging from "simple but limited", to "comprehensive but complex". In fact, we already have these, in the form of everything from XML to a simple list of line separated entries. (Ever encountered the dreaded list of names, all surrounded by "name" tags, AND THAT'S ALL THAT'S IN THE FILE? XML was not needed there.)
Think of it like Newtonian and Einsteinian physics. Einsteinian physics provides a more accurate model, but Newtonian physics is used for most situations, because it's easier to do the maths, and on the scales it's used, there's no difference to the results.
I figure a lot of people who have declared geekhad on Jeff have encountered someone who developed an in-house format all their own, and are assuming that's what he suggested. Surely all he's saying is that you should consider other options before the knee-jerk XML route. There's probably a list of questions you could ask, a few examples being
1) How much data is being used?
2) How deep does that data go?
Anyone got any others?
[Third time's a charm :-)]
Thinking about it, most of the repetition (and therefore visual crud) comes from the closing tags repeating the name in the opening tag. I'm just thinking out loud here, but how hard would it be to come up with a shorter "default" closing tag? For example, an empty closing tag ("</>") could be used to mean "close the innermost open tag". The sample XML would look like this:
<doc>
<fruit>pear</> <!-- closes "fruit" -->
<vegetable>carrot</> <!-- closes "vegetable" -->
<topping>wax</> <!-- closes "topping" -->
</> <!-- closes "doc" -->
It's just syntactic sugar but it's considerably shorter than the original (if you remove my comments, of course).
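Since "</>" is not valid XML, a file written this way would need a small preprocessing step before a standard parser could read it. Here is a rough sketch of that step in Python; `expand_short_closers` is a hypothetical helper, and it assumes no CDATA sections and no ">" inside attribute values.

```python
import re

# Matches, in order: comments, the short closer, explicit closing tags,
# and opening (or self-closing) tags.
TAG = re.compile(
    r"<!--.*?-->|</>|</([A-Za-z_][\w.-]*)\s*>|<([A-Za-z_][\w.-]*)([^<>]*?)(/?)>",
    re.S,
)

def expand_short_closers(text):
    """Rewrite every '</>' as the closing tag of the innermost open element."""
    stack = []  # names of the currently open elements

    def replace(match):
        whole = match.group(0)
        if whole.startswith("<!--"):
            return whole  # comments pass through untouched
        if whole == "</>":
            if not stack:
                raise ValueError("'</>' with no open element")
            return f"</{stack.pop()}>"
        if match.group(1):  # explicit closing tag: must match the stack top
            if not stack or stack[-1] != match.group(1):
                raise ValueError(f"mismatched closing tag {whole}")
            stack.pop()
            return whole
        if not match.group(4):  # opening tag that is not self-closing
            stack.append(match.group(2))
        return whole

    result = TAG.sub(replace, text)
    if stack:
        raise ValueError(f"unclosed element <{stack[-1]}>")
    return result
```

Running it over the sample above turns `<fruit>pear</>` back into `<fruit>pear</fruit>`, after which any ordinary XML parser can take over. The stack is also exactly what lets a human get lost: the closer no longer says what it closes.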
Two Points
1. XML is unreadable - how many times do you end up reformatting the web.config file to line the attributes up just so you have a chance of reading it?
[add name="zzzzzzz" value="vvvvvvv" /]
[add name="z"_____ value="v" /]
2. .NET needs to support other text-based serializers out of the box before we stand a chance of anything changing. I bet that list of names mentioned by a previous poster was a serialized array (0 effort on the part of the coder to persist some data to a file).
Robert on June 25, 2008 02:10 AM
The project I'm working on uses XML for all its configuration and output data. Some of these files are pleasant to work with whilst others are truly dreadful. The main difference between them is the quality of the design and the level of understanding of XML the designers had.
The well designed files are easy to read and easy to edit (especially in an editor that can parse the schema and do auto-completion).
The poorly designed files fail on many levels. For example, the system we're developing consists of a set of software components. The components to instantiate are defined in an XML file. Each component has a set of parameters for each instance. These parameters are stored as child elements of the component definition as a key/value pair list. Components reference their parameters by name, the name has to be cross referenced to the key using another XML file. The upshot of which is that hand editing the file is impossible, even with auto-completion. An example (convert to XML):
component
__id
__name
__type
__parameter
____parameter_id1
____value1
__parameter
____parameter_id2
____value2
__parameter
____parameter_id3
____value3
parameter_definition
__parameter
____parameter_name1
____parameter_id1
__parameter
____parameter_name2
____parameter_id2
__parameter
____parameter_name3
____parameter_id3
So, when the component is instantiated it attempts to get parameter 'parameter_name3' which has to be found in the parameter_definition table to get parameter_id3 (no guarantee it's there though) then use parameter_id3 in the component's local parameter table. So even though you have XML that conforms to the schema, data can be invalid or even missing.
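The kind of check a schema alone won't do here can be sketched in a few lines. The two documents below are hypothetical, modeled on the outline above (the element names and values are assumptions), and `find_broken_references` is an illustrative helper, not part of the project described.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents following the outline above; names are assumptions.
COMPONENT_XML = """
<component>
  <id>c1</id><name>pump</name><type>actuator</type>
  <parameter><parameter_id>p1</parameter_id><value>10</value></parameter>
  <parameter><parameter_id>p2</parameter_id><value>20</value></parameter>
</component>
"""

DEFINITION_XML = """
<parameter_definition>
  <parameter><parameter_name>speed</parameter_name><parameter_id>p1</parameter_id></parameter>
  <parameter><parameter_name>limit</parameter_name><parameter_id>p3</parameter_id></parameter>
</parameter_definition>
"""

def find_broken_references(component_xml, definition_xml):
    """Return (name, id) pairs whose id is missing from the component's table."""
    component = ET.fromstring(component_xml)
    definitions = ET.fromstring(definition_xml)
    # IDs the component actually carries values for.
    local_ids = {p.findtext("parameter_id") for p in component.findall("parameter")}
    problems = []
    for p in definitions.findall("parameter"):
        name = p.findtext("parameter_name")
        pid = p.findtext("parameter_id")
        if pid not in local_ids:
            problems.append((name, pid))
    return problems

broken = find_broken_references(COMPONENT_XML, DEFINITION_XML)
```

Both sample documents are well-formed and could pass a structural schema, yet the name "limit" resolves to an ID the component never defines, which is the failure described above.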
With XML and well designed schemas, the source control check-in process can be set up to validate XML files against their schemas:
on check in xml
test against checked in schema defined in XML file
pass schema check - check in file
fail schema check - display error, don't check in file
As Jeff points out, it's all about using the right tool for the job. You wouldn't use XML to transmit data across a CAN bus for example, or as configuration data on limited performance embedded systems.
Although XML is very useful, it's not a panacea for all software problems.
Skizz
Skizz on June 25, 2008 02:11 AM
> Once again, the issue is that XML isn't meant to be parsed by a human. Its intention is not to be human readable
well, actually, the officially given reason for repeating the opening tag name in the closing tag (there's no real reason to) is for parsing by a human. :) heh
anyway - I just read a relevant paragraph in the O'Reilly "RESTful Web Services" which I thought I'd share (just the conclusion) -
"JSON is useful when you need to describe a data structure that doesn't fit easily into the document paradigm"
Ed on June 25, 2008 02:40 AM
Windows only, but I recommend the Liquid XML Studio freeware. Makes parsing those XSDs and making sure that you've created documents that match them much easier.
http://www.liquid-technologies.com/Product_XmlStudio.aspx
Haven't tried the pay for version, but the free one is OK. A tad buggy, in that sometimes the search gets lost and stuff, but for freeware really good.
Francis Fish on June 25, 2008 03:17 AM
Jeff, I disagree often and a lot with things you write in your blog; but hey, it's everyone's right to have his/her own opinion, isn't it? Still your opinion matters to me; why would I otherwise even read your blog? I read your blog, because sometimes you come up with really interesting ideas and aspects most people never even thought of. Many programmers take certain ideas as facts; "that's just the way it is". They don't even dare to question them. You question many of these and even if you may not be able to come up with a solution to all problems, your blog at least makes people aware of possible issues, kind of "See this, now see that... see the problem?" and this often causes an "Ohhh" or "Ahah" effect. People start to reconsider their facts and recognize that these are not set in stone.
Back to topic: XML is not a fact. XML is not God-given. XML is an idea. An idea that got popular. XML might be a good solution for some or even many problems, however it may be a poor solution for other problems and even if it works as a solution for some problems, there might still be better solutions than that. You seem to dislike XML and guess what, this is one of the topics I seem to agree with you.
The main problem I have with XML is: For whom is this language actually designed?
A) For human beings, so you have human readable data? Really? Well, as you pointed out before, XML is very hard to read. Easy samples like shown above are still human parsable, but I can give you a 2 MB XML file that will make you cry. XML is not for human beings, it's too verbose and too complicated once the data file grows beyond certain limits.
B) For computers, so you have a standard way to store arbitrary data? Certainly not. XML is far from being easily parsable for machines. I can think of data formats that are 100 times easier to parse if it only needs to be machine parsable.
So if XML is neither for A nor for B, what is it good for anyway? I guess it is an attempt to create a format that is at least somewhat human readable and at least somewhat easy to parse for machines. Bad choice!
Instead I would have created two equivalent formats, so that you can always convert between them in a 100% lossless way: one that is very easy for machines to read and one that is very easy for human beings to read. Sounds like a much better approach to me.
Actually Apple had such an approach. Apple has the old NextStep PLIST format, which is very easy for human eyes to read. And they have a very compact binary format. Both were replaced by XML instead. The NextStep Format is legacy and the binary is not legacy, but it's not the default format being used either.
Here's an example for the new XML PLISTs:
http://tinyurl.com/54dv52
Compare this to the old, human readable ones (much more readable):
http://tinyurl.com/5yefeb
There is no description for binary PLISTs; but be assured, these are optimized for being machine readable.
Mecki on June 25, 2008 03:19 AM
XML is bad for many reasons already outlined, but YAML is not a good solution imho. JSON is better. YAML obscures the structure with white spaces. It's dangerous, if you let someone edit a YAML file who is not experienced with YAML, they can completely mess up the structure of the data.
On the subject of XML and abuses thereof, how do you feel about XML comments in C#, Jeff? My feeling is that they are hideous and bloated, but that I can't get away from them because there's no other way to get the benefit of comments appearing in Intellisense tooltips and stuff like that. I only wish that they'd chosen a more concise format that's human-readable. Either that or a more advanced source editor that renders and edits the XML comments differently from the rest of the code.
Weeble on June 25, 2008 04:29 AM
Intriguing comments, but surely as human readable as XML is supposed to be, no one in their right mind is going to try and parse catalogues of stuff in XML?
Use the right editor, something that reduces XML to its DOM, and expand it as you wish.
You can produce better structured tools for dealing with XML, and that's how you should use it.
Sure, parse it in your mind yourself, but bear in mind, human readability is an advantage of XML, not its purpose.
goatslayer on June 25, 2008 04:41 AM
XML is great for hierarchical data and as a non-binary storage format for information that you may wish to use in lots of different ways - XSLT is weird to use but very powerful.
It's a nightmare for storing name/value type information and also actually for things like the configuration files I'm working with at the moment that, for reasons best known to Beelzebub, store regular expressions as XML tag attribute values. As you can imagine this creates an escaping nightmare as you have to do your regular string escaping, your regular expression escaping and then your XML escaping before it can be added to the document, giving you many layers of possible (and indeed probable) fail when you are editing it by hand. I'm working on a simple editor that gets around the whole problem (I only just started here) but I'm amazed that people have done this for years and never apparently questioned it.
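The layers are easier to keep straight when each one is handled by a tool instead of by hand. A minimal sketch in Python, with a made-up pattern and a hypothetical `<rule match="...">` element: a raw string removes the string-literal layer, and `xml.sax.saxutils.quoteattr` handles the XML layer.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Layer 1: the regular expression itself (hypothetical example pattern).
# A raw string means no extra string-literal escaping is needed.
pattern = r'<a href="[^"]*\.pdf">'

# Layer 2: XML attribute escaping, done by the library rather than by hand.
attribute = quoteattr(pattern)
document = f"<rule match={attribute}/>"

# Reading the attribute back undoes the XML layer and returns the pattern.
restored = ET.fromstring(document).get("match")

# The round-tripped text is still a usable regular expression.
matcher = re.compile(restored)
```

The stored document is full of entities such as `&lt;`, which is exactly what makes hand editing error-prone, but the value that comes back out is the original pattern.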
Breakfast on June 25, 2008 04:45 AM
Jeff,
I am an administrator (doing Linux right now, but I have also been a Windows admin in the recent past). I have been reading your blog for about 2.5 years now.
I am amazed at how many people seem to miss the point of your post -- to reiterate (again) THINK!.
Please, do not make me parse XML if a simple key=value list will work. People keep commenting "just use a parser" or "syntax highlighting will make it readable". Drop the arrogance, people: the data is not just for you to read. Us poor admins need to parse many of these files also.
btw - Keep up the excellent work (I may not always agree, but I always find it interesting and thought provoking).
Douglas on June 25, 2008 04:52 AM
I use one of three methods of storing data. CSV, XML, SQL Server.
I consider this a fairly extensive toolkit for doing almost everything I need to do from the simplest to the most complex.
I do look at other alternatives, and seriously consider using them. However, I try to limit the tools that I use in order to be more efficient with them. The expression "Jack of all trades, master of none" springs to mind were I to use many more methods.
Therefore, until I come across something that I truly consider worth adding to my toolkit or replacing an existing item then it will remain as it is.
As for XML itself, if the data I need to store is fairly simple then I may well hand crank it. I personally find xml extremely clear and easy to read if I've written it myself.
Alternatively though, I will quite happily produce fairly complex XML where a system didn't quite justify or require the extent or performance of a full SQL database. These files will never be hand cranked but will be written and read entirely in code. Seeing as I mainly write in C#, it couldn't be simpler!
The XML format may cloud the actual data or, as you say, "only half of it actually matters". But whilst human-readable might be wrong, coder-readable is certainly true. The main point of using XML as my data source is that should push come to shove and I really need to get at or change some small item of data, I can spend 15 mins with Notepad and do just that! This is not something I do on a regular basis, it's just nice to know that it's something I can do if I really need to.
Robin Day on June 25, 2008 05:34 AM
IMHO a good example of XML abuse is requiring XML (and an XML parser) if you want to push to a server via HTTP(S), i.e. for WebDAV. WTF is XML doing here? HTTP uses a nice configuration mini-language (headers), why not use it?
Jakub Narębski on June 25, 2008 05:34 AM
Many of the comments here seem to be like "There's nothing wrong with XML. Learn to read it"
They are missing the point, XML is designed to be read by computers not by humans, if you are forced to read it or write it then your program is broken, if you are often having to write it your interface is broken
If you want to use it to modify settings/parameters etc ... then use a human readable format... or use an interface, you should never force people to read/edit raw XML
Jaster on June 25, 2008 05:35 AM
Yeah, XML has a standard so we should just stick with it.
"That's how we've always done it."
MattH on June 25, 2008 05:44 AM
I am currently developing an application that has a project file. The requirements for the project file are:
* User may want to edit project files directly, i.e. it should be editable by a general purpose editor.
* Edited project files should be consistent; meaningless content is not welcome.
* Project files content should be easily mapped into objects.
* In case of a version (content) change of the project file, previous files should be migrated.
And the choice for the project files' format is XML. For first three tasks I only write an XSD schema and use JAXB to generate classes that map to xml files. For the migration task I use either XSLT or DOM.
I don't think any other technology (such as YAML) or in-house built code would be as mature as XML for my requirements.
Bahri Gencsoy on June 25, 2008 05:51 AM
Thinking is key.... You should always be thinking about using the right tool for the solution. XML is pretty cool stuff, but it *does* bear a tax. For you PHP folks out there, think of what the php.ini file would look like and bloat to if it were XML. Think of how much additional annoyance and frustration you would have while walking through the file to adjust your settings. Even the simplest of tasks, commenting a line, for example, suddenly becomes an ordeal of angles and dashes.
Dan on June 25, 2008 05:51 AM
@Weeble:
Yep, the XML comments in C# are a pain to use. I think they were a very poor choice. They feel incredibly bloated, particularly if you want to add anything useful like referencing other methods/classes etc.
For those that are unfamiliar, consider this simple comment block:
/// <summary>
/// Closes a Thing instance, previously opened by <see cref="OpenThing">
/// </summary>
/// <remarks>
/// Will fail if <paramref name="toClose"> is already closed.
/// </remarks>
/// <param name="toClose">The instance of Thing to be closed.</param>
/// <returns>True if <paramref name="toClose"> was successfully closed, False otherwise.</returns>
public bool CloseThing(Thing toClose)...
Compared to a slightly saner (but still parsable) human-readable syntax:
/// Summary:
/// Closes a Thing instance, previously opened by [OpenThing]
/// Remarks:
/// Will fail if {toClose} is already closed.
/// Params:
/// toClose - The instance of Thing to be closed.
/// Returns:
/// True if {toClose} was successfully closed, False otherwise.
public bool CloseThing(Thing toClose)...
In the XML example, there were 161 characters for markup and 194 characters of "real" content - so around 45% of the comment was 'noise'.
Graham Stewart on June 25, 2008 05:57 AM
I get the feeling this article, however many times it's repeated, is doomed to wage a holy war within 1 comment. However, some comments of my own:
"Integrity of data is a bigger concern to me than whether its the easiest possible format on the human eyes."
XML does absolutely nothing to ensure integrity of data. It ensures integrity of *syntax*. In some cases, the greater simplicity of - say - "name=value" is worth it, especially given that the syntax check is trivial.
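To back up "the syntax check is trivial", here is one possible sketch of a strict "name=value" reader in Python. The rules it enforces (one pair per line, `#` comment lines, no duplicate names) are assumptions made for the example, not a standard.

```python
def parse_key_values(text):
    """Parse 'name=value' lines; '#' starts a comment line. Raises on bad syntax."""
    settings = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"line {number}: expected name=value, got {raw!r}")
        if name in settings:
            raise ValueError(f"line {number}: duplicate name {name!r}")
        settings[name] = value.strip()
    return settings
```

Like a well-formedness check in XML, this validates syntax only; whether `port` holds a sensible number is still up to the application.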
"... if everyone wrote their own programs to reach the same result that the XML version would be more consistent then the non-XML version because it's based on a standard?"
CSV is a 'standard', in that it's a well-recognised format with defined rules, shared by countless systems around the world. HTML and CSS are 'standards', yet almost every web browser out there implements them slightly differently. Saying something is a 'standard' is really saying very little; the important thing is that the data format is open, rather than proprietary.
"Once again, the issue is that XML isn't meant to be parsed by a human."
Really?? In that case, why not optimise the hell out of all of your DTDs - no more "Student" element when you can make do with "S". Providing a GUI layer in between the human and the data is all well and good, except when that layer goes wrong, or you don't have access to it. I guess, by your argument, XHTML isn't meant to be parsed by a human, and we should all be authoring our web sites in Frontpage. The benefits of readable textual data are *massive* - try reading 'The Art of Unix Programming' if you're not sure why.
Should I ever get to work on any of your code, Jeff, you better remember this: http://www.codinghorror.com/blog/archives/001137.html
XML is wicked easy to pickup, and as soon as you have, you're reading, writing, editing all sorts of files written in XML. It doesn't mean that you SHOULD but you CAN. So drop the nonsense about learning a million new formats and standards just because "you should learn the minimally required": well, if I know XML, isn't THAT the minimally required? Any OTHER language/format/standard just requires more.
Apart from that, it's a truism that you should use the right tool for the given job. XML is bad for some things and great for others. Leave it at that.
Regards
Fake
I'm just going to go back to reading INI files
Private Declare Function GetPrivateProfileString Lib "KERNEL32" Alias "GetPrivateProfileStringA" (ByVal lpApplicationName As String, ByVal lpKeyName As Any, ByVal lpDefault As String, ByVal lpReturnedString As String, ByVal nSize As Long, ByVal lpFileName As String) As Long
Private Declare Function WritePrivateProfileString Lib "KERNEL32" Alias "WritePrivateProfileStringA" (ByVal lpApplicationName As String, ByVal lpKeyName As Any, ByVal lpString As Any, ByVal lpFileName As String) As Long
Public Sub WriteINI(wiSection As String, wiKey As String, wiValue As String, wiFile As String)
    WritePrivateProfileString wiSection, wiKey, wiValue, App.Path & "\" & wiFile
End Sub
Public Function ReadINI(riSection As String, riKey As String, riFile As String, riDefault As String) As String
    Dim sRiBuffer As String
    Dim sRiValue As String
    Dim lRiLength As Long ' the API returns a character count, not a string
    Dim INIFile As String
    INIFile = App.Path & "\" & riFile
    If Dir(INIFile) <> "" Then
        ' Pre-size the buffer the API call writes into
        sRiBuffer = String(255, vbNullChar)
        ' Chr(1) is the sentinel default: it comes back only if the key is missing
        lRiLength = GetPrivateProfileString(riSection, riKey, Chr(1), sRiBuffer, 255, INIFile)
        If Left$(sRiBuffer, 1) <> Chr(1) Then
            sRiValue = Left$(sRiBuffer, lRiLength)
            If sRiValue <> "" Then
                ReadINI = sRiValue
            Else
                ReadINI = riDefault
            End If
        Else
            ReadINI = riDefault
        End If
    Else
        ReadINI = riDefault
    End If
End Function
How do I call an API in .NET?
:)
@Fake51: "XML is wicked easy to pickup"
Actually I think that is a pretty common misconception.
Most XML-abuse situations I've seen have come from developers who found XML "easy to pickup" and converted their existing file format by randomly adding a few angle brackets to produce a structureless blob that is impossible to validate.
In reality there are quite a few aspects to XML to master before you have really picked it up (XML Schema (XSD) or DTD, XSLT, validation, XPath, XQuery, XPointer, UTF-encoding) not to mention the more philosophical issues involved in designing a good schema.
I'd say "Bad XML practices are wicked easy to pick up. Good XML takes time and practice."
Graham Stewart on June 25, 2008 06:26 AM
"not XML the religion"
That's the point, exactly!
Seems some people need religion but fail to empty
their religious cache in church ("What church?").
So religion pops up at places where it doesn't belong.
XML, LDAP, Agile Development, ...
I'm really p*ssed when things like "Agile Development" suddenly
get this religious momentum and people start to use
new toys because they believe in them instead of relying
on scientific data.
Nice reads:
Terry Pratchett's "Small Gods" was an eye-opener for me
when it comes to church and religion.
[ The "Science of Discworld" books are nice too ].
I also recommend Alfie Kohn's "Punished by Rewards".
It will crash everything you believe in about the school
system, performance payments, stick and carrot by providing
scientific data.
I have a strong mathematical background so I will never understand
why people replace a properly working (mathematically beautiful) relational database system with "crap" like XML or LDAP.
The objects in a database are tables. When you operate on a database
you get back a table, so you can use this output as input again.
Try that with XML or LDAP.
This isn't religion, this is fundamental mathematics (->Algebra)!
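The closure property described above (a query over tables yields a table, which can feed another query) is simple to demonstrate. A small sketch using Python's built-in `sqlite3` module; the `orders` table and its rows are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5), ('cy', 50);
""")

# The inner query produces a table of per-customer totals;
# the outer query uses that result as its input table.
rows = conn.execute("""
    SELECT customer, total
    FROM (SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer)
    WHERE total >= 40
    ORDER BY customer
""").fetchall()
```

Here `rows` holds the customers whose summed orders reach 40, computed by filtering the output of the aggregation as if it were a stored table.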
In praise of sloth - my never-ending rant but no one reads this far down the comments so it doesn't matter:
Other than the fascination factor, the reason I began coding was that it saved me time in math classes. I could solve the same equation over and over or I could write a program once and never have to solve the equation again. Seemed like a no-brainer to my lazier proclivities.
A primary reason I continue to be interested in most technologies has largely to do with convenience. To that end, I typically opt for the one with the most support and best features. I used IE until FireFox had a really huge base, even though FireFox had almost always been better. All mp3 players in my house is are iPods because accessories and support are abundant. And I use XML over all other alternatives because there are libraries everywhere for it. My programming languages of choice have never included the new sleek ones that are so hot at the time and whose whole support base consists of a few thousand fickle fanboys, little to no real documentation, and no professional experience.
Thank you for raising awareness and getting us to think about WHY we use what we do. In my case, the above is why I still chose XML. If I woke up tomorrow and I was tripping over YAML blogs, how-to articles, support, plug-ins, libraries, and billions of man-hours of experience -- I'd switch.
Dinah on June 25, 2008 06:29 AM
It's an ongoing battle and a good topic for conversation. My only qualm is with the reader/commenters who are writing this off as a trivial matter or 'It's here to stay. Deal with it.' You are the reason we're having the problems. XML isn't here to stay any more than Fortran was here to stay (read: stay=in the mainstream).
All technology is subject to change and as programmers, designers, and salesmen of the technology we love...you have to ask yourself every once in a while if it is the right thing. To reject reflection on the toolsets and the future is the killer of innovation and discovery, not to mention bordering on the "stay the course" attitude that we've all come to love in our political representatives.
There are serious problems with XML, but it's among the most impressively empowering tools available to the modern programmer. Readable? Efficient? Effective? Enabling? Perhaps! Realize two things: 1) XML is approaching ubiquity in mainstream applications 2) We disagree on some properties about XML. These two statements result in a non-trivial argument.
Raymond on June 25, 2008 06:35 AM
I've only recently started reading your missives, but on the subject of XML, I completely agree with you.
I fail to see how (say) a simple text configuration file is improved by wrapping up all the baggage of XML tags, whose only purpose seems to be to make life harder when it comes to parsing the file.
Yes, there are "standard" XML parsers for pretty much all mainstream languages these days, but that doesn't make it right. Why use a big parsing library for something that could be done in a few lines of [insert your language here] if only the text file was simpler? It doesn't make sense.
Yeah, let's stick with punch cards; they are standard and everyone can read them.
As the main point of the article is "THINK, don't do stupid stuff", it is rather funny that most of the comments are "don't make me think, let me have my crummy old XML".
stonemetal on June 25, 2008 06:36 AM
xml, eh? don't get me started.
if only the old win 3.1 developers would have just kept their .ini files in the right places... :) now we have windows registries and xml files everywhere...
that's the thing with stuff like xml, once it gets used to solve a problem which stemmed from bad design/implementation to begin with... it is automatically being used for the wrong reasons.
my opinion anyway...
personally i'm all for raw binary data in a specified format... it's most efficient to store and read, and unless the developer makes a mistake you won't have any problems. in practice though using an xml library is faster, even if it produces an inferior quality result for the end user...
Jheriko on June 25, 2008 06:40 AM
I once worked on a project which was a questionnaire program. The answers were stored in a comma-separated text file, which was then imported into MS Word and used in a mail merge to create a 200-page document. Over time, new questions were added. The answers were stored anywhere that had room in the CSV file to prevent breaking the mail merge macro. This caused a lot of problems in the long run because the questions and answers were out of order. It was very difficult to diagnose errors when they occurred.
One day I suggested we change the data format to XML. This would allow us to reorganize the questions and answers without breaking anything because the names of the nodes would stay the same forever. It would also make the data file more human-readable to diagnose entry problems. Though I don't know for certain because I haven't tried, I would suspect that parsing fruit=orange would be more difficult than <fruit>orange</fruit>. Particularly if the word fruit was given as a value somewhere else in the file. (eg. food=fruit)
Scott on June 25, 2008 06:42 AM
Hmmm,
this is also XML and might be easier on the eyes :
<doc>
<fruit is='pear' />
<vegetable is='carrot' />
<topping is='wax' />
</doc>
This tends to drive XML believers nuts for some reason ;)
T.
I think we all have to be honest with ourselves, and realize that most of our time is spent fixing things when they *don't* work.
The data format (XML) doesn't matter when things work. It's when your transmission fails, and you have to go trudging through the raw data to find the error that it really matters.
I find them all difficult to read in different circumstances. JSON and YAML when represented without line spaces (say, in your ajax debugger) are nearly impossible to read, and XML does ok. Custom formats actually do better! (Say with, something crazy, meant for joining text together—like pipes?)
However, a raw dump of a lot of data in YAML is easy to read, not so much in XML.
JSON excels when you have to parse it. It's already in your array! Just use it. Beautiful. It works like that on both ends.
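As a small illustration of the "it's already in your array" point, here is a sketch using Python's standard json module. The payload is invented for the example; the point is only that the parsed result is ordinary dicts and lists with no extra mapping step.

```python
import json

# Hypothetical payload, as it might arrive from an AJAX call.
payload = '{"fruit": "pear", "vegetable": "carrot", "toppings": ["wax", "sugar"]}'

# One call turns the text into native data structures.
data = json.loads(payload)
first_topping = data["toppings"][0]

# And one call turns them back into text on the other end.
round_trip = json.loads(json.dumps(data))
print(first_topping, round_trip == data)
```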
Anyway, I think what we have here, is XML is the right concept, wrong implementation. Theoretically speaking, XML is awesome. It's standardized, easy to learn. Practically speaking, it's a beast. Those angle brackets are terrible. Terrible things! Use something else!
I don't think Yaml is the solution, but I think it's a step in the right direction.
I think to come up with a solution, we need to interface with someone who specialized in how to make text easy to read, and that would be somebody who is NOT a programmer. Come on great universities of the world, THINK!
Jeff Davis on June 25, 2008 06:46 AM
This just in, Jeff STILL hates XML, apple pie, America, and your mom.
Oh wait...that's not what he said. Disregard. Or maybe it's the smackdown learning model ;)
It all comes down to convention for me. Once the first person writes some configurable project data in xml, that's it, you're locked in. Or you can maintain 15 types of text files on a single project. I think it's the same for most team choices, the project was started in java, so you can either continue writing in java to keep the maintenance down, or you can buck the trend and write in VB. And you're all-star enough to make that happen, because it's what you carefully analyzed, considered, and decided was best for your situation. Then poor Joe that has to add one friggin field to a report is cursing you to high heavens because he has to load java in his brain for the server side and vb for the client. Similarly, I don't want to look at 5 kinds of markup for different aspects of the app. Don't put it all in one monster file by any means, but please don't roll out a different superior solution because of a braindead choice we made early on. Especially for text, it's not worth it.
SteveJ on June 25, 2008 06:47 AM
Jeff, you really should change the (no HTML) remark by the comment window to read (no HTML, but please remember to encode < as &lt;, > as &gt; and & as &amp; or the blog will eat most of your post)
Graham Stewart on June 25, 2008 06:51 AM
Again with the YAML and JSON. Yes, fine, these are technically standards, but they're incomplete ones. You can't validate them. They don't have well-formedness. They don't have schema. They don't have metadata. Maybe you don't need any of these things, but in cases where you do, YAML and JSON are not "alternatives" at all.
To the guy talking about XML files spanning hundreds of MB or even GB, where have you seen this? I don't think I ever have. And in those cases, just how much bigger is it than, say, JSON, and is there any difference when you compress it, which you should be doing with gigantic data streams anyway?
If this is being used as an actual storage mechanism then of course you should be using a relational database, but I don't think there are many people arguing for XML files as a replacement to SQL Server.
Yes, there are alternatives to XML. There are also alternatives to .NET. And there are alternatives to Microsoft Windows. I choose not to use those alternatives because the de facto standard is a lot more convenient for me. Minimalists, this is the 21st century; if angle brackets are too much for you then maybe you should also be dropping Unicode and 4-digit years.
Aaron G on June 25, 2008 06:55 AM
@Konijn: yep, see what I said above about bad XML being easy.
In my experience evil XML like that is all too common.
@Graham Stewart
What? You mean you don't read the entire page as raw source code?
I thought XML was easy to read!
Just having fun. ;)
Jeff Davis on June 25, 2008 06:58 AM
> There is a very real mental cost to parsing even a few short lines of XML.
I would suggest that the mental cost of parsing proprietary data files with no markup at all is much worse.
How many obscure configuration files did you scratch your head at in the pre-XML world? It usually went something like this for me:
"Hmm, how is this data laid out? What the heck does a colon mean as opposed to two periods?? Ok, I think I've got it. Now, applying this potential layout I think I understand, where might the data I'm actually looking for be? No, I think I misunderstood the two periods after all. Wait, there's the data!"
XML gives readers a hint about the format, because at least you know the UNIVERSE of organization you're working in.
I'm not arguing that XML solves all problems, but I don't buy into the "XML is hard to read" camp. If anything, it makes it easier simply because more folks are familiar with it and it's documented.
Vance Vagell on June 25, 2008 06:59 AM
I'd personally like to see you provide a comparison of a web.config in XML and YAML side by side.
I agree that there is too much pain in dealing with that file, but I completely disagree that this is a result of it being in XML. The fact that half of the file is boilerplate that I never need to touch, and that almost all of it is poorly structured, does it all on its own.
Jess Sightler on June 25, 2008 07:30 AM
@Andrew:
They AREN'T using ASP.NET WebForms, they ARE using ASP.NET MVC. So it certainly does look like alternatives were considered.
@Mike:
You're actually going to use the summer to learn more about what XML can do for you? Might I suggest the other 59 days be spent learning O/R mapping, or Python or Lua?
Funny, if XML's style and visual parsing 'expense' is a matter of taste, then I wonder why programmers seem to be quite happy writing code like:
int j = 1;
rather than
<variable name="j">1</variable>
or
<variable name="j" value="1"></variable>
or
<variable>j<value>1</value></variable>
Tools and technologies should be used when they provide a net benefit, and not solely because they are a 'standard' or 'fashionable' (unless of course, standards and fashion are the sole criteria on which you measure gain.)
XML is useful. So are other formats. Use what makes sense for the particular need...
Michael Curry on June 25, 2008 07:36 AM
>Thats the bottom line. Get used to it. If you're having a problem
>reading xml, learn how to do it better.
I agree totally. People were created to make things easier for computers, not the other way around.
Heck, let's bag XML and move back to using binaries! Those can be read by a human just fine with a hex editor, a calculator, and enough time. It may be a PITA at first, but you'll get better at it. Quit being a bunch of lazy whiners.
"I try not to get emotionally involved with the tools and technologies that I use, if I can avoid it."
Umm... go back and re-read like almost all of your blog posts that relate to .NET, Microsoft, or Windows Vista and see if you still feel the same way.
Sorry, but you are incredibly emotionally involved with the tools you use.
Cecil on June 25, 2008 07:36 AM
"As a Visual Studio ecosystem programmer, XML is pervasive, in every nook and cranny of a project."
If you hate bracket syntax enough to post twice about it, and you want to encourage use of other tools, then maybe it's time to try something else?
Mattkins on June 25, 2008 07:39 AM
> That brings up the question: What exactly is a standard? I always
...
> With YAML, it's... the yaml-core mailing list. The copyright for
> the specification is held by three individuals.
That's roughly how all the RFCs are done, and the internet is pretty much built on those standards.
T.E.D. on June 25, 2008 07:41 AM
> How many obscure configuration files did you scratch your head at in the pre-XML world?
If the config files in /etc were all in XML, I'd go on a killing spree. Thank god most of 'em aren't.
Talisker on June 25, 2008 07:53 AM
I am not sure if this discussion is really necessary.
You don't like XML for the reason that you find it hard to read. So fine, try to avoid it where possible. But to argue that it is hard to read in _general_ and for everyone is probably not the wisest thing to do.
I for one find it really nice to have a closing tag repeat the tag name because it makes finding matching pairs easier. And there is probably a reason why a lot of programmers write comments repeating the function (or whatever else) name at the end of a block.
And as a lot of people already said: XML covers a lot of special cases and tries to give a solid solution to almost any problems someone could encounter while creating some kind of document. It provides means to do very simple and very complex things with it, which, in my opinion, makes it really suitable for many use cases.
Of course there are cases (like the popular key-value lists), where a simple format is better. But on the other hand I have seen enough configuration files, where some weird syntax was introduced, because they need "just a bit more than that". If you can keep it plain and simple, do it. If you don't know, better use something that is extensible in a sane fashion.
@Someone who talked about hating XML but loving XSLT: Do you realize that XSLT is an XML application? XML isn't something in itself, it only provides a standard for creating specific markup languages that fit a problem domain.
I think most of the people who say "I hate XML" actually mean "I hate a certain kind of XML".
In the end it is actually a matter of taste, just like programming languages are. Or who does _really_ always use "the best" tool for the job? (Which can also mean the tool you are more used to)
XML is just another example of creeping verbosity being palmed off as "better."
In some cases, XML is demonstrably better. If you have to represent a tree structure with text, I really do prefer XML.
.NET, JavaScript, et al. all have the same problem. Consider the number of lines and characters necessary to write the code to read in a file. I've lost track of the number of languages in which I've had to rewrite that darned "ReadFileIntoString(strFilePathName)" function. Why do this? So I don't have to remember the human factors nightmares that most programming languages impose for consistency.
*More*, whether it's XML or dot notation, isn't always better. Simple things should be simple. Complex things should be possible.
ThatGuyInTheBack on June 25, 2008 07:59 AM
Whoops! Jeff already talked about XML comments a few years ago:
http://www.codinghorror.com/blog/archives/000130.html
Personally, I don't think he was nearly harsh enough. I think the angle-bracket tax makes even the most clear and concise comments difficult to read and extremely tedious to maintain. And then there's the fact that you can't write things like "0 <= x < limit", but instead have to write "0 &lt;= x &lt; limit", or the strange-looking "limit > x >= 0".
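The escaping burden mentioned above is easy to demonstrate. The following sketch uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module to show what a plain inequality turns into once it has to live inside XML text; it is only an illustration, not how C# doc comments are actually processed.

```python
from xml.sax.saxutils import escape, unescape

# What the programmer wants to write in a comment.
comment = "0 <= x < limit"

# What has to be typed (or generated) so the XML stays well-formed.
encoded = escape(comment)
print(encoded)

# Decoding restores the original text.
decoded = unescape(encoded)
print(decoded)
```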
Weeble on June 25, 2008 08:02 AM
And what would be *appropriate* "romantic overtures toward their significant other?"
Tim C on June 25, 2008 08:03 AM
The free Liquid XML Studio is great for working with XML. It will even generate some C# sample code for you. On the other hand, I recently wrote some code to parse Google's gdata XML in PHP and that was ridiculously painful.
Robert S. Robbins on June 25, 2008 08:08 AM
Yes, I totally agree with you and I have been facing this problem since the EPA decided that all data submissions would be in XML form. Even point-to-point inside business ones.
The operative words here are:
NO ADDED VALUE
Couldn't agree more, what is so hard about picking the right tool for the job in IT? Sometimes XML is the best choice, sometimes it isn't. You wouldn't (probably) use a screw driver to hammer in a nail, why do that when you code? 'Religion' in IT is ridiculous...
David Hayes on June 25, 2008 08:14 AM
Do programmers really like any markup languages? I don't. The tags annoy me. The syntax looks ugly and crowded to me.
Ivan on June 25, 2008 08:17 AM
I really don't get why you are focusing on the mental cost of processing XML. It's not meant to be human-readable. You are assuming that this is the PRIMARY purpose for the structure of XML. It is not. Interoperability is the primary reason for the structure, human readability is further down the line.
Nicholas Paldino [.NET/C# MVP] on June 25, 2008 08:22 AM
"I think the angle-bracket tax makes even the most clear and concise comments difficult to read and extremely tedious to maintain."
If you need comments in a file format that is mainly designed to be read and written by machines, you're doing something wrong. You're not one of those stupid people who decided it would be a good idea to replace good old .ini-files with some XML-counterparts?
BTW, has anyone shot the Ant-Developer yet? Makefiles as XML. Now that was a *real* stupid idea.
Vinzent Hoefler on June 25, 2008 08:27 AM
maybe if you kick their dog they will forget about the whole xml post?
Darren Kopp on June 25, 2008 08:34 AM
Verbose or not, I'll take XML over comma delimited files and fixed length structure any day.
There may be instances where you can get by with either, but I hate having to account for encoding of special characters in my strings to serialize.
I don't find XML that un-readable... certainly more readable than fixed length or comma delimited if you ask me. But then again, why do I even want to read XML? There should be some app consuming it and all I should care about is that that app can read it and let the app present it to me in a readable format.
Kris on June 25, 2008 08:38 AM
These last two posts have probably been the best I've ever read on your blog yet. Keep it up!
John on June 25, 2008 08:40 AM
@Aaron G
Some counterpoints.
- Well-formedness. All this means is that the XML document conforms to XML's syntax rules. If it doesn't, the XML parser will fail, just like a parser given an incorrectly formed JSON or YAML document.
- Schema and metadata. You don't get this unless you use a DTD, which is built in to XML, or XSD, which is a whole different standard. Right now I am working on a project that is using XSD and I have to say, I hate it. THAT is hard to read, and since we are still working on the document it just gets in the way.
- There are tools out there that allow you to create schemas for YAML and JSON, Kwalify being one of them. (http://www.kuwata-lab.com/kwalify/)
Personally I think that defining schemas and metadata would be better handled in some sort of Domain Specific Language that is not tied to any one technology. Human and machine readable, but focused solely on defining the entities of a system and their relationships to one another. It could be used to generate UML, XSD, SQL DDL etc. In the design stage of a project that would help greatly.
Finally, I have seen lots of places where GB of XML data are used for data exchange. Yes, JSON and YAML would be large too, but not as large, and for large datasets that can make a big difference. And I would drop Unicode if it is not required. As to 4 digit years, I wouldn't bother; unless you have a data file of nothing but years, they don't add nearly as much overhead as XML.
Andrew on June 25, 2008 08:40 AM
In the end I think that XML has its place. It is great for creating structured documents. XHTML is a great application of XML. I am sure there are others that run along similar lines.
But XML falls down when used for raw data exchange and config files. There are many other formats that can handle that better, and most of them have as many tools to work with as XML.
It feels like the people that want to use XML for everything are similar to the people that want to use one programming language for everything.
Andrew on June 25, 2008 08:46 AM
One comment I'll make on the subject (anti-xml) is depending on the requirements and scope of your project, you may not be able to use those nifty XML parser and writer libraries that are available to you. As a Game Development Student, we created a toolchain which created content for a game engine in XML, this was nice and easy in C# however, in the actual game engine (C++), we were forbidden from using any third party API or Library, so we had to manually parse these XML files, which completely negated the key advantages of the XML format. Namely, if we had a fixed-format that we had to parse manually line by line, character by character, plain text would have been simpler and would have resulted in much smaller files.
Brandon K on June 25, 2008 08:55 AM
Don't apologize.
Doug on June 25, 2008 08:58 AM
Standards ARE tax. It's the price you pay for interoperability. That doesn't mean you have to use it. You can decide if you want to pay the tax or not.
With that said, I wasn't very fond of XML in the beginning, and I'm still not. If people want to use it, fine, but it's really nothing special. The most "special" thing about it is that magically everyone has agreed on something. It's more about timing than anything.
Angelo on June 25, 2008 08:59 AM
@Vinzent Hoefler: "If you need comments in a file format that is mainly designed to be read and written by machines, you're doing something wrong."
Weeble was talking about using XML comments in C# source code, which is the "correct" thing to do, but is a pain to use.
Take a look at http://msdn.microsoft.com/en-us/library/b2s063f7.aspx
> "anyone shot the Ant-Developer yet? Makefiles as XML. Now that was a *real* stupid idea"
Actually I quite like Ant.
Makefiles are full of magical archaic syntax (e.g. the automatic variables like $%, $^ or $(?D)) which can be a real pain to mentally parse if editing a makefile is something that you very rarely do.
well, here's a smackdown for you: how is this a revisiting? This is a reiteration of your original point with very little evidence. You might as well have titled this "shitstorm part ii". What's next? "Another 50 reasons PHP sucks" article? Seriously, the discussions I have about coding horror these days with other programmers are now along the lines of "Hey, coding horror seriously started going bad once Jeff went pro blogger. Jumped the shark, completely".
I think we all realize that your employment is now tied to your blog and you now need to regularly make posts that get traffic, but you're alienating your readers with this constant linkbait. Don't tell us what "sucks" -- I think we all have opinions about what sucks or what is cool. Just tell us what is cool. You like YAML? Great, write an article about YAML. You don't have to simultaneously make the point "Oh, XML sucks". If you truly believe in using the right tool, XML *DOESN'T* suck, it's just sometimes the wrong tool -- and YAML isn't good because XML sucks, it's good because sometimes it's the right tool. YAML is good for situation X, and XML is good for situation Y. Tell us about both of those situations. Without the linkbait language.
anonymous coward on June 25, 2008 09:01 AM
I'll agree that XML can be a bit "wordy" at times, but the given example is simplistic. For a simple standalone list, XML probably wouldn't be my first choice either.
But consider a more complex example, perhaps the inventory of a car dealership.
<inventory>
<car>
<manufacturer>Chevrolet</manufacturer>
<model>
<name>Cavalier</name>
<year>2006</year>
</model>
<color>Blue</color>
<powertrain>
<engine>
<cylinders>4</cylinders>
<horsepower>400</horsepower>
</engine>
<transmission>automatic</transmission>
<four_wheel_drive>no</four_wheel_drive>
</powertrain>
</car>
</inventory>
Now that can absolutely be made less verbose by using XML attributes, but XML is very helpful for representing the *structure* in a human-readable form.
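To show how that nesting carries over to code, here is a minimal sketch that reads a trimmed copy of the inventory above with Python's standard `xml.etree.ElementTree`. The `summaries` tuple layout is an assumption made for the example; the element names come from the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the inventory document from the comment above.
DOC = """<inventory>
  <car>
    <manufacturer>Chevrolet</manufacturer>
    <model><name>Cavalier</name><year>2006</year></model>
    <powertrain>
      <engine><cylinders>4</cylinders></engine>
      <transmission>automatic</transmission>
    </powertrain>
  </car>
</inventory>"""

root = ET.fromstring(DOC)

# Paths mirror the document structure, so nested values are
# reachable without any hand-written parsing rules.
summaries = []
for car in root.findall("car"):
    summaries.append((
        car.findtext("manufacturer"),
        car.findtext("model/name"),
        int(car.findtext("model/year")),
        int(car.findtext("powertrain/engine/cylinders")),
    ))
print(summaries)
```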
I wouldn't recommend storing an actual car inventory in XML (that's what databases are for), but it's very definitely useful for structured data, such as a configuration file, that a human might need to read. (As opposed to some of the old binary files stored via the MFC serialization mechanism.)
Is it the solution for everything? Absolutely not! Using XML (or any other new technology) everywhere is phase one of adopting a new technology. XML isn't intrinsically bad, but just like any other technology, it can be misused.
The three phases of technology adoption are:
1. Refactor the entire system to heavily overuse the new technology, especially in manners where it was never intended to be used and/or is completely ill-suited.
2. Refactor the system again in attempt to remedy the problems caused by the previous refactoring.
3. Refactor the system with the next hot technology.
http://thatblairguy.wordpress.com/2008/03/10/technology-phases/
@anonymous coward
"You like YAML? Great, write an article about YAML. YAML is good for situation X, and XML is good for situation Y. Tell us about both of those situations."
That was a good point atrociously expressed. I would like to learn more about YAML. How about it, Jeff?
I hate XML. It just ain't pretty.
Preeti Edul on June 25, 2008 09:08 AM
This just seems silly. If it bothers you, do something about it.
The simplest thing I could imagine in 10 seconds would be to write your files in YAML and convert it to XML. A quick search of the nets returns this utility:
http://search.cpan.org/~ingy/YAML-0.35/bin/xyx.PL
which may or may not work, but if it doesn't just write one--it'll take all of 2 hours. As the first step of compiling, place this in your makefiles. Problem solved.
If this is too much, find or write an editor plugin that presents XML as YAML, you'll never see another >
YAML was made to be hand typed--XML was mostly made to be machine to machine--you're right to feel uncomfortable editing XML directly--why would anyone do that?
Hey, as long as we're here--can I suggest a topic for thought? Those of us who are programmers have the ability to make our computers do anything (at very little cost). What is with programmers who don't feel it's proper to make their own tools?
Sometimes you have to make a little parser to change 30 paragraphs of repeated code to 30 lines of data. JUST DO IT! Move everything you possibly can from your code into data, then write something to input the data into your program.
And by data, I don't necessarily mean XML--data could be something as simple as a large string defined at the top of the file that you write a parser for. In Java, array initialization happens to have a nice, short syntax--use that to get the data into your program. You can even put method names in your data and use reflection to link them at runtime--giving Java elegance similar to that of any dynamic language if you like.
Any code (no matter how large) you have to write just once beats the hell out of any code you have to write/modify any time you have additional data.
And if you have to use XML as part of your process--write a damn tool that takes the simplest input possible and outputs XML.. I doubt the initial designers of XML ever thought it would be edited by hand except in emergency situations. Before XML, it was just about always binary streams or binary files. Is that what you'd prefer to be using?
I use XML, and don't consider it too hard to parse. If you use a decent editor with proper formatting, it can be quite readable.
I never used to use XML, just doing name-value pairs, until I started to notice a pattern of problems I would run into, writing parsing and validation routines to ensure correctness of a file.
In Jeff's example how do you denote multiple fruits in the same file? You have to create rules in your parser to say 'Start parsing a new fruit each time you see fruit as a name'.
How do you deal with a fruit having multiple toppings defined? Is the first one correct, is the last one? The default behaviour (depending on code) is to use the last value so you will survive if you don't think of this one. But this brings up another problem..
In Jeff's example how do you deal with having multiple 'bugs' on a fruit. Do you number them?
fruit=pear
vegetable=carrot
topping=wax
bug1=fly
bug2=aphid
If you number them you have a maintenance problem. If you just leave the name as bug, then you have to write a specific rule saying 'if the name is bug, create a new bug on my fruit', otherwise you will only store the last bug.
XML can solve these problems by using a Schema and letting your language's XML library parse and validate the input.
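The repeated-key problem described above can be sketched in a few lines. In this illustration the function names and the XML layout are made up; it contrasts a naive name=value parser (last value wins), a parser with an explicit "collect repeats" rule, and the same data read with Python's standard `xml.etree.ElementTree`. Note that ElementTree only checks well-formedness; schema validation would need a separate library.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

CONFIG = """fruit=pear
vegetable=carrot
topping=wax
bug=fly
bug=aphid"""


def parse_last_wins(text):
    """Naive parser: a repeated key silently overwrites the earlier value."""
    result = {}
    for line in text.splitlines():
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def parse_multi(text):
    """Every key maps to a list, so repeated keys are all kept."""
    result = defaultdict(list)
    for line in text.splitlines():
        key, _, value = line.partition("=")
        result[key.strip()].append(value.strip())
    return dict(result)


# The same data as XML: repetition and nesting are part of the format itself.
XML_DOC = "<fruit name='pear'><topping>wax</topping><bug>fly</bug><bug>aphid</bug></fruit>"
bugs_from_xml = [bug.text for bug in ET.fromstring(XML_DOC).findall("bug")]

print(parse_last_wins(CONFIG)["bug"])
print(parse_multi(CONFIG)["bug"])
print(bugs_from_xml)
```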
But like Jeff said in the podcast, use each tool for the proper job.
Craig on June 25, 2008 09:21 AM
I think Jeff said it best (to summarize):
This is not a religious debate until you encounter the comments, Jeff is only pointing a finger at the elephant in the room and saying: "Check it out people!"
Not "PEOPLE WATCH OUT YOU'RE GOING TO BE TRAMPLED BY IT"
So go look. Spit at it if you will, but shouldn't we all understand the childhood saying "Don't judge a book by its cover"?
I like Fanboy?'s final comment:
"I'll stick with Xml. It works, so I don't have to."
And I will extend the statement to include:
"I'll stick with an open mind. It expands, so I don't have to worry about missing something great in exchange for contentment"
!Kaizen people! http://en.wikipedia.org/wiki/Kaizen
Sorry, but there is no sacred cow that has as much fervent devotion as SQL. Try to convince a group of DBAs that SQL is a bad language (which it is) and watch for unidentified flying objects.
Which should make tonight interesting for me. I'm going to a SQL Server usergroup meeting. :)
Chris Brandsma on June 25, 2008 09:52 AM
@Chris Brandsma
Great comparison! And similarly to XML, hundreds of programmers have decided that SQL is teh suck and have decided to wrap it behind a construct they like. In the end it's the same thing, some people want to work with tool x, so they do, others want to avoid x at all costs so they avoid it or wrap it or find a wrapper to make it "safer/easier/better" to deal with. Those that have embraced tool x look at those who use the wrappers as lazy (or terrorists), while the anti-x crowd sees the other group as stodgy troglodytes.
SteveJ on June 25, 2008 10:00 AM
I like XML for documents. You know, those things with lots of words and a little markup. Periodically I see someone make the groundbreaking rediscovery that XML can be just as well represented as S-expressions, and then instantly try to apply that to HTML. Which is silly, because if there's one place I *want* those big ugly redundant tags, it's in the middle of a document where it's otherwise easy for them to get lost in the noise. If I opened a tag five hundred words ago I don't want to have to flip back to the beginning just to see what tag this paren or close-bracket or whatever is closing.
On the other hand, using XML for key-value pairs is equally silly. Especially when it's a file that doesn't need any kind of well-defined i18n story.
Avdi on June 25, 2008 10:27 AM
"Why don't we think about what we're doing?"
Because there is a very real mental cost to thinking about it.
"But, we also tend to radically underestimate the impact of the dozens of small events in our lives throughout the day." Like missing a small detail in the requirements doc (we'd be so lucky!) that emulates the flap of a butterfly's wing?
"I highly recommend reading it..." Nah, mentally pricey!
"That's all I'm saying." Glad to hear it.:-)
XSD rocks!
Do you honestly think that this issue (naming of tags) *wasn't* discussed when XML was being standardized?
Rather than guessing what the XML standard bodies think, or simply assuming that they're just *not thinking* because they did something you find odd (how could they be so stupid as to not agree with you, after all), it might do to go figure out what the rationale behind the design decision actually was.
I do know this much: without named closing tags, it makes it much harder to verify the structure of a document until you reach the end of the document, and the parsing error is almost always going to show up later than the *actual* error.
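A quick way to see the effect of named closing tags is to hand a parser a document with one missing close tag. This sketch uses Python's standard `xml.etree.ElementTree`; the broken document is invented for the example, and the exact error text depends on the underlying parser, so none is quoted here.

```python
import xml.etree.ElementTree as ET

# <topping> is never closed. Because </fruit> names what it closes,
# the parser can object right there instead of at the end of the document.
BROKEN = "<doc><fruit>pear<topping>wax</fruit><vegetable>carrot</vegetable></doc>"

error = None
try:
    ET.fromstring(BROKEN)
except ET.ParseError as exc:
    error = exc

# The reported (line, column) falls at the mismatched close tag,
# well before the trailing <vegetable> element.
print(error, error.position)
```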
Therac-25 on June 25, 2008 10:47 AM
Let me toss my cents into it.
First, XML wasn't made for humans or computers. XML is derived from SGML, which was designed to be a "Standard Generalized Markup Language". Ie, it was supposed to "markup" (add metadata to data), and to be sufficiently generic to be able to handle anything whatsoever.
XML was derived the following way: ok, we have this very flexible thingy. It's too flexible and complex to use. Let's get a subset which will be able to handle hierarchical data, which will solve a specific subset of problems we have.
For some reason, no one is allowed to do that anymore. Ie, say it's too flexible and complex.
Frankly, if your answer is "XML is here to stay, deal with it", then go do anatomically impossible things to yourself. Not everyone is happy with XML and XML will not pervade everything. Deal with it.
If your answer is "if you think this is too complex, add more complexity (eg, specialized tools, XSLT) to hide the compl