May 11, 2008
Everywhere I look, programmers and programming tools seem to have standardized on XML. Configuration files, build scripts, local data storage, code comments, project files, you name it -- if it's stored in a text file and needs to be retrieved and parsed, it's probably XML. I realize that we have to use something to represent reasonably human readable data stored in a text file, but XML sometimes feels an awful lot like using an enormous sledgehammer to drive common household nails.
I'm deeply ambivalent about XML. I'm reminded of this Winston Churchill quote:
It has been said that democracy is the worst form of government except all the others that have been tried.
XML is like democracy. Sometimes it even works. On the other hand, it also means we end up with stuff like this:
How much actual information is communicated here? Precious little, and it's buried in an astounding amount of noise. I don't mean to pick on SOAP. This blanket criticism applies to XML, in whatever form it appears. I spend a disproportionate amount of my time wading through an endless sea of angle brackets and verbose tags desperately searching for the vaguest hint of actual information. It feels wrong.
You could argue, like Derek Denny-Brown, that XML has been misappropriated and misapplied.
I find it so interesting that XML has become so popular for such things as SOAP. XML was not designed with the SOAP scenarios in mind. Other examples of popular scenarios which deviate from XML's original goals are configuration files, quick-n-dirty databases, and [RSS]. I'll call these 'data' scenarios, as opposed to the 'document' scenarios for which XML was originally intended. In fact, I think it is safe to say that there is more usage of XML for 'data' scenarios than for 'document' scenarios, today.
Given its prevalence, you might decide that XML is technologically terrible, but you have to use it anyway. It sure feels like, for any given representation of data in XML, there was a better, simpler choice out there somewhere. But it wasn't pursued, because, well, XML can represent anything. Right?
Consider the following XML fragment:
<name>The Whole World</name><email>email@example.com</email>
Dear sir, you won the internet. http://is.gd/fh0
Because XML purports to represent everything, it ends up representing nothing particularly well.
Wouldn't this information be easier to read and understand -- and only nominally harder to parse -- when expressed in its native format?
Date: Thu, 14 Feb 2008 16:55:03 +0800 (PST)
From: The Whole World <firstname.lastname@example.org>
To: Dawg <email@example.com>
Dear sir, you won the internet. http://is.gd/fh0
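To make "only nominally harder to parse" concrete, here is a minimal sketch using Python's standard `email` package. Python is my choice for illustration, not something the post itself uses, and the blank line between headers and body is assumed (the message format requires one). The `parse_message` helper and its dictionary keys are hypothetical names.

```python
from email import message_from_string
from email.utils import parseaddr

# The message from the post, with the header/body separator line restored.
RAW_MESSAGE = """\
Date: Thu, 14 Feb 2008 16:55:03 +0800 (PST)
From: The Whole World <firstname.lastname@example.org>
To: Dawg <email@example.com>

Dear sir, you won the internet. http://is.gd/fh0
"""


def parse_message(raw):
    """Split a header-style message into a plain dictionary (hypothetical helper)."""
    msg = message_from_string(raw)
    return {
        "date": msg["Date"],
        # parseaddr splits "Name <addr>" into a (name, addr) tuple.
        "from": parseaddr(msg["From"]),
        "to": parseaddr(msg["To"]),
        "body": msg.get_payload().strip(),
    }


parsed = parse_message(RAW_MESSAGE)
print(parsed["from"])
print(parsed["body"])
```

The catch, as several commenters point out, is that someone had to write and maintain that parser first. Here the standard library happens to have done it already.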
You might argue that XML was never intended to be human readable, that XML should be automagically generated via friendly tools behind the scenes, never exposed to a single living human eye. It's a spectacularly grand vision. I hope one day our great-grandchildren can live in a world like that. Until that glorious day arrives, I'd sure enjoy reading text files that don't make me suffer through the XML angle bracket tax.
So what, then, are the alternatives to XML? One popular choice is YAML. I could explain it, but it's easier to show you. Which, I think, is entirely the point.
<White refid="fritz" />
<Black refid="kramnik" />
<White refid="kramnik" />
<Black refid="fritz" />
Vladimir Kramnik: &kramnik
Deep Fritz: &fritz
David Mertz: &mertz
There's also JSON notation, which some call the new, fat-free alternative to XML, though this is still hotly debated.
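For a rough idea of what that looks like, here is the same email as JSON, loaded with Python's standard `json` module. The field names (`date`, `from`, `to`, `body`) are my own assumption for illustration; nothing standardizes them.

```python
import json

# Hypothetical JSON rendering of the email example; the key names are assumptions.
MESSAGE_JSON = """
{
  "date": "Thu, 14 Feb 2008 16:55:03 +0800 (PST)",
  "from": {"name": "The Whole World", "email": "firstname.lastname@example.org"},
  "to": {"name": "Dawg", "email": "email@example.com"},
  "body": "Dear sir, you won the internet. http://is.gd/fh0"
}
"""

message = json.loads(MESSAGE_JSON)

# Nested objects come back as ordinary dictionaries.
print(message["from"]["name"])
print(message["body"])
```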
You could do worse than XML. It's a reasonable choice, and if you're going to use XML, then at least learn to use it correctly. But consider:
- Should XML be the default choice?
- Is XML the simplest possible thing that can work for your intended use?
- Do you know what the XML alternatives are?
- Wouldn't it be nice to have easily readable, understandable data and configuration files, without all those sharp, pointy angle brackets jabbing you directly in your ever-lovin' eyeballs?
I don't necessarily think XML sucks, but the mindless, blanket application of XML as a dessert topping and a floor wax certainly does. Like all tools, it's a question of how you use it. Please think twice before subjecting yourself, your fellow programmers, and your users to the XML angle bracket tax. <CleverEndQuote>Again.</CleverEndQuote>
Posted by Jeff Atwood
What I was trying to say (using square brackets):
From="The Whole World firstname.lastname@example.org"
]Dear sir, you won the internet. http://is.gd/fh0[/message]
Java is this big exception here, and I think Java has a lot to do with XML's popularity. With no good way to have anonymous data structures in Java, embedding data in your application is just not possible, you have to store it externally.
You're probably right here, as PJE explained in his "Python Is Not Java" (http://dirtsimple.org/2004/12/python-is-not-java.html),
This is a different situation than in Java, because compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain. In Python, XML is something you use for interoperability, not your core functionality, because you simply don't need it for that. In Java, XML can be your savior because it lets you implement domain-specific languages and increase the flexibility of your application "without coding". In Java, avoiding coding is an advantage because coding means recompiling. But in Python, more often than not, code is easier to write than XML. And Python can process code much, much faster than your code can process XML. (Not only that, but you have to write the XML processing code, whereas Python itself is already written for you.)
If you are a Java programmer, do not trust your instincts regarding whether you should use XML as part of your core application in Python. If you're not implementing an existing XML standard for interoperability reasons, creating some kind of import/export format, or creating some kind of XML editor or processing tool, then Just Don't Do It. At all. Ever. Not even just this once. Don't even think about it. Drop that schema and put your hands in the air, now! If your application or platform will be used by Python developers, they will only thank you for not adding the burden of using XML to their workload.
I'd also really like to know why angle-brackets are so much worse than the newlines and indents required for YAML, and the quotes required for JSON, and how those languages provide data type decoration, schema validation, and declarative transformation.
Many systems use XML simply for the sake of being buzzword-complete. If you're just transferring small amounts of information, using XML will indeed ruin your signal-to-noise ratio. But sometimes the structure of the document is a part of the message. If you need to exchange complex data structures with someone, you could certainly do worse than to send an xml schema and/or a sample document. The developers will quickly understand the general concept and the validator will make sure that the details are correct. Of course, in a perfect world everyone does unit tests and data validity will never be an issue..
This is exactly like the discussions that arise when someone decides normalizing data is unnecessary and invents some new way to add array data types to SQL.
I have a question for "The Postindustrialist":
You say it's not worth coding unless you can code it by hand. What about images? What about databases? You can't code JPEGs in a text editor. You can't code MySQL rows in a text editor. That's because they're optimized for their domain, which is what I contend most data should be. Joel Spolsky has a great blog post (http://www.joelonsoftware.com/articles/fog0000000319.html) where he goes into that. Search for "Quick question" at that URL to jump to the place I'm talking about. I don't see why we're taking such a performance hit in the name of being able to edit something in our favorite text editors. Who cares? Why is that important? Why is it okay for images and database rows to have their own optimized editors, but not other kinds of data?
If you'd have said that about ASCII 15 years ago I'd have said the same.
If for nothing else, XML gives you an encoding standard - We aren't all American.
If for nothing else, XML gives you *one* parser for all these 'silly' files you keep coming across.
If the overhead is too much for your bandwidth I'm sorry.
It seems to solve more problems than it creates.
Oh, and it's human readable too (even if it does irritate some folks)
By the way, Brianary, you can display HTML in your comments by using &lt; instead of < and &gt; instead of >.
@Max: That's HTML. It says "no HTML".
(No one reads this far down in the thread do they?)
When XML was first introduced I often said that it was a great way to expand your data by a factor of ten. Binary formats are often way more compact, except for text. Binary formats are also way more difficult to work with. All in all I'll take XML over the old binary formats just about every time.
The conversion from binary formats to XML was the big win.
Comparing XML to JSON, YAML, and so on is a useful exercise, but let's keep the larger historical perspective. We are already hugely better off than we were a few decades ago.
Thanks for this post. I didn't know about YAML or JSON. Both standards are pretty interesting.
I wholeheartedly agree with all you said. However, 1 point was not addressed: availability. I often choose inferior *ubiquitous* technology. Support, community, and 3rd party enhancement make life easier and development quicker going with the most common rather than going with the best. This gives me a greater range of choice.
My mp3 player is an iPod not because it's the best but because I wanted a wide range of options for durable cases and you can't find that with any other brand. I use Windows XP because I can find a free, and usually open source, version of a tool to perform any common task. These tools also are not the best of breed but I don't care. They get the job done and I go on with my life with more time and more money than if I demanded the best of everything.
I often choose to use XML for the same reason for many things even though it's a lousy choice (esp with data). However, all of my co-workers can look at my code and know what it is immediately. We can also use an all but infinite number of 3rd party enhancements later to modify the XML.
I've always disagreed with these types of arguments. Sure, there are cases where xml is abused. Packing csv into flat xml structures is usually bad, and I think we've all seen the case where someone serialized a binary by putting one byte per xml element in a huge xml file.
But that isn't the whole story. Your email example does it for me. If I use xml, I can use an xml parser that will always parse valid xml correctly, and I just deal with the data. If I instead invent a spec for a format like that one, then I have to write my own parser. That looks incredibly simple, but you also have to remember all of the variations that an email header allows (splitting lines at certain columns, quoting names instead of leaving them unquoted, etc).
With XML, I just say el.InnerText and be done with it.
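`InnerText` is the .NET spelling. As a hedged sketch of the same idea in Python's standard library, `findtext` and `itertext` on `xml.etree.ElementTree` play roughly that role. The element names below are hypothetical, since the post's original XML fragment did not survive intact.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the email example; the element names are assumptions.
XML_DOC = """
<message>
  <from><name>The Whole World</name><email>firstname.lastname@example.org</email></from>
  <to><name>Dawg</name><email>email@example.com</email></to>
  <body>Dear sir, you won the internet. http://is.gd/fh0</body>
</message>
""".strip()

root = ET.fromstring(XML_DOC)

# findtext returns the text of the first element matching the path.
sender_name = root.findtext("from/name")
body = root.findtext("body")

# itertext walks all descendant text nodes, which is closer to what InnerText does.
sender_inner_text = "".join(root.find("from").itertext())

print(sender_name)
print(body)
```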
Similarly, that chess example looks like the xml would be hard, but if it had a schema, my editor would hide virtually all of that from me by giving me dropdowns for the elements that are completely context dependent. You could theoretically do the same thing with YAML, but the editors are not nearly as mature.
So in the end, by criticizing xml, you've really vindicated it. Good job.
As long as every major language has libraries to allow reading/writing XML in both node-by-node and XPath modes, I'll take XML over any other format out there. Why do I want to spend my day writing custom parsers or trying to understand yet another file format? I'd rather save my mental energy for understanding the semantic content rather than the presentation. For simple key/value pairs or tabular data or pure text I don't know that I would choose XML, because parsing those formats is dead simple... But in most other cases, why not? Because a few examples can be found where XML was used badly? Hardly persuasive. That's like arguing that delimited flat files are bad because comma delimiters can be problematic when your data includes commas.
And the above comment is a vaguely perfect example of why I hate XML.
Every second time you try to do anything with it, you get a fundamental character clash with... the internet... and for some reason, XML errors are really rough to sort out.
Because each variable name needs to be declared twice, once at the beginning and once at the end, you often get errors miles away from where they're actually happening... they're basically nested bracket errors which (when things get complex) are bastards to sort out.
I've been saying that XML sucks for years - yeah, it's the best we have (because so many people use it), but it's still really verbose and pernickety.
XML is the greatest thing to happen to data since databases. I believe that its benefits greatly outweigh its disadvantages. Show me any alternative which can be easily human readable, easily machine readable, whitespace-independent, allow comments, allow escape characters, represent serialized objects, platform-independent, represent web pages (XHTML), allow standard configuration, allow ordering of elements to be ignored if applicable, or not ignored, depending on the context, provide a simple set of basic rules, easily translatable into other formats (XSLT, CSS, etc.), and be accepted throughout the industry, and I'll gladly consider that. Until then, I'm perfectly happy with XML.
Anyone who argues that "XML was never intended to be human readable" (which doesn't include the author of the article you link to; his thesis is rather different) has not made recent enough reference to the design goals of XML
one of which is "XML documents should be human-legible and reasonably clear."
Obviously some implementations fail to meet this goal.
For the record, I accept a lot of your points, but the Churchill quote is still apt - basically we don't ever want to go back to a pre-XML world. People are constantly inventing techniques which are superior to XML for some things; that's good too.
Lisp solved this problem 50 years ago, with S-Expressions. There is a reason the Lisp developers of the time refused the more "natural" M-Expressions that were planned for a future version of the language.
What I hate about XML is the suite of "standards" that accompany it: XSD/DTD [for data integrity], XSLT [data transformation language], XSL [xml stylesheets], XSL-FO, WSDL, and SOAP. It wasn't designed for the KISS-minded folks, but rather 'standards-body' type people who like complexity for the extra job security.
And XML was the driver for creating a new industry of "Service Oriented Architecture", which made XML really fat and clunky. I have actually used XML for document authoring and it wasn't pretty, since it used XSD/XSL/XSL-FO to turn an XML document into a PDF!
XML in its raw form of simply tags and brackets, works well only for information that needs to be represented in a hierarchical, tree-like, nested format. That is XML's expressive power at its work.
But for most of the other information out there, such as passing information along, you can use a more flattened structured format that is more compact, easier to parse by algorithms (XML parsing is SLOW because of its recursive, tree-like structure), and you don't need to depend on third-party tools to do it.
We are about to standardize all of our persisted text format configurations into YAML here.
Personally I have switched to YAML.
That includes all my config on Linux too.
I use ruby to generate specific formats if need be (including XML).
I would never again in my life go back to writing XML by hand, and I refuse to stick to XML for _humans_.
Die XML, DIE!
They now say that XML was not meant for config files etc.. but it is a huge lie.
XML was meant to bloat the world and annoy users.
Great post Jeff, I agree 100% whole heartedly!
I usually agree with you, but not on this one.
The company I work for used to write text file parsers for damn near everything - until I introduced XML as a way to organize our data. Now our data streaming code consists of no more than 50 lines to read all sorts of data. That's just one example. XML has streamlined and simplified our work in other ways as well.
Like any other technology, XML has its time and place. If used correctly, it can be a godsend. Used incorrectly, and it can be a bane.
And honestly, it sounds more like you have some frustration toward lazy and undisciplined programmers than XML itself. Scripting, YAML, JSON, and whatever else can all be misused in the wrong hands.
You know this, too. Any tool can be dangerous in the wrong hands.
(date :year 2008 :month 02 :day 14)
(from :name "The Whole World" :email "email@example.com")
(to :name "Dawg" :email "firstname.lastname@example.org")
(message "Dear sir, you won the internet."))
I love the modern developer thought process. If you proclaim s-expression programming languages are confusing and too hard, you're a whiner and need to leave programming to the big dogs. If you proclaim that XML is confusing and too hard, you're an enlightened realist. Yes it is abused, but it works and the structure is relatively unambiguous, especially compared to the thousands of arbitrarily different pseudo-hierarchical INI/text formats on any UNIX machine. It's funny because you can see where they started out with a simple value pair, then suddenly oh crap, we need simple structures, tack it on. Except everyone does it differently. I don't like parsers, I don't like reading your crappy file format that's allegedly better than everyone else's. I hate your tab-delimited hierarchical hack of CSV and I hate your special version of INI. I hate Makefiles, they are ugly and the syntax is stupid. I hate working on parsers written by programmers that think they're too smart to use XML.
Yeah, SOAP is crap, but ANT is not that bad, it's an improvement over Make if you just suck it up and learn something different. XML is verbose and it's slow to parse and DTDs are stupid. But for every SOAP monstrosity there's a logical, clean Hibernate HBM file.
This is not directed to Jeff, but to some of the commenters. But seriously, the XML in that YAML comparison was practically transparent to me, I don't see what the big deal is. And I work with XML file formats that would make your hair fall out if that bothers you. And the email example, it may look good onscreen but I seriously do not want to ever have to maintain the black magic in Sendmail or try to write a functionally correct mailfile parser. The devil is in the details, which is exactly what is wrong with "simple" file formats.
No one will read this, but here goes.
Those decrying make as stone-age are sadly mistaken. That a tool written in the 70s still kicks the ass of others is testament to its power. My project ditched Ant when the build file and its includes began reaching 200+ lines and *still* couldn't meet all our needs. Replaced it with a 30 line make file that can be easily parameterized (Ant's method absolutely blows), and calls out to virtually whatever we need, from scripts to full-blown programs to execute tasks impossible with Ant.
Make. It's not like Ant. It's superior.
You say it's not worth coding unless you can code it by hand.
I don't think I've seen anybody say this. The problem is that XML was started as being "handcoded", and even today "XML-aware" tools mostly aren't much better than handcoding.
What about images?
Depends on the images, SVG are easily hand-coded. But most images are fairly opaque binary formats with quality data-specific editors and deal with concepts that aren't and never have been hand-coded, for most.
You can't code MySQL rows in a text editor.
Er... yes you can, it's called SQL. Or, when dealing with import/export, "excel", "CSV" or even "JSON" (the Django web framework has a data import/export to/from databases and its default serialization format is json)
That's because they're optimized for their domain, which is what I contend most data should be.
And which XML is (document markup), except the vast majority of the uses of XML are far outside the domain it was initially created for.
If for nothing else, XML gives you an encoding standard - We aren't all American.
Yes, and that's indeed important. I agree.
If for nothing else, XML gives you *one* parser for all these 'silly' files you keep coming across.
Actually, no it doesn't. It merely gives you half the parser (the structure), but it doesn't give you the syntactic parsing. In fact, JSON and YAML parsers give you much more on the parsing side as they also give you syntax (e.g. JSON has numbers, strings, arrays and maps/dicts/hashes and they're usually translated as such in your language of choice by the parsers)
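A small sketch of that difference, using Python's standard `json` and `xml.etree.ElementTree` modules. The config keys and values here are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical settings in both notations.
JSON_DOC = '{"port": 8080, "debug": true, "hosts": ["a.example.com", "b.example.com"]}'
XML_DOC = (
    "<config><port>8080</port><debug>true</debug>"
    "<hosts><host>a.example.com</host><host>b.example.com</host></hosts></config>"
)

from_json = json.loads(JSON_DOC)
root = ET.fromstring(XML_DOC)

# JSON: the parser hands back an int, a bool and a list.
port_json = from_json["port"]
debug_json = from_json["debug"]
hosts_json = from_json["hosts"]

# XML: the parser hands back structure and strings; converting is your job.
port_xml = root.findtext("port")
debug_xml = root.findtext("debug")
hosts_xml = [host.text for host in root.iter("host")]

print(type(port_json).__name__, type(port_xml).__name__)
```

With XML you either convert by hand (`int(port_xml)`) or bring in a schema-aware layer on top of the parser, which is more or less the commenter's point.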
Like most creations by developers (e.g. C++, Assembly, Lisp, etc.), it is a human factors disaster.
Would you be kind enough as to explain what part of Lisp was a disaster? And why?
Why do I want to spend my day writing custom parsers
Fail, both alternatives suggested here have parsers for pretty much every popular and less popular language out there.
The thing about XML is that every language has a solid and well tested parser for it. Sure, it's not hard to parse other formats, but XML is nowadays built into everything, so it's a bit like a "reverse chicken-egg problem". That also means that for interop, you only need 1 parser in your application. If I needed one YAML parser, one XML parser, and one .ini parser in my application, I would be creating redundancy in a certain way.
With current Disk Sizes, the extra space for storage should not be an issue (and if it is, there is surely a good way to store the xml file compressed), and for the Web, i'd rather see everyone turn on gz compression on their web servers first, before they switch to another format.
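As a rough illustration of the compression point, here is a sketch with Python's standard `gzip` module. The XML is synthetic and deliberately repetitive, so it compresses far better than most real documents would; treat the ratio as an upper bound on optimism, not a benchmark.

```python
import gzip

# Synthetic, highly repetitive XML (an assumption for illustration only).
RECORD = "<move><piece>pawn</piece><from>e2</from><to>e4</to></move>"
xml_bytes = ("<game>" + RECORD * 500 + "</game>").encode("utf-8")

compressed = gzip.compress(xml_bytes)
ratio = len(compressed) / len(xml_bytes)

print(f"raw: {len(xml_bytes)} bytes, gzipped: {len(compressed)} bytes, ratio: {ratio:.3f}")

# Decompressing gives back exactly the original bytes.
restored = gzip.decompress(compressed)
```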
In short: XML is not the best tool, but it is one tool that works really well in most situations.
If you are writing your own XML parser, then you are doing something very wrong.
XML was built with totally different design goals than JSON/YAML and unfortunately people have brutally misused XML as anyone that has used JavaEE pre Java 5 knows.
Yeah, SOAP is crap, but ANT is not that bad
That's probably why even Ant's creator said that, in hindsight, using XML was not such a good idea (http://weblogs.java.net/blog/duncan/archive/2003/06/ant_dotnext.html)
it's an improvement over Make
Pretty much everything is an improvement over make (and, really, Make isn't that bad until you have to go and use auto*).
But for every SOAP monstrosity there's a logical, clean Hibernate HBM file.
Which have been advantageously replaced by annotations that are cleaner, clearer and simpler to read. And not separated from what they're talking about.
XML fails for me on two counts.
(1) XXE -- I can't write any specification saying the input is well-formed XML because of XXE attacks ( http://archive.cert.uni-stuttgart.de/bugtraq/2002/10/msg00421.html ). I would have to say the service accepts "well-formed XML minus the inherently unsafe features that make document interpretation dependent on whether it is parsed inside the firewall or outside."
(2) XML (with attributes) is not extensible.
XML makes an arbitrary distinction between nodes and attributes. Many recommend treating nouns as elements and adjectives as attributes when describing an XML schema, but the language doesn't allow for adverbs. If I start with a document like <foo description="human readable text"/> and later realize that I need to annotate description with the locale of the text, I'm screwed because I used an attribute to store the description.
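To spell the problem out, here is a sketch in Python's `xml.etree.ElementTree`. The `locale` attribute and the element-based layout are my own assumptions about what the "fixed" document might look like.

```python
import xml.etree.ElementTree as ET

# Original design: the description lives in an attribute.
attr_style = ET.fromstring('<foo description="human readable text"/>')

# Hypothetical redesign: the description is a child element, so it can carry
# its own attributes, such as a locale.
elem_style = ET.fromstring(
    '<foo><description locale="en-US">human readable text</description></foo>'
)

# An attribute value is just a string; there is nowhere to hang a locale on it.
attr_description = attr_style.get("description")

# An element has both text and attributes of its own.
description = elem_style.find("description")
elem_description = description.text
elem_locale = description.get("locale")

print(attr_description)
print(elem_description, elem_locale)
```

Moving from the first shape to the second changes the document structure, so every existing reader of the attribute version breaks.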
@macklinn got it right. XML is the reductio ad absurdum proof that the Wisdom of the Masses is false. There is no There, There. It's just text. Parsers are only a *requirement* to extract the data, so that your Custom Built program can do its thing. As Fabian Pascal said:
The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged — semantics — and once they do that, there is no need to repeat the tags in each and every record/document being transmitted. Any agreed-upon delimited format will do, and the criterion here is efficiency, on which XML fares rather poorly...
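A quick sketch of the size argument, comparing the same records as delimited text and as XML using Python's `csv` and `xml.etree.ElementTree`. The field names and the records (borrowed loosely from the chess example's player ids) are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records: who played white and who played black in each game.
FIELDS = ("white", "black")
GAMES = [("fritz", "kramnik"), ("kramnik", "fritz")] * 100


def to_csv(rows):
    """Name the fields once in a header row, then emit one line per record."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    """Repeat the field names as tags around every single value."""
    root = ET.Element("games")
    for row in rows:
        game = ET.SubElement(root, "game")
        for field, value in zip(FIELDS, row):
            ET.SubElement(game, field).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(GAMES)
xml_text = to_xml(GAMES)

print(len(csv_text), len(xml_text))
```

This only measures bytes on the wire. It says nothing about escaping rules, nesting, or what happens when the two parties disagree about the column order.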
I have to agree - I'm no fan of XML, and never have been. I can appreciate that it's a clever way to express any kind of information, and separate it from any kind of presentation, but it's just hard to use.
Especially when you add the complexity of XML namespaces. Writing a concise XSD file that people will find useful is a nightmare, and that's if you have tools to help. What I want is a simpler version of SQL Server express, that doesn't need an install particularly. Text files might not be it at all. In fact, look at the way Outlook stores local data in pst files.
But hey, I have to say that a simple data design, when applied unsullied, is my preferred method of storing configuration. Not the default Settings providers, you understand, but plain old serialisation.
From my experience with config/data files, the majority of the time processing them is figuring out how to parse them. Escape sequences are a pain, picking delimiters and agreeing on a format are time consuming operations. XML is a real win here because it gets rid of these problems and allows you to quickly get config/data files running.
The other big win is it's generally well understood at this point. Even for people who don't deeply understand XML it's pretty easy to understand at a glance how the data is structured. This may have more to do with the amount of people who have done at least minimal HTML rather than intuitiveness.
I completely agree this can be overkill at times but IMHO it's an acceptable default solution.
I find discussions like this very helpful. When I have to staff up a project, one of the hardest things to do is weed out people who are going to be useless. There are an awful lot of arguments in here that decompose into "X is stupid because I never learned how to use it properly," or "X is stupid because someone else used it improperly." It never occurred to me that you could get people to gleefully reveal their unemployability just by giving them a half-thought-out rant about a subject and seeing how eagerly they agree.
It always amazes me that the same people who love XML are the ones that deride Lisp for all its parens.
Seems like a non-topic to me, there is no one "be all and end all format " that neatly meets the needs of all constraints that it may ever be adopted within, so this entire topic seems moot to me. Actually I'm not even interested in discussing why XML is so widely adopted like it is today, that's another pointless discussion.
What really interests me is why you feel that XML is used sometimes inappropriately. XML is always chosen for a reason. Even if it is simply an economic issue that is a valid reason. Even if it is simply because "programmer x only knows XML" (which I really doubt is ever a real world case) that is a valid reason to use XML. Although real-world is never this simple, there are always several reasons for any decision both explicitly and implicitly chosen.
Always design with all important constraints considered. If your design requires that people need to quickly and easily read the contents of remotely transmitted messages, then investigate commonly remotely transmitted messages. Don't re-invent the wheel. Find the solution with the fewest deficiencies possible from your constraints. A good architect doesn't just review one small aspect of a solution without looking at the solution's design from a macro-level and also without looking at the solution's compliance to its requirements. Sometimes perhaps XML implementations are simple idioms though and maybe in those cases things can be micro-reviewed.
I suppose if you don't like XML you could spend a decade inventing your own syntax, writing libraries for it in every language and inventing a whole host of auxiliary tools and specs to work with your proprietary syntax. This would be an enormous, extraordinary waste of time but it might make you feel warm and fuzzy on the inside.
An XML configuration file just seems like taking a simple INI file and making it as confusing as possible.
I just completed a small project that uses INI files. The application requires that the user manipulates the configuration files and INIs seemed like the easiest way to make that happen.
I used Todd Davis's code from CodeProject: http://www.codeproject.com/KB/files/VbNetClassIniFile.aspx
There is also a Open Source INI library on SourceForge called Nini: http://nini.sourceforge.net/
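The libraries linked above are for .NET. For a feel of how little code an INI file needs, here is a comparable sketch using Python's standard `configparser` (a different tool from the ones the commenter used). The section and key names are invented.

```python
import configparser

# Hypothetical settings file; sections and keys are assumptions for illustration.
INI_TEXT = """
[database]
host = localhost
port = 5432

[logging]
level = INFO
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

host = config["database"]["host"]
# Values are stored as strings; getint converts on request.
port = config.getint("database", "port")
level = config.get("logging", "level")

print(host, port, level)
```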
"I'll call these 'data' scenarios, as opposed to the 'document' scenarios for which XML was originally intended."
That quote still has me scratching my head. I have been under the impression for the past 10 years that the entire purpose of XML was to define the structure of data used and generated by disparate systems so that those disparate systems can work together. If that's not the primary intention of XML, then what, exactly, is?
...or did I misunderstand the quote?
XML is celebrating its 10th birthday and you decide to spoil the party. :)
XML has never been a silver bullet - it has some negatives, like its many alternatives. The issue is with developers who adopt it inappropriately.
I happen to love XML (and XSLT), but feel I know where and when to use it.
I like pointy brackets too, but I also like curry for breakfast.
XML has its place but I still use .ini files for storing settings. They are smaller than XML files and parsing them is easy. A lot of people see me writing apps which use ini files these days and are shocked I am not using XML. Sadly when I explain they cannot get around their mindset that XML files are the answer to all problems.
Also I am glad you mentioned YAML as it is awesome :)
I completely agree with Jeff.
I'm not totally against the use of XML. But as Jeff pointed out, its use can be avoided in certain places like configuration files, quick databases etc. XML is very taxing when it needs to be used with the mobile devices where the data transfer is painfully slow and pricey.
For example, when implementing XML centric protocols like XMPP on mobiles, it takes more than 800-900 bytes to send a small text message across. Same is the case for web services. Average amount of data transferred while executing a remote method is anything between 2 to 3 kilobytes. That's really insane as the 70-80% of the payload is noise.
XML not only demands more bandwidth, but it's CPU (and in turn battery) demanding when parsing on low power devices.
What is with all this XML bashing? Maybe you guys are all from the old generation of programmers, who are used to using unstructured documents. XML has many advantages:
-arbitrary data structures
-easy to manipulate
Good. Freaking. Lord.
An idiot with the best hammer available will still build a crappy house.
Get past the tools already.
I'd still rather store things in XML than the Windows Registry.
Maybe we can get back some day to using INI for storing program configurations.
I mostly just use XML for RSS feeds... might not be the intended usage, but it works pretty well. I always found parsing the things a pain though.
XML is a compromise between readability of information and efficient storage of data, and, like all compromises, can't be perfect on every aspect.
As you say, it's also the foundation for the industry-standard SOAP.
And, for readability, there are always alternative ways of visualizing XML, like viewers, etc.
I have no great love for XML. And there's definitely some horrible misuses of it, SOAP among them. But outside of the world of Unix it's largely replacing binary stores, and for that I'm happy.
I don't think XML is the best default choice, but I'd much rather have developers start with a text format and only move to a binary one if really necessary. And if XML is the only text option they'll consider, bring on the XML. (Which text format is a bigger question: CSV is very useful when you're dealing with lots of similar-formed tabular data, YAML is great for simple things, and binary is still the fastest and smallest. I haven't used JSON yet, so I'm omitting it out of ignorance rather than on merit.)
Tangent: The "Less efficient!" argument against all human-readable formats annoys me. Does it really matter in your application? If it does, go ahead and pick something else. But stop using that as the central point of any argument, as if it's always worth trading ease-of-use for efficiency.
Ack. I edited my comment above before posting and it looks like I was saying binary is a text format alternative to binary. I just meant to say that text formats are great, pick the right one, and binary formats still have their place.
Yeah, I agree. Next time I'm just going to save my data in compressed binary or deep into a database or force it into flat property files like almost everyone did before XML. This XML stuff sucks.
I don't understand why anyone would actually deal with raw XML, it's not really a human-readable format...
Not that I'd wait for a generic XML editor that does everything for everyone--but if you are a programmer supplying a system and you ever tell users to bring up their editor and edit the XML file, you're doing it wrong.
@Nikhil Belsare: That's a great rationale for only using gopher on handhelds!
Can't say I disagree with this post more.
The reason XML is so great for everything is that it is ubiquitous. There’s something to be said for not reinventing the wheel with every application. There’s something to be said for using built-in .NET or Java libraries (Dude, have you seen what cake it now is with Linq2Xml?). Sure it’s inefficient; sure it’s verbose and ugly, but I know how to do it with a small amount of code and with few errors. Why would I now learn the quirks of YAML or JSON? I’ll let you spend the time learning and struggling with the optimal platform to create websites with; I’ll spend the time making a cool website that is slightly inefficient.
I actually use JSON about 95% of the time and XML only when I absolutely must, usually because of backwards compatibility.
That said, it doesn't really matter that much in the end. It's about what you can build not how you build it.
I'm sooo sick of these "we hate XML cos that's wot all the clever dicks on slashdot say" rants. And I've got news for all you Ruby / YAML folk too:
1) YAML sucks. It's really, really poor. The only reason it exists is to attempt to replace XML, which kind of makes it a non-thing in its own right.
2) Ruby is as slow as a dog, and ain't getting faster any time soon. ROR has reached both its bullshit and performance threshold.
You people are so blind to what XML has achieved, and Derek Denny-Brown displays this perfectly...
"...In fact, I think it is safe to say that there is more usage of XML for 'data' scenarios than for 'document' scenarios, today..."
Erm... have you ever heard of the INTERNET which uses this stuff called HTML which is, well, to all intents and purposes.... XML?!
If you don't like the markup/data ratio in your XML documents USE SMALLER ELEMENT NAMES!!! How hard does that sound?!
All this bleating about padding out a few text files is completely stupid - modern hard drives are enormous compared to the size of these files! What makes this matter even more amusing, is that it's the Ruby crowd trotting out this argument more often than not. Ruby is so incredibly slow that I can't help but deride any Ruby fanatic moaning about "bloat" in XML.
And finally, the biggest, dumbest buzz-word 2.0 compliant whine of all - YOU SHOULD ALL USE JSON COS IT'S GREAT! What?! You want me to build a distributed application that communicates over a network via strings of code which get evaled on the client?! Does this sound like a security disaster waiting to happen, or what?!
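On the eval concern raised above: JSON does not have to be evaluated as code. As a small sketch, Python's standard json module parses the text as data only and rejects anything that is not valid JSON. The payload and the hostile string below are made up for illustration.

```python
import json

# A made-up payload; json.loads builds plain dicts, lists, strings, numbers.
payload = '{"user": "alice", "roles": ["admin", "dev"], "active": true}'
data = json.loads(payload)

# A made-up hostile string: valid Python code, but not valid JSON.
hostile = '__import__("os").system("echo pwned")'
try:
    json.loads(hostile)
    rejected = False
except json.JSONDecodeError:
    # The parser refuses the input; nothing is ever executed.
    rejected = True

print(data["roles"], rejected)
```

The risk described in the comment applies when a client passes the received text to an eval-style function; a dedicated parser avoids that path entirely.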
One interesting thing about XML happened on the Java/Java EE side. Back in 2000-2005, almost every framework and spec started using XML for configuration. The main reasons were: 1 - it's human readable; 2 - it's tooling friendly; 3 - it's portable.
Unfortunately, the (ab)use of XML made Java/Java EE development complex.
Fortunately, two things happened that saved the day:
1 - Rails came in with its convention over configuration, dynamic language and other stuff, and it shook the very foundation of Java/Java EE development.
2 - Annotations were added to the language.
Nowadays, almost every new framework and spec relies on annotations plus convention over configuration.
I have been saying this for a long time. I always wondered why we couldn't just use the Java properties file or something less verbose and error prone.
Lua programmers laugh at this entire discussion.
I agree with the pro-Lisp commenters. There are some really good Common Lisp packages for generating XML. xml-emitter for example. Needless to say (to anyone who knows Lisp at least) it is a million times clearer and easier to revise data in Lisp's syntax than it is in XML.
.NET programmers laugh at you Lua programmers, and wish you the best of luck in your job search.
All true, but then John's comment (expressing utter indifference) is correct too. The badness of XML is more of an insult than an actual problem, or, to put it another way, if XML is your biggest problem then you're doing well.
My theory is that XML's insulting badness is the reason for its success; since it's practically impossible to write a correct XML parser from scratch, everybody uses existing libraries. Result: actual working standardization.
“High-traffic sites serve tens of thousands of RSS feeds, formatted in XML, every day… shouldn't we be using a data format that's as thin as possible? Shouldn't the common symbols in a data file be encoded and compressed within the file itself?”
I say… shouldn’t the compression issue be tackled at a lower level? Can’t this problem be resolved by e.g. gzip-encoding all XML content before it is served?
I am in total agreement with Robin and those others defending XML. This is the most ridiculous argument that has ever come about in the IT industry – and it’s just painfully tiring – and it really separates those who have had their eyes open during the evolution of the technology and those that, well, slept through it or simply weren’t there. XML is as simple as you want it to be and as complex as you may need it to be – don’t confuse or punish XML because of how people have applied it, e.g. SOAP. That would be akin to reading Mein Kampf (in German) and concluding that German is a horrible and evil language.
I’m not going to reiterate all of the very strong points made pertaining to the benefits of XML above - XSD, XSL(T), DOM, Namespacing etc. I do have a couple additional points to make however.
If, as an example, JSON or YAML or whatever, were to take hold as the industry standard today, in 10 years, I can guarantee you that we’ll see the same silly arguments against it because it will have matured to a full blown multipurpose markup language for data packaging with all the potential complexities found in XML – why? - Because it needs them for the times they are needed! And do we really want to start over again and go through the evolutionary pains with another general-purpose specification strategy – I don’t unless you can show me some *very* strong arguments.
And for those who say XML is not human-readable - hmmm... that truly scares me and I’m not sure I’d want those people writing code on any of my projects. There are things in our industry far more complicated than a few angle-brackets. Latin is not human-readable if you don’t know Latin – so learn it or get out of Rome.
I want a single standardized syntax to represent data – period. And I want to use that same syntax to describe it and that same syntax to transform it.
Brrr, I really can't understand why Visual Studio is using these horrible XML comments.
Sure, they are automatically generated by typing /// but after that they waste 2 lines of code for some summary crap.
Javadoc's @-thingies are a lot better, even if I can't remember the exact syntax atm.
The real complaint here is that your favorite platform doesn't ship with a lightweight XML interface for humans. Stop using notepad to view large XML files. You wouldn't use a hex editor to read your plaintext, would you?
Thank you Jeff!
I work in the financial services industry. Recently, I had to deal with one of *THE* biggest vendors in the industry, providing mutual fund data. They FTP us fund holdings and other information in the form of XML files - each file is about 2 GB - that's right, 2 GB of XML per file, per day! Drove me crazy trying to understand how supposedly professional programmers from this firm could come up with such a monstrosity.
I hope somebody from that firm is reading this blog.
But then again, if they were into reading such articles, they wouldn't have designed such a system in the first place, would they?
I'm the managing editor of xml.com, so it's perhaps not surprising that I may have a different viewpoint on XML. However, I think there are a few things to keep in mind with your post.
1) XML started in the document arena, and is in general at its best in the document arena. Given that there are a large number of semi-structured documents out there that can be rendered via XSLT into dozens of potential output formats when expressed in XML, can be searched via XQuery and can otherwise be manipulated (and validated), it's worth understanding that even many XML gurus have repeatedly raised questions about whether XML should be so heavily used in the data domain.
2) SOAP was created initially as a way of getting around the restriction of passing binary content into and out of port 80, and its structure is considered ugly and ungainly even by many XML advocates. SOAP's also not used in an XML pipeline - it's there primarily as a temporary format for serialization from and marshalling to binary objects for invocation of RPCs, and the fact that XML was used in its expression was unfortunate, at best.
Note that a similar argument can be made for RDF triples, which can be mind-numbing when expressed in XML format. Significantly, RDF can be thought of as hypernormalization of data and why XML was used to encode it is something that has mystified a lot of people - significantly, most contemporary RDF applications prefer to use Turtle notation.
This is more a question of user education than lack of suitability of the tools. I saw this same mindset at work with Visual Basic back in the 1990s, where, because building and designing components was harder, people would create very fragile applications because those were the easiest to write.
I'll have more to say on this point in my own blog post shortly.
"I say… shouldn’t the compression issue be tackled at a lower level? Can’t this problem be resolved by e.g. gzip-encoding all XML content before it is served?"
To a certain extent, it can be. Apache's mod_deflate (http://httpd.apache.org/docs/2.0/mod/mod_deflate.html) will compress content before sending it to the client, but that's as good as you can get. XML feeds are going to get served to RSS clients via HTTP, so you can't use a compression technique that the client isn't prepared to accept. That means it's mod_deflate or nothing (AFAIK), and I still feel like it's a band-aid solution.
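As a rough illustration of the gzip point above, the following sketch compresses a synthetic, highly repetitive feed with Python's standard gzip module and compares sizes. The feed contents and the `build_feed` helper are invented for the example; real feeds will compress differently, and this is not how mod_deflate itself is configured.

```python
import gzip


def build_feed(n_items):
    """Build a synthetic RSS-like document (hypothetical content)."""
    items = "".join(
        f"<item><title>Post {i}</title>"
        f"<link>http://example.com/{i}</link></item>"
        for i in range(n_items)
    )
    return f"<rss><channel>{items}</channel></rss>".encode("utf-8")


raw = build_feed(200)
compressed = gzip.compress(raw)

# Repeated tag names are exactly what DEFLATE handles well.
print(len(raw), len(compressed))
```

The same algorithm is what an HTTP client advertises support for with an `Accept-Encoding: gzip` request header, which is why server-side compression stays transparent to feed readers that send it.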
this is a good article in that it spurs people to revisit fundamental arch decisions (which we tend to put in place and leave for years).
I like the Ant example .. because it shows how a somewhat more self describing data format (the tags don't have to mean stuff, but if they do it can be compelling) is applicable to a much wider population of users ... I love make, but I couldn't get anyone to use it ... Ant on the other hand I seem to be able to train people to use ... go figure
that being said, we live in a world (the web) where most of the 'stuff' is made up of markup ... so it makes sense to have a data format that the same toolset ('view source' anyone) can be applied to ... XML is the democratization of data and the sister to HTML.
let's not forget all those scenarios that were 'taxing' before;
* all that marshalling code which brought 'stuff' from databases or some OO class instance into HTML and back; less code means less bugs
* debugging was indirect process versus 'view source'
* sql used for everything (requiring a server environment)
* working with semi structured document data versus working with data data
* YAML is a potential subset that I have yet to work with ... I see it as an optimization and will use it as and when the need arises
I like a lot of data (s exp, sql, xml) and all of them are appropriate to one situation or another; XML just has a lower barrier to 'start doing' stuff than most.
cheers, Jim Fuller
While the angle brackets are a bit of a pain - they are only a pain to the programmer who insists on modifying the files by hand.
You only have to compare it to what went before to realise how easy it makes life.
You use an SMTP message as an example - it's a good example because very, very few SMTP agents manage to correctly parse and generate SMTP headers. SMTP took years to get everyone using the same sort of formats, and even then almost all email messages break one or other of the RFCs.... After 30 years no one can generate something as simple as an email message. As a consequence all mail clients/servers are forced to handle exception after exception just to operate.
Try taking a look at the text files your account uploads and downloads to your bank - the formats were badly designed when they first came out never mind now years later. Fields with multiple uses, everyone adding their own extensions and pretty much impossible to revise the standard once released.
Take a look at the FIX standard - a nightmare (badly) designed by committee. If it had been in xml from day one a huge amount of work would have been saved for all.
Xml makes it easier to design a file format which subsequent generations of programmers can work with. It eliminates the waste of time parsing phase.
I can't believe you actually posted on this topic. Are you out of ideas?
Guess what you'll be dealing with 10 years from now, if your (much deserved) success hasn't driven you out of the programming game to greener pastures: X. M. L.
Of all the formats you could take and hold up as "easier" than XML, you single out the RFC 2822 mail format? "Arcane" doesn't begin to describe it.
"Nominally harder to parse", indeed. You definitely do *not* win the internet with that one.
I fully agree on this. XML should not always be the number one choice. XML is great, but only when used in the right context. The worst case is probably the use of XML as a programming language (like ANT, etc.). Allen Holub has written a nice article "Just Say No to XML": http://www.sdtimes.com/fullcolumn/column-20060901-05.html
If one of the concerns is that XML is not good at making information readable to humans ...
You may transform it into any other format which could be more readable :)
Bashing xml for being widely used seems rather upside down. Just one question:
Say you have to take on a project at work. It involves a lot of files with data in them. As you're new to the project, you don't know the format of the files. The previous programmer left in a hurry/got caught with the manager's wife/died in a plane crash/was abducted by aliens and had not written any documentation of the format prior to "leaving". Would you rather that a) the previous guy had used xml or that b) he had whipped up his own format for the files, because that was really all that was needed?
And no, you do not get any other choices. This is the real world and anyone screaming "But of course the previous guy would have used this supercool language that I just happen to know so problem solved" will be put over the knee of granny Learnthehardway and spanked into submission.
xml is a good thing - as long as you use it right. Along with everything else (including such things as blogs and comments) they can be used wrong. Does that REALLY come as a surprise?
Ok, XML is effectively a compromise between being doc-oriented and being data-oriented, and for any particular job it is likely to be sub-optimal. But I have to echo the comments regarding history: XML is so much better than the binary and ad hoc text formats that came before.
It gets at least three things dead right - it's Unicode-friendly, text-based and has an open specification. The reason these are significant points is that we live in a global environment.
While many people dislike XML namespaces, they too can be a powerful feature in the global environment. If you use a http: scheme namespace and put an explanatory document at that URI, anyone seeing the XML for the first time can find out more. The ability to demarcate different vocabularies in this way makes reuse of existing specifications a lot easier.
yeah right - the premise here is that XML is meant to be human readable. No, it's not. It just helps that the wet stuff can read it. It's meant for computers, i.e. if there is an error with the stuff you can check it. Not like the binary stuff.
The simple fact is that XML is meant to help separate out data from its format or representation. I thought Jeff was supposed to support the MVC model :)
It's just useful for that - XSD is just a way to describe the data instance in the XML file.
SOAP is a dialect of XML for the transmission of XML across HTTP. The last time I checked, the wet bit cannot handle the HTTP stack and I very rarely provide syn ack to machines.
Remember that this is meant to solve integration of data between systems, a.k.a. the reality of the problem of CORBA, DCOM, COM and their proprietary formats.
Granted - XML config files are an observable problem in the way software is now written, and lazy teams can't be bothered to offer an editor for configuration of said software. Even MS have gotten lazy, see XAML. yuck.
What I hate about Xml:
Someone here also mentioned whitespace. I'd say that whitespace is a killer, and things like Html inside Xml documents also suck badly. Xml didn't quite get it right, and we're still futzing with it. Mainly because Xml was 'meant' to be human writeable, but sometimes whitespace is ignored, and sometimes it isn't, depending. What a way to confuse normal people. We need closing tags because, to be human readable/editable, we didn't want people to have to count characters (hence having the closing tags - think about Pascal strings here for a minute). Also the whole 'remember to put it in a UTF-16 if it's unicode, blah blah blah' where you can still shoot yourself in the foot for no good reason if someone gets that backwards.
As for Xml, I also don't like that they used such common (English) characters for delimiters and tokens. Xml, when it was first conceived, was for structured documents, so why in hell, if it was meant to be 'human readable', do we have ampersand and quotes and angle brackets as the primary delimiters? Escaping and unescaping hell. The entire Unicode namespace isn't taken, and there surely is some new character we could have allotted for this. YAML makes the same mistake, and us coders can't seem to go beyond having "" or '' even though they are so terribly common. D'oh! I vote for the pipe and tilde next time (although that would piss off *nix programmers)
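The escaping overhead complained about above can be seen in a few lines. This is a minimal sketch using Python's standard xml.sax.saxutils helpers; the sample sentence is invented.

```python
from xml.sax.saxutils import escape, unescape

# Invented sample text containing two of XML's reserved characters.
text = 'AT&T says "5 < 6"'

# escape() rewrites &, < and > so the text is safe as element content.
escaped = escape(text)

# unescape() reverses the substitution on the way back out.
restored = unescape(escaped)

print(escaped)
print(restored == text)
```

By default `escape` leaves quotes alone because they are only special inside attribute values; the same module provides `quoteattr` for that case.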
So yes, we can improve Xml. By the same token, if something else is better suited, use it. You don't need to use Xml all the time, so don't. Maybe that should be the real point of the article.
You miss the point that it's an easy to use standard. Xml grew from html, which grew from SGML. Html itself has undergone revisions to make it more portable and less ambiguous. Xml tried to get rid of assumptions and yes, it forces you to be upfront about things, it's verbose, and it's picky.
As a counter example, take a look at image metadata from your camera (EXIF), or mp3 files (ID3) some time. Sick! They may be standard, compact binary formats, but they're sure not easy to understand, and a lot more work has to go into making them backwards compatible. People still have to read big ugly specifications, but they don't have a nice parser to help them. If it was Xml I could easily, in almost any language, parse through the file system looking for all songs by Sting. All I need is 5 minutes on the internet to see that the xpath is /song/artist[@name] and I'm just about done.
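The "all songs by Sting" lookup described above might look like the sketch below, using Python's standard ElementTree. The document layout (a `library` root with `song`, `title`, and `artist name="..."` elements) is an assumption made up for this example; no real music format is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical library layout, invented for illustration.
LIBRARY_XML = """<library>
  <song><title>Fields of Gold</title><artist name="Sting"/></song>
  <song><title>Roxanne</title><artist name="The Police"/></song>
  <song><title>Englishman in New York</title><artist name="Sting"/></song>
</library>"""


def titles_by_artist(xml_text, artist):
    """Return titles of songs whose <artist name="..."> matches."""
    root = ET.fromstring(xml_text)
    titles = []
    for song in root.findall("./song"):
        node = song.find("artist")
        if node is not None and node.get("name") == artist:
            titles.append(song.findtext("title"))
    return titles


sting_titles = titles_by_artist(LIBRARY_XML, "Sting")
print(sting_titles)
```

Comparing the attribute in code rather than interpolating it into a path expression sidesteps quoting problems with artist names that contain apostrophes.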
Xml, like anything else, has grown since it first came on the scene. Some decisions seem wrong now, but that's the benefit of hindsight. Common sense comes after mistakes are made. It's just the way it is. Yes Xml could be improved. No doubt about it.
Jeff or others, if you're so inclined and sure that YAML can replace Xml, why don't you try to convert an Xml SOAP message into JSON and YAML so we can really see if there's a difference. Something difficult too please, none of this 3 line file stuff. Let's see a real discussion, not just a 'angle brackets give me RSI' post.
I managed to read all the comments and have 2 questions. (1) What are some good XML tools? (2) Are there any recommended books that give good examples on when is the right time to use XML?
You know subversion...they did an improvement in release 1.4
Quote from http://subversion.tigris.org/svn_1.4_releasenotes.html
Working copy performance improvements (client)
The way in which the Subversion client manages your working copy has undergone radical changes. The .svn/entries file is no longer XML, and the client has become smarter about the way it manages and stores property metadata.
All this XML bashing is nonsense.
XML is good. XML is a standardized way that can unite platforms, OS'es and many more entities.
Just to wimp about some 20% overhead on the content length is plain silly. The hardware made jumps higher than 20% in the last... 2 years? and now you start complaining about size, performance and sh*t like this? This is so wrong.
Hey, I'll pick on SOAP. I was a web developer before I was a programmer, and having used XML and investigated integrating SOAP for the purpose of shopping cart software, it's got an awful learning curve for a markup language and it's got a ton of overhead. Standard XML isn't nearly as bad.
Ant is against human rights.
Have you seen binary XML:
It stores XML as binary data, so primitive types only take up their machine representation, and seeking through the DOM is faster because it can be written in binary as a tree structure. So, you get the benefits of "human readable XML" by piping your bxml to a converter, and you get the benefits of just storing the raw data, because it's binary.
YAML is not a markup language but XML is. I think it is not fair to weigh them on the same scale; they were intended for different usages.
Maybe it's laziness on my part, but I still use 'Config.ini' files in my applications for the sake of readability.
I also do not use XML as a database because of its 'signal to noise ratio'. In the Windows ecosystem it is possible to use CSV or tab delimited text files if you don't deal with large amounts of data, and it is also possible to run SQL queries on CSV / tab delimited files.
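The idea of running SQL over CSV data can be sketched without any Windows-specific tooling. The example below swaps in a different technique from the one the comment alludes to: it loads the rows into an in-memory SQLite database with Python's standard csv and sqlite3 modules. The table name, columns, and rows are invented.

```python
import csv
import io
import sqlite3

# Invented sample data standing in for a CSV file on disk.
CSV_TEXT = "name,city,age\nAnn,Oslo,31\nBob,Lima,45\nCy,Oslo,28\n"


def load_csv_into_sqlite(csv_text):
    """Copy CSV rows into an in-memory SQLite table called 'people'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, city TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO people VALUES (:name, :city, :age)", rows
    )
    return conn


conn = load_csv_into_sqlite(CSV_TEXT)
oslo = conn.execute(
    "SELECT name FROM people WHERE city = ? ORDER BY age", ("Oslo",)
).fetchall()
print(oslo)
```

For small data sets this keeps the file itself as plain, low-noise text while still allowing ad hoc queries.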
So, where to use XML files? I don't know the answer but I feel safe to know that there is a way to 'mark up' my data when I need.
My group came quite close to using XML for my fourth year project, for, of all things, the configuration file. Luckily, we stumbled upon libconfig (http://www.hyperrealm.com/libconfig/), a nice little C and C++ library for parsing configuration files.
Why not use an XML reader tool, so that you drag the file into an application that shows the entire thing in a more human-readable form?
well, i have finished reading through all the comments and am left a lot more confused about this issue than yesterday. my experience with xml is in trying to get stuff into filemaker pro, and it's a nightmare. I read this post and thought, aha, that's why it's a nightmare, it's because XML has been rubbish this whole time!
But after reading through the comments, I now realise this is not necessarily the case, and that I have instead stumbled upon some sort of religious war. So I'll just go back to (un)happily picking my way through xslt scripts.
Maybe somebody needs to tell the iTunes dev's. This is ONE FREAKING SONG in "iTunes Music Library.xml":
<key>Kind</key><string>MPEG audio file</string>
<key>Play Date UTC</key><date>2008-05-01T13:03:56Z</date>
<key>File Folder Count</key><integer>4</integer>
<key>Library Folder Count</key><integer>1</integer>
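For what it's worth, the key/value pairs quoted above follow Apple's XML property list layout, which Python's standard plistlib can read directly. The sketch below wraps the four quoted pairs in a minimal `plist`/`dict` envelope; that envelope is an assumption added for the example, not a copy of a real library file.

```python
import plistlib

# The four pairs from the comment, wrapped in an assumed minimal envelope.
TRACK_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>Kind</key><string>MPEG audio file</string>
    <key>Play Date UTC</key><date>2008-05-01T13:03:56Z</date>
    <key>File Folder Count</key><integer>4</integer>
    <key>Library Folder Count</key><integer>1</integer>
</dict>
</plist>
"""

# plistlib maps <string>, <integer> and <date> to str, int and datetime.
track = plistlib.loads(TRACK_PLIST)

print(track["Kind"], track["File Folder Count"], track["Play Date UTC"].year)
```

The verbosity on disk is real, but the typed tags are what let a generic parser hand back native values without a schema.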
Here's a tip for Jeff,
Get a real BLOGGing software solution - this is absurd. I can’t sit and read through this linear mess. I need something that nests the replies that go together - so I can read those threads that are of interest and discard the idiots – at least with a little more ease. You should consider an XML based persistence model – you can easily nest the messages using this approach. :P
As a side note, if we are going to insist on reinventing the wheel every few years, at least it will keep us all employed rewriting crap over and over and over again. I'm looking forward to CTRYAML - Crap To Replace "YAML Aint a Markup Language". At least the YAML folks didn't reinvent the witty recursive acronym concept taken from GNU (GNU’s Not Unix) – freakin’ brilliant. Sorry - couldn't resist.
I used to think I was probably the only person who found it frustrating to understand or even look at big and complex XML pages. Happy to be in the company of so many people. Never knew there were alternative formats to XML. I am going to try them out. Actually I feel the problem is not with XML itself but with the people who try to overuse it.
"So can any regular language, the only advantage this crippled, dumbed down, annoying language called XSLT has over others is that it's written in XML"
I completely agree with the blog posting (will wonders never cease), but the above observation is uncalled for and just plain wrong.
XSLT can do a lot in a small amount of code. The trick is to learn how to use XSLT as XSLT, and not as insert your favorite language here.
... and not as your favorite language of the day.
"Wouldn't this information be easier to read and understand -- and only nominally harder to parse -- when expressed in its native format?"
Actually, no, it wouldn't. Parsing MIME mail is quite annoying in all sorts of little ways. For example, headers can span multiple lines, headers can use multiple encodings including the totally annoying encoded-word syntax, e-mail addresses are a goddamn nightmare, with their ability to embed comments, contain groups, and all sorts of other mess, and the whole thing is recursive.... e-mail has underspecified parts, it has a bunch of different specifications to read, it's subject to all sorts of meaningless historic legacy constraints; it's really godawful.
The XML is significantly easier. By probably an order of magnitude or two. So, no, not nominally harder to parse, not at all. MIME e-mail is a pain to parse and a pain to generate. And frankly, if you think otherwise, you don't know what the hell you're talking about.
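Two of the annoyances listed above, folded headers and the encoded-word syntax, can be seen in a short sketch with Python's standard email package. The message is invented, and using `policy.default` is a choice made for this example so that the library unfolds and decodes headers on access.

```python
from email import message_from_string, policy

# Invented message: an RFC 2047 encoded-word in From, a folded Subject.
RAW = (
    "From: =?utf-8?b?SsO8cmdlbg==?= <jurgen@example.com>\n"
    "Subject: A subject line that is long enough\n"
    " to be folded onto a second line\n"
    "\n"
    "Body text.\n"
)

msg = message_from_string(RAW, policy=policy.default)

# The structured header object exposes parsed mailbox addresses.
sender = msg["From"].addresses[0]
subject = str(msg["Subject"])

print(sender.display_name, sender.addr_spec)
print(subject)
```

That a mature library is needed to get even this far supports the comment's point that the format is far from trivial to parse by hand.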
Here's to not reinventing the wheel over and over.
@ CovenantMG - spot-on. Why waste more time on translation of formats. If we're worried about space, forget worrying just about tags and just compress over-the-wire. Browsers do it all the time for html, and so could XmlHttp libs.
Xml is more than just an object serializer. You say nobody should read Xml; well, definitely nobody should read JSON. Try peeking inside a non-trivial object sometime. (JSON should be kept lightweight and not try to replace Xml.)
None of JSON, YAML, or Xml enforces data types, and that's usually the part that sucks. Parsing, interpretation, and agreement on format still need to occur whenever you serialize types. I seriously don't see what the anti-Xml fuss is all about. We all pay tax somewhere.
If anything, the problem with Xml is remembering attributes versus elements and InnerXml versus OuterXml. Sometimes it feels like the code, as written, is very fragile and more verbose than it needs to be.
I respectfully disagree. The readability of XML depends more on the grammar used and how it is chosen to be laid out, not merely on the use of an angle bracket structure. Goodness, how do we manage with HTML?
XML is very easy to parse. It was designed as a data exchange language and, quite frankly, RSS was an ideal use for XML. So I have no idea what Derek Denny-Brown is talking about. Sure, we could have a list of some kind. But remember, XML is a good compromise between machine and human readability.
In short, it's not XML at fault. It's how people design the grammars they use. Admittedly some of these are awful, but I don't think that detracts overall from the usability of the format.
I'm of the view that just because XML is in a human readable format, it doesn't mean we should be reading it directly.
The problem as I see it is more about limitations in the tools rather than XML itself.
I continually hear how "cool" the XML editor in Visual Studio is. Sorry, but where is at least a tree view of the document I'm viewing? I'm not talking about the inline expand/collapse buttons, I mean a separate "view" of a document that removes the extra noise you talk about. Try opening a large unfamiliar XML document in Visual Studio and attempt to navigate around it; it's ridiculously difficult.
It appears Microsoft believe that developers prefer to do almost anything XML completely by hand, that is, writing actual XML elements and attributes. This is plain wrong; productivity suffers and human error is introduced.
I use a tiny freeware tool called FirstObject XML editor regularly. It's not perfect; however, it "cuts the crap" of Visual Studio and XML Spy, just allowing you to quickly and easily understand and navigate around large XML documents.