Everywhere I look, programmers and programming tools seem to have standardized on XML. Configuration files, build scripts, local data storage, code comments, project files, you name it -- if it's stored in a text file and needs to be retrieved and parsed, it's probably XML. I realize that we have to use something to represent reasonably human readable data stored in a text file, but XML sometimes feels an awful lot like using an enormous sledgehammer to drive common household nails.
I'm deeply ambivalent about XML. I'm reminded of this Winston Churchill quote:
It has been said that democracy is the worst form of government except all the others that have been tried.
XML is like democracy. Sometimes it even works. On the other hand, it also means we end up with stuff like this:
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
How much actual information is communicated here? Precious little, and it's buried in an astounding amount of noise. I don't mean to pick on SOAP. This blanket criticism applies to XML, in whatever form it appears. I spend a disproportionate amount of my time wading through an endless sea of angle brackets and verbose tags desperately searching for the vaguest hint of actual information. It feels wrong.
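To put a rough number on that, here is a minimal Python sketch that parses the envelope with the standard library and pulls out the only two facts in it: the operation name and the ticker symbol. The names `SOAP_REQUEST` and `extract_payload` are made up for illustration, and this is not how a real SOAP stack processes a request.

```python
import xml.etree.ElementTree as ET

# The SOAP request from above, as a string (hypothetical constant name).
SOAP_REQUEST = """\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def extract_payload(xml_text):
    """Return the operation name and symbol carried by the envelope."""
    root = ET.fromstring(xml_text)
    # Prefix-to-URI map used only to write the search path below.
    ns = {
        "soap": "http://schemas.xmlsoap.org/soap/envelope/",
        "m": "Some-URI",
    }
    call = root.find("soap:Body/m:GetLastTradePrice", ns)
    # ElementTree stores tags as "{namespace}local"; keep the local part.
    operation = call.tag.split("}", 1)[1]
    return {"operation": operation, "symbol": call.findtext("symbol")}


payload = extract_payload(SOAP_REQUEST)
# Characters of actual information versus characters sent over the wire.
payload_chars = len(payload["operation"]) + len(payload["symbol"])
total_chars = len(SOAP_REQUEST)
```

Comparing `payload_chars` with `total_chars` gives a crude signal-to-noise figure for this one message. It says nothing about XML documents in general.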
You could argue, like Derek Denny-Brown, that XML has been misappropriated and misapplied.
I find it so interesting that XML has become so popular for such things as SOAP. XML was not designed with the SOAP scenarios in mind. Other examples of popular scenarios which deviate XML's original goals are configuration files, quick-n-dirty databases, and [RSS]. I'll call these 'data' scenarios, as opposed to the 'document' scenarios for which XML was originally intended. In fact, I think it is safe to say that there is more usage of XML for 'data' scenarios than for 'document' scenarios, today.
Given its prevalence, you might decide that XML is technologically terrible, but you have to use it anyway. It sure feels like, for any given representation of data in XML, there was a better, simpler choice out there somewhere. But it wasn't pursued, because, well, XML can represent anything. Right?
Consider the following XML fragment:
<memo date="2008-02-14">
  <from>
    <name>The Whole World</name><email>us@world.org</email>
  </from>
  <to>
    <name>Dawg</name><email>dawg158@aol.com</email>
  </to>
  <message>
    Dear sir, you won the internet. http://is.gd/fh0
  </message>
</memo>
Because XML purports to represent everything, it ends up representing nothing particularly well.
Wouldn't this information be easier to read and understand -- and only nominally harder to parse -- when expressed in its native format?
Date: Thu, 14 Feb 2008 16:55:03 -0800 (PST)
From: The Whole World <us@world.org>
To: Dawg <dawg158@aol.com>

Dear sir, you won the internet. http://is.gd/fh0
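As a rough check on the "only nominally harder to parse" claim, here is a short Python sketch that reads the native format with the standard library's `email` package. The names `RAW_MEMO` and `parse_memo` are invented for this example, and the sketch assumes the usual blank line between headers and body.

```python
from email import message_from_string
from email.utils import parseaddr

# The memo in its native, header-plus-body form (hypothetical constant name).
RAW_MEMO = """\
Date: Thu, 14 Feb 2008 16:55:03 -0800 (PST)
From: The Whole World <us@world.org>
To: Dawg <dawg158@aol.com>

Dear sir, you won the internet. http://is.gd/fh0
"""


def parse_memo(raw):
    """Turn the header/body text into the same fields the XML memo carries."""
    msg = message_from_string(raw)
    # parseaddr splits "Display Name <address>" into its two parts.
    from_name, from_email = parseaddr(msg["From"])
    to_name, to_email = parseaddr(msg["To"])
    return {
        "date": msg["Date"],
        "from": {"name": from_name, "email": from_email},
        "to": {"name": to_name, "email": to_email},
        "message": msg.get_payload().strip(),
    }


memo = parse_memo(RAW_MEMO)
```

The result has the same shape as the XML memo: a date, a sender, a recipient, and a message, without a single closing tag.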
You might argue that XML was never intended to be human readable, that XML should be automagically generated via friendly tools behind the scenes, never exposed to a single living human eye. It's a spectacularly grand vision. I hope one day our great-grandchildren can live in a world like that. Until that glorious day arrives, I'd sure enjoy reading text files that don't make me suffer through the XML angle bracket tax.
So what, then, are the alternatives to XML? One popular choice is YAML. I could explain it, but it's easier to show you. Which, I think, is entirely the point.
<club>
  <players>
    <player id="kramnik"
            name="Vladimir Kramnik"
            rating="2700"
            status="GM" />
    <player id="fritz"
            name="Deep Fritz"
            rating="2700"
            status="Computer" />
    <player id="mertz"
            name="David Mertz"
            rating="1400"
            status="Amateur" />
  </players>
  <matches>
    <match>
      <Date>2002-10-04</Date>
      <White refid="fritz" />
      <Black refid="kramnik" />
      <Result>Draw</Result>
    </match>
    <match>
      <Date>2002-10-06</Date>
      <White refid="kramnik" />
      <Black refid="fritz" />
      <Result>White</Result>
    </match>
  </matches>
</club>

players:
  Vladimir Kramnik: &kramnik
    rating: 2700
    status: GM
  Deep Fritz: &fritz
    rating: 2700
    status: Computer
  David Mertz: &mertz
    rating: 1400
    status: Amateur
matches:
  -
    Date: 2002-10-04
    White: *fritz
    Black: *kramnik
    Result: Draw
  -
    Date: 2002-10-06
    White: *kramnik
    Black: *fritz
    Result: White
There's also JSON notation, which some call the new, fat-free alternative to XML, though this is still hotly debated.
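For comparison, here is one way the same chess club data might look in JSON, loaded with Python's standard `json` module. The layout is an assumption made for illustration. JSON has no equivalent of YAML's anchors and aliases, so this sketch refers to players by id, much as the XML version does with `refid`. The names `CLUB_JSON` and `player_name` are hypothetical.

```python
import json

# One possible JSON rendering of the club data (layout is an assumption).
CLUB_JSON = """
{
  "players": {
    "kramnik": {"name": "Vladimir Kramnik", "rating": 2700, "status": "GM"},
    "fritz":   {"name": "Deep Fritz", "rating": 2700, "status": "Computer"},
    "mertz":   {"name": "David Mertz", "rating": 1400, "status": "Amateur"}
  },
  "matches": [
    {"Date": "2002-10-04", "White": "fritz", "Black": "kramnik", "Result": "Draw"},
    {"Date": "2002-10-06", "White": "kramnik", "Black": "fritz", "Result": "White"}
  ]
}
"""

club = json.loads(CLUB_JSON)


def player_name(club_data, player_id):
    """Resolve a player id to a display name (JSON has no built-in references)."""
    return club_data["players"][player_id]["name"]


first = club["matches"][0]
white = player_name(club, first["White"])
black = player_name(club, first["Black"])
summary = f"{white} vs {black}: {first['Result']}"
```

Ratings arrive as numbers rather than strings, and the parsed result is ordinary dictionaries and lists, which is most of JSON's appeal for "data" scenarios.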
You could do worse than XML. It's a reasonable choice, and if you're going to use XML, then at least learn to use it correctly. But consider:
I don't necessarily think XML sucks, but the mindless, blanket application of XML as a dessert topping and a floor wax certainly does. Like all tools, it's a question of how you use it. Please think twice before subjecting yourself, your fellow programmers, and your users to the XML angle bracket tax. <CleverEndQuote>Again.</CleverEndQuote>
Posted by Jeff Atwood
Of course, YAML also supports graphs (in contrast to XML, which can only encode trees).
Tom on May 12, 2008 05:47 AM
I don't think I could agree with you more on this matter. XML has been abused in an almost-criminal manner.
XML has, to be honest, bugged me from day one. It just isn't at all readable and, despite claims of storage-invisibility, there are just those times when you have to look at the data - XML makes this a truly painful task.
I frequently find myself looking at XML files containing vast amounts of data - whether I'm programming something to parse them effectively, or just trying to work out a data format - and it is forever causing me headaches.
I think the problem stems from people's tendency to universally apply something that works well in a particular situation. Take databasing, for example. In my line of work, I so often find situations where people have decided that databasing is so cool, everything should be placed into a database - be it static images, links, whatever. You can have too much of a good thing - and this is just another classic example of the problem.
XML should be used sparingly, and in situations where it actually improves readability, structure and clarity of data. If you're looking for something more complex, you need another way of storing your data: be it YAML, be it JSON, be it another database.
XML is not the be-all-and-end-all, and I think it's about time the "average developer" realised this.
James on May 12, 2008 05:53 AM
Hear, hear! I love it when you tell it like it is. Seems to me that far too many people reach for the 'silver bullet' that is XML, then end up with a big pile of mess. The trend for storing data that really should be in an RDB worries me particularly.
"You might argue that XML was never intended to be human readable,"
In fact, I'd argue the opposite! Let's not forget that XML is a *Markup Language* and is best when marking up a document, not when storing 'data', in its strictest sense. Give me a piece of well-marked up HTML, and it's a breeze to read. Give me textual data as, well, text, please!
bobby on May 12, 2008 05:54 AM
Oh, I forgot to include the canonical HORRIBLE example: ant. Give me a nice readable Makefile any day.
bobby on May 12, 2008 05:55 AM
Why care about the format of your data when we live in the age of inline-able converters?
xml2yaml: http://linux.die.net/man/1/xml2yaml
At least we're not stuck with ASN.1.
Jim McCusker on May 12, 2008 05:57 AM
I am building a Ruby project that involves AWS. I chose to use JSON for SQS instead of XML because it's lighter and instead of YAML because it's faster (at least from my Ruby benchmarks).
You now have libraries for JSON and YAML in most programming languages. Also JSON is nearly YAML-correct (see http://en.wikipedia.org/wiki/YAML#JSON).
Piku on May 12, 2008 05:59 AM
What I like about XML is that even if somebody uses it badly, at least it's some kind of standard that you can pick your way through. No matter how much of a mess it is.
robustyoungsoul on May 12, 2008 06:02 AM
XML is like violence: if it doesn't solve your problem, you're not using enough of it. ;-)
Mo on May 12, 2008 06:02 AM
Ant makes me cry.
Andrew on May 12, 2008 06:05 AM
I've found that for a rather large amount of what people want to use XML for (what the quoted person called "data scenarios"), you are far better off using CSV. It's much easier to parse, and can be edited and manipulated in the database or spreadsheet program of the user's choice.
There may be a burgeoning market for XML tools, but they have a long, long way to go before they come close to the support available for dealing with CSV files.
T.E.D. on May 12, 2008 06:09 AM
Hear hear!!
Dave on May 12, 2008 06:09 AM
XML and all the tools around it, especially XSL, can take a flying leap. As Wikipedia points out, “the syntax of XSL language itself is valid XML.” As if we were not bloodied enough from being subjected to XML, now we start cutting off our limbs by using XSL – reminds me of the Black Knight in Spamalot.
Somewhere around Smalltalk seemed to me to be the pinnacle of computing science in its simplicity and elegance. The entire Smalltalk syntax could be represented on a postcard. http://www.esug.org/whyusesmalltalktoteachoop/smalltalksyntaxonapostcard/
What has happened to our industry?
So much of XML runs counter to well understood programming paradigms, too:
http://myarch.com/why-xml-is-bad-for-humans
Jeff Atwood on May 12, 2008 06:17 AM
I think the one thing that's missing from that argument, however, is that XML is much easier to validate. If you didn't have XSD, I'd agree with you, but without it there's no way of validating data in a file (or stream, or anywhere else you get plain text data) without manually parsing and validating it. In code (as far as I can tell).
And then if your validation criteria changes, you're back to unpicking your predecessor's hokey undocumented parser and validator and then trying to spot-weld in your extra logic. And then recompile. And then (depending on what kind of change-controlled environment you work in) jump through the hoops to get it deployed.
I could well be wrong, though.
Benjimawoo on May 12, 2008 06:26 AM
> XML is like violence: if it doesn't solve your problem, you're not using enough of it. ;-)
XML is also, just like violence, something the world could do without ;)
James on May 12, 2008 06:26 AM
Well said.
I think that XML isn't really appropriate for many of the applications for which it is being applied. One of the problems is that XML is flexible enough to be turned into about anything, whether that makes any sense or not. There are many good uses for XML out there, but I'm afraid that the many poor uses will prejudice people against it.
Matthew Reed on May 12, 2008 06:29 AM
We are designing an entire reporting system around XML control files. One engine to "rule them all" and small XML files containing everything we need in the report. So far, it has been a nice solution and we wrote a generator to create the XML. No more digging in code looking for where to change the header width or column order... just load the XML into the generator, make your change, and BAM! instant report update.
Of course it isn't released or even in alpha yet, but "Works on My Machine"
Wayne on May 12, 2008 06:32 AM
> There's also JSON notation, which some call the new, fat-free alternative to XML, though this is still hotly debated.
There's also another cool thing: JSON is mostly a subset of YAML (there are a few small differences, see http://redhanded.hobix.com/inspect/jsonCloserToYamlButNoCigarThanksAlotWhitespace.html, but it's overall compatible). This means that it's fairly easy to start with JSON and jump to YAML if the structure is too complicated for JSON.
> At least we're not stuck with ASN.1.
ASN.1 is ok, as long as you don't have to create or parse it by hand. But then again, ASN.1 is not supposed to be hand-parsed. And you'll note that XML is the same, it's just that XML is (supposedly) human-readable, and every language has XML serializers and deserializers while Erlang is one of the few languages with an ASN.1 encoder/decoder smack right in its stdlib.
Masklinn on May 12, 2008 06:33 AM
XML became the default because of its flexibility in data formatting. And, because it has become so ubiquitous, almost all programming languages have built in ways of easily parsing XML. In fact, I do almost all of my web output using XML and then use XSL style sheets to transform it into HTML. I remember some blogger, can't remember his name, blathering on about MVC and how you should make your output "skinable". Well, if you produce XML output, your webpages are extremely skinable.
The problem is that XML may be very computer friendly, but is not too human friendly. Most people will easily agree with that. However, there are dozens of GUI oriented XML editors that make reading and writing XML much easier. I've even written a 10 or so line Perl script that converts files from YAML to XML and back. (Yes, Perl. What do you expect from someone who uses VI as their main program editor).
XML is not really the problem. It is an excellent and extremely flexible data format. The problem is our attempt to read and write directly from XML when there are many excellent tools that can help us with the task. After all, you don't expect to read and write Microsoft Word documents using a standard text editor. Why should XML be all that different?
I'm not a fan of using YAML as a data formatting tool because it doesn't go far enough to solving the problem. YAML becomes unreadable when your data becomes more complex and there are very few development tools that can parse YAML files. It's silly to come up with another inferior data format to XML which doesn't really tackle the main issue of human data readability when there are few programming tools to read and write it. You're better off using one of the wide variety of GUI XML editors that can make your task much easier.
After all, how many developers use IDEs to help them program even though almost all programming code is in text and could be done (in theory) using Notepad?
David W. on May 12, 2008 06:34 AM
I'm afraid I couldn't disagree more. No, XML isn't the easiest to read (by humans) of all the infinite number of alternatives out there. No, XML isn't the most efficient in terms of space. And yes, perhaps it has been forced into places it was never intended to go. But you miss what I think is the most important point: it is rapidly becoming a standard way of representing information. I would argue the value of having a standard far outweighs the inefficiencies in most cases.
Take a simple example of a configuration file that some application will need for saving user information. We've all been there, making up an ad hoc scheme for saving whatever needs to be saved. Then building a little parser to read and write the data in that form. And over time our little config file grows and changes. Someday, a new programmer joins the team and has to deal with this file. What are the construction rules again? Where can a new item be added that won't break the little parser? How much time has been expended over the life of the application in building, modifying, and fixing that bit of parser code as things needed to change?
There are numerous XML parsers available that are robust and free. They all work pretty much the same way (with a few exceptions that I'd call bugs in implementation). I don't want to write little parsers anymore. I want to use something that is already written and works.
The same argument can be made with respect to the other tools that are widely available to deal with XML-encoded data:
-- XSD can be used to insure the integrity of the XML file *before* your program starts to slurp in the data in the file. This can be critical in B2B situations like banking or ordering from a supplier.
-- XSLT can be used to do arbitrary transformations on the data (in a standard way) to produce files of any format that is convenient on the data consumer's end of the exchange. I do a lot of this sort of transformation work--none of it for web pages--and I can vouch for the power and convenience of having a standard transformation language.
-- XML/XSL authoring and editing tools abound. There are tools that will produce an editable visual representation of a schema (a real boon if you need to capture complex data in a text file). Most of these tools will do much of the work of editing XML files and will help you to construct correct XML with prompts and intellisense-like prompts.
I'm a big fan of XML. No it certainly isn't the very best that we could do but it is a quantum leap better than what we had before--custom representations for everything. If there is one single improvement we could make to advance the art of programming today, I'd vote that it’s STANDARDS. We don't have to wait for perfect standards to emerge (they won't) but we do have to get to the point where we can agree. XML is a step in the right direction.
Thankfully, someone will be implementing YAML into Boost.Serialization this summer.
http://code.google.com/soc/2008/boost/appinfo.html?csaid=BE3EEB904A90B03A
> but without it there's no way of validating data in a file (or stream, or anywhere else you get plain text data) without manually parsing and validating it
Actually,
1. Even in XML there are other (far better, especially on the readability front) schema languages/systems than XSD (RelaxNG, Schematron)
2. Schema languages/specs are starting to appear for e.g. JSON (Cerny, json-schema)
3. JSON documents are very often orders of magnitude simpler than their XML counterparts, thus validation becomes almost trivial and often doesn't require a full-blown schema language.
4. Manually parsing and validating a JSON document isn't really hard with a dynamic language.
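As a rough illustration of point 4, here is a small Python sketch that validates a JSON record by hand. The record shape (a player with `name`, `rating`, and `status`) and the function name `validate_player` are assumptions made for this example. It is not a substitute for a schema language when the rules grow beyond a handful of fields.

```python
import json


def validate_player(raw):
    """Parse a JSON string and check it against a tiny hand-written 'schema'."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad syntax
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Required fields and the Python type each one must have after parsing.
    expected = {"name": str, "rating": int, "status": str}
    for key, kind in expected.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, kind):
            raise ValueError(f"field {key!r} should be of type {kind.__name__}")
    return data


player = validate_player(
    '{"name": "Deep Fritz", "rating": 2700, "status": "Computer"}'
)
```

Changing the rules means editing the `expected` table, which is the trade-off Benjimawoo describes above: the checks live in code rather than in a separate schema file.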
Thank you for addressing some of my concerns regarding this sacred cow!
"it is rapidly becoming a standard way of representing information"
It is hardly more a 'standard way' of representing information than ASCII (or UTF8, UTF16, etc.) Yes, anyone can write a file with lots of angle brackets, and parsers can easily turn that back into tokens, but the semantics of the file remain application-dependent in almost every example of (bad) XML usage I've ever seen.
"Take a simple example of a configuration file that some application will need for saving user information. We've all been there, making up an ad hoc scheme for saving whatever needs to be saved."
Er, YOU might have been, but the rest of us are familiar with a small number of pretty common configuration formats that are trivial (i.e. easier than XML) to parse.
"XSD can be used to insure the integrity of the XML file"
Yes, for a very limited meaning of the word "integrity".
"XML/XSL authoring and editing tools abound"
And text editors are 'abounder'.
"We don't have to wait for perfect standards to emerge (they won't) but we do have to get to the point where we can agree. XML is a step in the right direction."
OK, if we can get the billion different languages floating around reduced to maybe less than a hundred or so, I agree with you :)
bobby on May 12, 2008 06:48 AM
I've also been critical of XML ever since i had to start working with it. I'm coming from Lua where a configuration file is simply a Lua script. If you got an error in the script, you'll get an error message from the Lua interpreter.
Now, if you have the same configuration file in XML format, and NO validation as it is usually the case, you can get a list of problems reading this in your code:
- program crashes
- program says: "error reading config file"
- program starts but uses default settings for all configurable features
- program starts but uses default settings for a subset/single feature
- something else entirely ...
Yes, this is only in the narrow "configuration file" scenario but that's just one where i think XML is totally overused and/or under-validated.
Btw, what ever happened to INI files? ;)
steffenj on May 12, 2008 06:48 AM
It's amazing. For one precious moment it looked as if the world had actually standardized on a data and metadata interchange format, and then the "agile" groups had to mess it up with their JSON and YAML and whatever.
Who cares if XML is bloated? It's not like you're writing this by hand. With a lot of newer technologies like ASP.NET Web Services and Linq to XML, it's practically invisible to programmers. It's just what's going over the wire, that's all. It's also very easy to compress and encrypt, which, if you really care about the size, makes it about as compact as anything else out there.
I realize it's not the most ideal tool for your social shopping cart 2.0 AJAX app. You'd rather use REST. Fine, you can design your own custom web services for that. But most of us in the real world have jobs to do and precious little time to do them in, and if it takes 5 minutes to write an XML Web Service that every other .NET or Java programmer can consume, we're going to choose that over 2 hours of fiddling with the internal serialization and customer/partner hand-holding to create a YAML service.
If Sun and Microsoft and IBM want to all come up with a native interpreter for these other formats and roll them into their flagship products, then great, I'll consider using them. Until then, stop hating on the one standard that actually makes my job easier.
Aaron G on May 12, 2008 06:48 AM
For scripting languages it's handy to have the config files written in the language itself. For example, here is a Python config file for a program I wrote:
http://www.pixelbeat.org/programs/Tira-2/toppy.tira2
which can be parsed trivially with: config = eval(open(config_filename).read())
Pádraig Brady on May 12, 2008 06:48 AM
I'm not a big fan of XML, but think it's OK in some scenarios. Unlike Jeff, though, I'm going to single out SOAP. We already have many perfectly good syntaxes for procedure call. SOAP is a product of the "insane complexity" one of the Google founders talked about. With a million simple, concise syntaxes for procedure call out there, why do we end up with this complex unreadable monster? How about "Currency GetLastTradePrice("DIS")"?
A. L. Flanagan on May 12, 2008 06:50 AM
> But you miss what I think is the most important point: it is rapidly becoming a standard way of representing information.
The problem is that *XML is NOT a way of representing information*. It's at best a way of building an information representation structure, XML doesn't represent anything.
> I would argue the value of having a standard far outweighs the inefficiencies in most cases.
XML is not a standard for anybody but marketroïd. One of Erik Naggum's numerous quotes about XML comes to mind here:
> Structure is nothing if it is all you've got. Skeletons spook people if they try to walk around on their own; I really wonder why XML does not.
> Take a simple example of a configuration file that some application will need for saving user information.
Wow, a non-sequitur already? The problem here is not "hey they're not using XML" but the reinvention of the wheel. There are, and were before XML, numerous formats that could be used for representing a conf file. XML is barely *an* answer here, and one that is usually misused to insert one more buzzword in a press release.
> I don't want to write little parsers anymore. I want to use something that is already written and works.
Guess what? There are numerous JSON and YAML parsers available for most popular languages. You don't have to write little parsers if you don't want to, and you haven't needed to since long before XML.
> XSD can be used to insure the integrity of the XML file *before* your program starts to slurp in the data in the file.
As I said above, there are schema languages for JSON. And I really don't understand why every person who talks about XML schema languages just *has* to pick the most verbose, unreadable and annoying one of the bunch.
> XSLT can be used to do arbitrary transformations on the data
So can any regular language, the only advantage this crippled, dumbed down, annoying language called XSLT has over others is that it's written in XML.
Wow, paint me impressed.
And yes, I have used XSLT, I've spent the better half of my days in it during a whole year. I know and understand the thing, and I still hate it, I'd take HaXml or HXT over it any day of the week if I was the one to choose.
> XML/XSL authoring and editing tools abound.
And mostly show how misguided XML is in the first place.
As for XML editors ... i'd like to know which ones are considered "good"?
I have tried several and either they are complex beasts of applications that try to satisfy every possible XML need you might have (Altova XMLSpy comes to mind), or they are very simple editors that let you edit the XML as tree and other forms but not much else (forgot the name).
The former simply have too much of a learning curve to be useful for all people working with XML in our company (and too expensive, too). The latter is simply not powerful enough or its usability just feels "odd" enough not to encourage people to use it over plain text editors (with syntax highlighting).
I agree in part. There are plenty of situations where XML should never go, and some people use it in incredibly wrong and stupid ways but it's not all bad.
Then again it seems software developers are like this, case in point: GOTO
Perfectly acceptable as long as it is done right, developers used it inappropriately and they demonized it as never being the right answer.
Matt Newman on May 12, 2008 06:56 AM
> For one precious moment it looked as if the world had actually standardized on a data and metadata interchange format
XML is not a format, it's a format representation, it has no meaning in and of itself and thus *nothing* was "standardized" for any value of "standardized" worth talking about.
Not to mention, long before the XML marketing blitz by the likes of IBM and Sun, there were ASN.1 and INI files, standards if there ever were any.
> I realize it's not the most ideal tool for your social shopping cart 2.0 AJAX app. You'd rather use REST.
Thanks for showing your incompetence and lack of comprehension of the topic, it's appreciated.
Just so you know, REST is orthogonal to the documentation representation used, you can use REST with JSON, with YAML, with plain text, with HTML (guess what, you do every time you access a web page) or with XML. Nice try, no sugar.
Masklinn on May 12, 2008 06:57 AM
Ooh look, I have an XML parser with a read and write method. I can dump all sorts of objects in it, save them and retrieve them again. Hmm, ideal for config files. And high scores. UI definitions. Actually, ideal for pretty much everything I like to store which doesn't have to go in a database. Uhm yes, my ints come out as int and my lists come out as list, it's pretty amazing really.
Sure, if it's a plain textfile then I save it as plain text. And an image for example can sit neatly in an images directory. For everything else there's databases and XML.
Code comments as XML? That must be a joke and there are plenty of other jokes around. But in general: KISS and don't re-invent the wheel.
Caesar Tjalbo on May 12, 2008 06:58 AM
XML has been around for so long and it's so pervasive we're probably stuck with it for a long time. A few developers using my language have created "easy XML" subroutines that do a lot of under-the-hood formatting and parsing. If we have to live with something we might as well make the best of it. Automate it and forget it.
PaulG. on May 12, 2008 07:02 AM
I didn't go through all the comments but I didn't see DSL mentioned. Take a look at this http://www.ayende.com/Blog/archive/7268.aspx for an example of how to simplify configuration.
Reshef on May 12, 2008 07:03 AM
One thing XML gives you is an ability to randomly access data inside the file without loading it into a database. That can be handy for populating a catalog page in InDesign or building a web page on the fly.
But for something like a config file where you typically read the entire thing in at once it's a useless feature. And for batch-processing scenarios where the receiving system is always going to process all the data in sequence it's a useless feature with a performance penalty.
JPLemme on May 12, 2008 07:03 AM
I like XML, honestly, for small things where you don't overuse attributes and all sorts of other junk.
Sort of like your simple examples:
<books>
<book>
<title>Coding Horror for Dummies</title>
</book>
</books>
But once you start to factor in XSL, XSD, XDSLXSLDX -- I just find that it all gets horribly bloated and against the ... well let's just say that I find using simply structured XML files easy and to a degree NICE to use -- but that XML quickly crosses a line from being 'enjoyable' to 'painful'.
N on May 12, 2008 07:05 AM
+1 Aaron G.
If you are swimming in a sea of angle brackets perhaps you are doing something wrong. For most developers, especially those in SOA land, it's invisible under-the-hood plumbing that (mostly) Just Works(TM).
Damo on May 12, 2008 07:05 AM
(Sorry, my example doesn't show because of the inclusion of the brackets...)
N on May 12, 2008 07:05 AM
XML has its place, but lazy programmers use it for everything.
It's a new Windows registry or DLL manifest - something we never really needed, but makes complicated stuff easier (or possible for the more ignorant coder). However, as with all such RAD tools/standards bad programmers like to use it by default without thinking.
The .NET data controls output "horrible XML files" by default for instance... this is where I blame M$ and draw a parallel to the registry... but that would be unfair. As usual it's the programmer's fault for choosing the wrong method to store/retrieve his/her data.
It's easier to not think than to think... and we are all bad programmers after all, so I can forgive it. :)
Jheriko on May 12, 2008 07:07 AM
this link is broken. :-(
3. Do you know what the XML alternatives are?
I've been digging into YAML recently and I must say it's a lot easier to pick up on, parse, and write than XML in my experience. It just seems more natural to say
Name: Shawn
Rather than
<name>Shawn</name>
Now if only we could get BizTalk to speak YAML. Sigh...
Kelly on May 12, 2008 07:11 AM
I've been wondering about XML for a while. I only recently began to get serious about developing software, and XML was entering its halcyon days right when I started learning. For a long time, I trusted in the ostensible greater wisdom of the collective and assumed that XML really was what its ubiquity implied: The greatest thing since peanut-butter & Nutella sammiches. Recently, though, I really got to wondering about what the point was.
Clearly, XML is no fun to write by hand. The main argument I've heard regarding its verbose plain text format is "it's easy to debug", which makes me want to barf. This is what I'm really wondering: XML is meant to be a data transfer format. Take RSS, for example:
High-traffic sites serve tens of thousands of RSS feeds, formatted in XML, every day. In situations like this--where every spare pound of fat on your data becomes inflated ten-thousandfold until, like the grotesque beast at the end of Akira, it is suffocating the entire known universe with its pustulent girth--shouldn't we be using a data format that's as thin as possible? Shouldn't the common symbols in a data file be encoded and compressed within the file itself? Which has a smaller bandwidth footprint? This:
<SomeDocument>
  <SomeParagraph>XML sucks</SomeParagraph>
  <SomeParagraph>no really</SomeParagraph>
</SomeDocument>
Or this:
&1=SomeDocument;&2=SomeParagraph;<&1><&2>XML Sucks>><&2>no really>>>>
The second one is pretty terrifying, but it would be TRIVIALLY EASY for ANY modern editor to translate it into something that doesn't rape your eyes (like YAML). Aren't we actually wasting TERABYTES of bandwidth every day by transferring human-parseable cruft in files that no human should ever see in the flesh anyway? Or am I missing something?
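One way to check the bandwidth question without inventing a new symbol-table syntax: general-purpose compression already collapses repeated tag names, and HTTP responses can be served gzip-encoded. The sketch below is an assumption-laden illustration, not a benchmark; the 1,000-item document is made up and only reuses the tag names from the example above.

```python
import gzip

# Hypothetical feed: 1,000 items that all share the same verbose tag names.
items = "".join(
    f"<SomeParagraph>item number {i}</SomeParagraph>\n" for i in range(1000)
)
xml_doc = f"<SomeDocument>\n{items}</SomeDocument>\n".encode("utf-8")

# gzip replaces repeated byte sequences (such as tag names) with short
# back-references, which is roughly what the hand-rolled "&1=...;" table does.
compressed = gzip.compress(xml_doc)

print(len(xml_doc), "bytes raw")
print(len(compressed), "bytes gzipped")
```

The exact ratio depends on the data, so measure your own feeds before drawing conclusions either way.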
Max on May 12, 2008 07:12 AM"One thing XML gives you is an ability to randomly access data inside the file without loading it into a database."
Er, that's exactly what it doesn't do, hence terrible performance relative to binary, or simple textual data.
bobby on May 12, 2008 07:13 AMAllow me to express my utter indifference: meh!
I work with XML roughly daily as a developer, and it ain't no big thang. It's at least 12 parsecs farther along than the obsolete flat files we're unfortunately still dealing with.
Show somebody XML, even a total bonehead, and they'll figure it out in a few minutes. There's little magic to it, few assumptions made. Can it be abused and misused? Certainly, just like anything else in computer science. Is it largely redundant? Absolutely, but that can also serve to enhance readability in very large files.
Compare to what came before this: inscrutable binary files, INI files consisting only of key-value pairs, fixed-width flat files, delimited text files... Let's not forget our past, folks.
It's computer-readable, computer-writable, and it's more-or-less human-readable and human-writable, even if it makes you a little crosseyed. Which makes it way better than the tarpit we just crawled out from. JSON or YAML or whatever is probably on the horizon, but let's not say "XML sucks" when it was still a huge step forward.
John on May 12, 2008 07:17 AMOops... looks like your comment filter clobbered my examples. I forgot that it's never safe to assume "no HTML" means everything will be politely escaped rather than thrown in the trash. Here they are again, manually escaped like God intended:
This:
<SomeDocument>
<SomeParagraph>XML sucks</SomeParagraph>
<SomeParagraph>no really</SomeParagraph>
</SomeDocument>
Or this:
&1=SomeDocument;&2=SomeParagraph;<&1><&2>XML Sucks>><&2>no really>>>>
Max on May 12, 2008 07:18 AMI'm just thankful developers have turned to XML instead of undocumented binary files. We don't want to return to those years.
Bill on May 12, 2008 07:22 AMXML isn't bad for many things, but space efficient/easily read by normal computer users it is not. Before XML was used for config files, INI files were standard. They have limitations, but you can parse them VERY quickly, they serve the purpose (configuration) perfectly, and they are easily read and edited. I will never understand why XML took off for configuration. As for tabular data, the CSV standard was much better in my opinion. Once again, easy to parse, editable in many apps including Excel, quick imports, and a small footprint. When it comes to more complex data, I believe XML is a good solution, but YAML/JSON is better in many cases for obvious reasons. The key is to use a standardized format that is supported by other major technologies. It really doesn't make a huge difference for most things. However, Microsoft added a binary format for datasets in .NET for a reason. Sending huge XML files over web services was slow, and adding a "tighter" format was a huge improvement.
Justin on May 12, 2008 07:25 AMI wonder if you may have also seen JDIL at jdil.org?
they say:
However, unlike XML, JSON provides no direct support for namespaces - and thus no standard way for avoiding name collisions when mixing data from diverse sources. Something like a namespace mechanism is required to lift JSON to the level of a data integration platform, as opposed to a data exchange format only. Also lacking are standard ways of naming objects so that they can be referenced from elsewhere, and for representing properties with multiple values.
If these concerns are addressed, JSON's reach will extend over more of the domain currently occupied by XML, while bettering XML in the cardinal virtue of simplicity.
I've become a big fan of JSON for 100% JavaScript web applications. You can convert XML from any online data source into JSON using Yahoo! Pipes. However, JSON is not very readable. It is a nasty mess of brackets. I have to use a JSON Viewer to figure out the object structure. Recently I used ASP.NET's JavaScriptSerializer.DeserializeObject to deserialize some JSON data into objects. This is totally undocumented and proved to be very difficult to figure out.
for added silliness..
a program i wrote was written around a custom text parser.
this ended up being used within a large data analysis program, that needed to store settings etc.
I modified it to read a 'script' at startup, a script that could contain variables and other settings the program used.
perfectly human readable, since the plain text 'comments' were ignored, and it was child's play to get the program to write the config file.
sledgehammer to crack a nut for most programs, but since this included the parser i figured why not
personally i don't care what the format is as long as it's plain text of some sort, and thus easy to backup and copy.
essentially sod anything in the registry or some hidden binary file. i love the idea of the unix based 'dot' hidden config files. put them in the program directory (defaults) and the users directory for everything else.
btw what was wrong with .ini files? have a standard user dir & system dir for them and it works.. problem? never understood why they moved away from that
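For reference, here is roughly what reading an .ini file looks like with a stock parser. This is a minimal sketch using Python's standard `configparser`; the section and key names are hypothetical.

```python
import configparser

# Hypothetical settings; the section and key names are made up.
INI_TEXT = """
[window]
width = 800
height = 600

[user]
name = Shawn
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Values are stored as strings; typed getters convert on request.
width = config.getint("window", "width")
name = config["user"]["name"]
print(width, name)
```

The format stops at two levels (section, then key), which is likely part of why people reach for something else once settings become nested.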
claire rand on May 12, 2008 07:26 AMHear, hear! Preach on, brother!
Dang I wish YAML had become the standard. Do I use it? Nope - because there are parsers for XML built into my language framework. Perhaps once a parser for .Net becomes established I may be able to convince our team to use YAML, but I seriously doubt it. The tyranny of XML will no doubt continue.
Referencing this article just might help, though.
"You might argue that XML was never intended to be human readable, that XML should be automagically generated via friendly tools behind the scenes, never exposed to a single living human eye. It's a spectacularly grand vision. I hope one day our great-grandchildren can live in a world like that."
My dear sir, If you cannot, in some way or other, code it by hand, it's not a language worth using.
one of the big issues with HTML editors in the past has always been EXTREMELY redundant and sloppy code.
Regardless of the language, wizards, tools, and widgets are quick, but the codes should always be visible somewhere, somewhat understandable, and always editable.
Also, I wish they'd stop fucking with the standards. HTML has been around for ages, STILL gets much use (especially the oh-so-dreaded font tags CSS was to get rid of) The HTML/XHTML/XML bitchfest in fact, parallels the CSS fiasco a few years ago.
Introduce a new "language" based on what most people are "just fine" with, make it screwy enough that you have to reframe everything you already know and want to incorporate, and make it finicky enough that a number of people will revolt against it. What for? a little bit more flexibility and usability.
I won't be surprised if five years from now, people will still be refusing to use XML for things handled MUCH easier in other languages.
The Postindustrialist on May 12, 2008 07:30 AMI wanted to post some clear well defined rebuttals as I am a fan of XML and its related technologies, but I can't really disagree with the overall sentiment of the post.
SOAP is awful, and although less so, the XML used in the examples is too. Yes XML is often used like a club.
But some of these comments... come on! There's all sorts of guff popping up here from "XML is too hard" to what's almost a carbon footprint argument!?
Ian on May 12, 2008 07:37 AMClearsilver
Same ideas but made in a more pragmatic view I believe...
http://www.clearsilver.net/docs/compare_w_xmlxslt.hdf
I like JSON because it's lighter-weight, and AJAX apps can easily "programmify" the server response by doing an eval. Of course there are a couple minor security risks with this, but they can be avoided.
The only reason I ever use XML is if I needed to pass data to/from different platforms (using SOAP). Good post!
Josh Stodola on May 12, 2008 07:42 AMSOAP is possibly the most horrifying example. Even if you set aside the whole document/message thing and the poor library intercompatibility, you are still using vast amounts of expensive-to-parse XML to model, in most cases, rather simple function calls. Benchmarks comparing SOAP to CORBA or Thrift (with binary protocol) or whatever tend to be almost comical, and one derives no real benefit from RPC being in XML, and yet SOAP is still heavily used as an RPC mechanism.
Robert Synnott on May 12, 2008 07:44 AMI do not agree with you at all on this matter. Although you try hard not to be against XML, you sound very much against it.
Even though it might not be all that easy for humans to read, at least it can be read. It was made in the era when programmers used to invent their own formats to write the application data in. At least we have a standard now. You can go on picking on it and soon we will have a stage when everybody is writing data in their own applications and there will be no interoperability.
Deepank on May 12, 2008 07:49 AM> Something like a namespace mechanism is required to lift JSON to the level of a data integration platform, as opposed to a data exchange format only.
But why would you do that? Why couldn't JSON just stay a simple format for data exchange and basic data storage? It's a tool, and it's an awesome tool for what it does. Use an other tool (e.g. YAML) if your task or data is more complex than what JSON can do.
Becoming "the hammer to nail them all" (including screws, puppies and ducks) is exactly what has gone wrong with XML, why would you want to repeat the same mistake?
Masklinn on May 12, 2008 07:55 AM> there will be no interoperability.
But there *is* none already! XML is not a data format, it's a data format representation, just because your config file is in XML and mine is also in XML doesn't mean they're interoperable in any way, shape or form. And that's why people have to build complete, custom, non-interoperable data formats on top of XML such as SOAP, XSL, XSD, DITA, ...
Masklinn on May 12, 2008 07:57 AMLISP and Scheme are excellent alternatives for XML.
leppie on May 12, 2008 07:59 AMWhat I find most bothersome about dealing with XML is that "parsing" it tends to happen on two levels. You use a parser to turn characters into XML elements, then you hand-roll another parser in your programming language to turn the "start-tag foo" tokens into actual data. (Or else I'm missing something huge.)
Whereas with JSON, you call json_decode($data, true), and whoomp! There it is.
After a lot of trying to make it automagic, I've also realized that XML is not 100% interchangeable with JSON, because XML has both attributes and text nodes. And I think that's where the other half of the pain comes from, for me: when JSON isn't enough, I use XML, but I don't have foo.innerText in most languages because the DOM insists on dealing with raw nodes. Grawr.
That's my hard-earned, unpopular opinion....
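The two-level point can be sketched in Python (rather than the PHP `json_decode` mentioned above). The sample data is hypothetical; the standard `json` and `xml.etree.ElementTree` modules are the only libraries used.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data expressing the same record in both formats.
JSON_TEXT = '{"name": "Shawn", "tags": ["a", "b"]}'
XML_TEXT = '<person name="Shawn"><tag>a</tag><tag>b</tag></person>'

# JSON: one call yields native dicts and lists.
from_json = json.loads(JSON_TEXT)

# XML: the parser yields elements; a second, hand-written step decides
# which attributes and text nodes map to which fields.
root = ET.fromstring(XML_TEXT)
from_xml = {
    "name": root.get("name"),
    "tags": [tag.text for tag in root.findall("tag")],
}

print(from_json == from_xml)
```

The mapping step is where the attribute-versus-text-node choice has to be made by hand, since nothing in the XML itself says that `name` is a field and `tag` is a list.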
sapphirecat on May 12, 2008 08:01 AMBut it's just so damn ENTERPRISEY!!
Mattkins on May 12, 2008 08:03 AMSoap is insanity. XML in general is not so bad.
However, I agree that XML is overly complicated for representing flat data. Yeah, it's great that we have a standard. We can do better. Let's come up with one that makes sense.
Jeff Davis on May 12, 2008 08:04 AMSorry, I don't agree. I'd rather have one syntax than 2,746 different "common standards", each with different bugs, each maintained by a small group of people, rather than an entire industry. XML has become what it has because people needed something to fill that role. And it turns out that it was flexible enough to handle tasks outside of its original design goals--the hallmark of any good system.
Is file size the biggest problem in computer science right now? Aren't there bigger battles to fight, or is everything else a smaller problem than this in your view?
I'm tired of people complaining about how expensive XML is to parse (as if you were writing the parser). Compared to what? JSON? Scale JSON to something that handles namespaces, includes, queries, encodings, and we'll talk. Give me one example where XML has been critically big or slow or complex.
Where's the complaint against HTML? Why don't you write this site in flash, and get rid of the RSS feed? You are personally adding strength to XML, you know. Let's see you put your money where your mouth is.
Brianary on May 12, 2008 08:05 AM> It's amazing. For one precious moment it looked as if the world had actually standardized on a data and metadata interchange format, and then the "agile" groups had to mess it up with their JSON and YAML and whatever.
I think that's the point Jeff misses in his post. His SOAP example is completely self-describing. A zero-knowledge interception layer could evaluate that SOAP request and with absolute certainty (no heuristics) act on it. His email example would require heuristics that would occasionally be wrong.
XML is wrong for some things, of course. HTML is the right choice for web pages as humans are inherently heuristic and any errors in HTML are tolerated (by convention) much better than errors in XML tend to be tolerated. If a particular XML document schema were complex enough and used in enough disparate environments it would eventually become HTML-like in this respect.
But just because HTML-style heuristics are the right choice in the browser environment doesn't mean they're right in all situations. I think strict, unambiguous interpretation and format are key attributes to enable many inter-operation scenarios. I don't know about his YAML example, but one thing I take from it is that newlines and whitespace are important, which is a no-go since text documents often get their whitespace mangled as they are passed through systems. I don't think most people have a tolerance for casual data corruption.
David Gladfelter on May 12, 2008 08:06 AMFinally someone is saying what I felt all along. XML is a great document language, but a bloated data language. Not all data is a document. XML does great for documents, things like HTML, ODF, and so on. For configuration and other programmatic data it just has way too much structure.
XML contains a lot more structure than most programming languages can represent natively, and so requires complex class libraries like DOM and SAX to access all that structure. Languages like Perl, PHP, Ruby, JavaScript, Python, Objective-C, which let you create anonymous data types (or something close) organize everything into hashes (dictionaries) and lists (arrays). That JSON organizes everything into these two primitive types makes it, I think, the perfect data storage format. There is a 1:1 mapping between JSON and the internals for most languages, saving a lot of time and headache, and parsers and generators are easy to write.
Java is this big exception here, and I think Java has a lot to do with XML's popularity. With no good way to have anonymous data structures in Java, embedding data in your application is just not possible, you have to store it externally. The Java folks were looking for a good flexible, expressive, easy for humans format for a while, and XML fit the bill. Java's collection classes, and many of its APIs, are already a bit unwieldy, so DOM and SAX didn't really seem too bad over there. Plus its DTDs, validators, etc, really fit in well in the bean counter environments where Java is often used.
So now those of us using languages that have more native support for complex dynamically structured data and a "just do it" attitude have to deal with something that was designed for a completely different sort of ecosystem.
JSON doesn't solve every problem, and it probably could use some good standard for something like DTDs, and something like XQuery, but I've seen work done in this direction, and it's all much simpler than the XML equivalent but can still represent anything, even a DOM tree.
Joseph Annino on May 12, 2008 08:09 AMBTW, here is how to use XML without exaggerating the cost:
<message
Date="2008-02-14T08:55:03Z"
From="The Whole World &lt;us@world.org&gt;"
To="Dawg &lt;dawg158@aol.com&gt;"
>Dear sir, you won the internet. http://is.gd/fh0</message>
This one is not so contrived as the ridiculous example in the post.
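One caveat on the attribute-heavy form: a raw `<` is not allowed inside an XML attribute value, so the angle brackets around the email addresses have to be escaped in the serialized text. A library will normally handle that. The sketch below uses Python's standard `xml.etree.ElementTree` with the values from the example above; it is an illustration, not the commenter's own tooling.

```python
import xml.etree.ElementTree as ET

# Build the element from plain strings; no manual escaping here.
msg = ET.Element("message", {
    "Date": "2008-02-14T08:55:03Z",
    "From": "The Whole World <us@world.org>",
    "To": "Dawg <dawg158@aol.com>",
})
msg.text = "Dear sir, you won the internet. http://is.gd/fh0"

# The serializer escapes the angle brackets inside attribute values.
serialized = ET.tostring(msg, encoding="unicode")
print(serialized)

# Parsing the output recovers the original, unescaped attribute value.
round_trip = ET.fromstring(serialized)
```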
Brianary on May 12, 2008 08:13 AMOne of the good things about XML now is all the libraries & tools that are available to support it.
J on May 12, 2008 08:13 AMGreat. No HTML, but anything that looks like a tag is removed. Brilliant. What an excellent choice when hosting an XML discussion.
Brianary on May 12, 2008 08:16 AMXML is just a poor syntax for s-expressions. I'm disappointed that only one of you mentioned Lisp dialects, though Pádraig Brady got at the right idea when mentioning that Python has a read() function.
S-expressions are much easier to validate than XML (just think of all the possible ways that angle brackets can be broken) and easier to write by hand (a lot of text editors help you with the parens). There are also possibly hundreds of well-tested implementations of s-expression readers. The syntax is laughably trivial (it's either an atom, or a list of atoms). Besides, it's been done for literally fifty years! And you don't need to use a Lisp dialect to use S-expressions -- you could just as easily implement READ in some other language.
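To make the "implement READ in some other language" claim concrete, here is a minimal sketch of an s-expression reader in Python. It is a toy under stated assumptions: atoms and string literals both come back as Python strings, escape sequences are left as written, and unterminated string literals are not diagnosed. The function name `read` and the sample input are hypothetical.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a run of non-space atom chars.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')

def read(text):
    """Parse one s-expression into nested Python lists and strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume the closing ")"
            return items
        if token == ")":
            raise SyntaxError("unexpected ')'")
        if token.startswith('"'):
            return token[1:-1]  # string literal with its quotes stripped
        return token  # atom, kept as a plain string

    result = parse()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return result

memo = read('(memo (from :name "The Whole World") (message "you won"))')
print(memo)
```

A real reader would distinguish symbols, numbers, and strings; the point here is only that the grammar fits in a few dozen lines.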
mfh on May 12, 2008 08:17 AMWhat I was trying to say (using square brackets):
[message
Date="2008-02-14T08:55:03Z"
From="The Whole World <us@world.org>"
To="Dawg <dawg158@aol.com>"
]Dear sir, you won the internet. http://is.gd/fh0[/message]
> Java is this big exception here, and I think Java has a lot to do with XML's popularity. With no good way to have anonymous data structures in Java, embedding data in your application is just not possible, you have to store it externally.
You're probably right here, as PJE explained in his "Python Is Not Java" (http://dirtsimple.org/2004/12/python-is-not-java.html),
> This is a different situation than in Java, because compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain. In Python, XML is something you use for interoperability, not your core functionality, because you simply don't need it for that. In Java, XML can be your savior because it lets you implement domain-specific languages and increase the flexibility of your application "without coding". In Java, avoiding coding is an advantage because coding means recompiling. But in Python, more often than not, code is easier to write than XML. And Python can process code much, much faster than your code can process XML. (Not only that, but you have to write the XML processing code, whereas Python itself is already written for you.)
> If you are a Java programmer, do not trust your instincts regarding whether you should use XML as part of your core application in Python. If you're not implementing an existing XML standard for interoperability reasons, creating some kind of import/export format, or creating some kind of XML editor or processing tool, then Just Don't Do It. At all. Ever. Not even just this once. Don't even think about it. Drop that schema and put your hands in the air, now! If your application or platform will be used by Python developers, they will only thank you for not adding the burden of using XML to their workload.
I'd also really like to know why angle-brackets are so much worse than the newlines and indents required for YAML, and the quotes required for JSON, and how those languages provide data type decoration, schema validation, and declarative transformation.
Brianary on May 12, 2008 08:21 AMMany systems use XML simply for the sake of being buzzword-complete. If you're just transferring small amounts of information, using XML will indeed ruin your signal-to-noise ratio. But sometimes the structure of the document is a part of the message. If you need to exchange complex data structures with someone, you could certainly do worse than to send an xml schema and/or a sample document. The developers will quickly understand the general concept and the validator will make sure that the details are correct. Of course, in a perfect world everyone does unit tests and data validity will never be an issue..
Hirvox on May 12, 2008 08:24 AMThis is exactly like the discussions that arise when someone decides normalizing data is unnecessary, and have invented some new way to add array data types to SQL.
Brianary on May 12, 2008 08:25 AMI have a question for "The Postindustrialist":
You say it's not worth coding unless you can code it by hand. What about images? What about databases? You can't code JPEGs in a text editor. You can't code MySQL rows in a text editor. That's because they're optimized for their domain, which is what I contend most data should be. Joel Spolsky has a great blog post (http://www.joelonsoftware.com/articles/fog0000000319.html) where he goes into that. Search for "Quick question" at that URL to jump to the place I'm talking about. I don't see why we're taking such a performance hit in the name of being able to edit something in our favorite text editors. Who cares? Why is that important? Why is it okay for images and database rows to have their own optimized editors, but not other kinds of data?
Max on May 12, 2008 08:28 AMIf you'd have said that about ASCII 15 years ago I'd have said the same.
If for nothing else, XML gives you an encoding standard - We aren't all American.
If for nothing else, XML gives you *one* parser for all these 'silly' files you keep coming across.
If the overhead is too much for your bandwidth I'm sorry.
It seems to solve more problems than it creates.
Oh, and it's human readable too (even if it does irritate some folks)
By the way, Brianary, you can display HTML in your comments by using &lt; instead of < and &gt; instead of >.
Max on May 12, 2008 08:29 AM@Max: That's HTML. It says "no HTML".
Brianary on May 12, 2008 08:31 AMIt's not that the *idea* of XML is bad. It's how it came to be implemented. Like most creations by developers (e.g. C++, Assembly, Lisp, etc.), it is a human factors disaster.
The presentation interface to humans was given NO more than passing consideration. "Signal-to-noise ratio? Who cares? It's parsable, isn't it?" is the attitude that will continue to sink thousands of software ships on the reefs of confusing unreadability.
It may be parsable by the *machine* but it's *humans* who have to write and understand software. Making any code interface immediately obvious to the human mind is orders of magnitude more important than making it easy for the machine.
ThatGuyInTheBack on May 12, 2008 08:32 AM(No one reads this far down in the thread do they?)
When XML was first introduced I often said that it was a great way to expand your data by a factor of ten. Binary formats are often way more compact, except for text. Binary formats are also way more difficult to work with. All in all I'll take XML over the old binary formats just about every time.
The conversion from binary formats to XML was the big win.
Comparing XML to JSON, YAML, and so on is a useful exercise, but let's keep the larger historical perspective. We are already hugely better off than we were a few decades ago.
Rich on May 12, 2008 08:34 AMThanks for this post. I didn't know about YAML or JSON. Both standards are pretty interesting.
On first look, I would prefer JSON a little over YAML. But I don't think I like the name of JSON, because it's bound to a language. Please note: I don't say I dislike JavaScript or think JSON would be usable only from JS, I just don't think it should be in the name of a standard that is obviously cross-language.
Hinek on May 12, 2008 08:34 AMI wholeheartedly agree with all you said. However, 1 point was not addressed: availability. I often choose inferior *ubiquitous* technology. Support, community, and 3rd party enhancement make life easier and development quicker going with the most common rather than going with the best. This gives me a greater range of choice.
My mp3 player is an iPod not because it's the best but because I wanted a wide range of options for durable cases and you can't find that with any other brand. I use Windows XP because I can find a free, and usually open source, version of a tool to perform any common task. These tools also are not the best of breed but I don't care. They get the job done and I go on with my life with more time and more money than if I demanded the best of everything.
I often choose to use XML for the same reason for many things even though it's a lousy choice (esp with data). However, all of my co-workers can look at my code and know what it is immediately. We can also use an all but infinite number of 3rd party enhancements later to modify the XML.
dinah on May 12, 2008 08:35 AMI've always disagreed with these types of arguments. Sure, there are cases where xml is abused. Packing csv into flat xml structures is usually bad, and I think we've all seen the case where someone serialized a binary by putting one byte per xml element in a huge xml file.
But that isn't the whole story. Your email example does it for me. If I use xml, I can use an xml parser that will always parse valid xml correctly, and I just deal with the data. If instead, I invent a spec for a format like, then I have to write my own parser. That looks incredibly simple, but you also have to remember all of the variations that an email header allows (splitting lines at certain columns, quoting names instead of leaving them unquoted, etc).
With XML, I just say el.InnerText and be done with it.
Similarly, that chess example looks like the xml would be hard, but if it had a schema, my editor would hide virtually all of that from me by giving me dropdowns for the elements that are completely context dependent. You could theoretically do the same thing with YAML, but the editors are not nearly as mature.
So in the end, by criticizing xml, you've really vindicated it. Good job.
Jess Sightler on May 12, 2008 08:40 AMAs long as every major language has libraries to allow reading/writing XML in both node-by-node and XPath modes, I'll take XML over any other format out there. Why do I want to spend my day writing custom parsers or trying to understand yet another file format? I'd rather save my mental energy for understanding the semantic content rather than the presentation. For simple key/value pairs or tabular data or pure text I don't know that I would choose XML, because parsing those formats is dead simple... But in most other cases, why not? Because a few examples can be found where XML was used badly? Hardly persuasive. That's like arguing that delimited flat files are bad because comma delimiters can be problematic when your data includes commas.
Michael on May 12, 2008 08:46 AMXML is the greatest thing to happen to data since databases. I believe that its benefits greatly outweigh its disadvantages. Show me any alternative which can be easily human readable, easily machine readable, whitespace-independent, allow comments, allow escape characters, represent serialized objects, platform-independent, represent web pages (XHTML), allow standard configuration, allow ordering of elements to be ignored if applicable, or not ignored, depending on the context, provide a simple set of basic rules, easily translatable into other formats (XSLT, CSS, etc.), and be accepted throughout the industry, and I'll gladly consider that. Until then, I'm perfectly happy with XML.
Joe Enos on May 12, 2008 08:53 AMI couldn't disagree more. XML, by virtue of ubiquity, is one of the simplest, fastest and safest data formats out there - give a random developer an XML file, and he'll have it parsed and usable in his program faster than with any other format out there.
CSV is OK, but it's too simple, and doesn't deal well with hierarchies and doesn't cope well with exceptions like commas, semicolons, tabs and newlines in the data (depending on the variant of CSV you're employing).
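How badly CSV copes with awkward characters does depend on the variant and on the tooling. As a sketch, Python's standard `csv` module (default "excel" dialect) quotes fields that contain commas, quotes, or newlines, and reads them back intact; the trouble usually starts when one side splits on commas by hand. The sample rows are made up.

```python
import csv
import io

# Hypothetical rows containing a comma, embedded quotes, and a newline.
rows = [
    ["name", "note"],
    ["Dawg", 'said "hi", then left'],
    ["Shawn", "line one\nline two"],
]

# Write with the default dialect; awkward fields get quoted automatically.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the embedded comma, quotes, and newline.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

None of this helps with hierarchies, which is the other half of the complaint above.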
XML is discoverable - that SOAP example you just showed may be verbose, but it's legible with minimal knowledge, and it's easy to find similar examples online due to the wealth of keywords it contains.
By naming elements and attributes, queries over XML become far more legible. Data structures which support nameless lists or maps of data generally will have less legible code.
Sure, xml has flaws, but it's a lot more robust than JSON, and fairly simple - if overly verbose - to read. The ability to express data structures in a cross-program fashion is extremely useful, and xml (and some other formats) has been repurposed to achieve that goal.
There probably doesn't exist a serialization format which is both succinct, robust (in the sense that it's unlikely to mistake one structure for another - i.e. one xml data format for another), and simple - but XML is the best we have.
Take config files, for instance - xml might not be perfect, but each time I see the extreme diversity of syntax a simple linux distro has within /etc, I'm reminded that it's better to have a bad standard than no standard.
Eamon Nerbonn on May 12, 2008 08:55 AMAnyone who argues that "XML was never intended to be human readable" (which doesn't include the author of the article you link to; his thesis is rather different) has not made recent enough reference to the design goals of XML
http://www.w3.org/TR/2006/REC-xml-20060816/#sec-origin-goals
one of which is "XML documents should be human-legible and reasonably clear."
Obviously some implementations fail to meet this goal.
For the record, I accept a lot of your points, but the Churchill quote is still apt - basically we don't ever want to go back to a pre-XML world. People are constantly inventing techniques which are superior to XML for some things; that's good too.
Dominic Cronin on May 12, 2008 08:56 AMI just finished a small project here where we needed to communicate over HTTP between a data source (an internal web application) and a .Net client application.
SOAP seemed the easiest route on the web side: implemented in just a few minutes with Perl's SOAP::Transport::HTTP. Two API calls and a class later it was done.
On the client side? A nightmare. .Net assumed a full SOAP-server environment with a static API and a bunch of rigamarole I was unwilling to put up with. So, cleverly, I just thought I'd just use HttpWebRequest and parse the SOAP response. It's just XML, right? WRONG. Nothing is ever "just XML" in XML.
Namespaces piled onto document context piled onto all kinds of stuff to make using XPath-like queries nearly impossible. I finally wound up with crap like:
XmlNode xn = this.reply.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild;
Completely unnecessary, and could have been done a lot more simply with just delimited ASCII text I'm sure.
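For what it's worth, the usual way around the `FirstChild.NextSibling...` chain is to register the namespace URIs with the query engine and address elements by qualified name (.NET has a comparable mechanism in `XmlNamespaceManager`). The sketch below shows the idea in Python's standard `xml.etree.ElementTree`; the reply document is hypothetical, modeled on the request quoted in the post, and the element names in it are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical SOAP reply; element names and the price are made up.
SOAP_REPLY = """\
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

# Prefixes here are local aliases; only the URIs must match the document.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "Some-URI",
}

root = ET.fromstring(SOAP_REPLY)
# The path is relative to the Envelope element; Price has no namespace.
price = root.find("soap:Body/m:GetLastTradePriceResponse/Price", NS)
print(price.text)
```

It is still more ceremony than delimited text, but the lookup no longer depends on the exact position of each node.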
Lisp solved this problem 50 years ago, with S-Expressions. There is a reason the Lisp developers of the time refused the more "natural" M-Expressions that were planned for a future version of the language.
Alex Queiroz on May 12, 2008 08:59 AMWhat I hate about XML is the suite of "standards" that accompany it: XSD/DTD [for data integrity], XSLT [data transformation language], and XSL [xml stylesheets], XSL-FO, WDSL, and SOAP. It wasn't designed for the KISS-minded folks, but rather 'standards-body' type people who like complexity for the extra job security.
And XML was driver for creating a new industry of "Service Oriented Architecture" and made XML really fat and clunky. I have actually used XML for document authoring and it wasn't pretty since it used XSD/XSL/XSL-FO to generate an XML document to PDF!
XML in its raw form of simply tags and brackets, works well only for information that needs to be represented in a hierarchical, tree-like, nested format. That is XML's expressive power at its work.
But for most of the other information out there, such as passing information along, you can use a more flattened structured format that is more compact, easier to parse by algorithms (XML parsing is SLOW because of its recursive, tree-like structure), and you don't need to depend on third-party tools to do it.
We are about to standardize all of our persisted text format configurations into YAML here.
Andy at Simutronics on May 12, 2008 09:10 AM
Personally I have switched to YAML.
That includes all my config on Linux too.
I use ruby to generate specific formats if need be (including XML).
I would never again in my life go back to writing XML by hand, and I refuse to stick to XML for _humans_.
Die XML, DIE!
They now say that XML was not meant for config files etc.. but it is a huge lie.
XML was meant to bloat the world and annoy users.
Like Captchas.
she on May 12, 2008 09:11 AM
Great post Jeff, I agree 100% wholeheartedly!
Arnor Heidar on May 12, 2008 09:15 AM
(memo
(date :year 2008 :month 02 :day 14)
(from :name "The Whole World" :email "us@world.com")
(to :name "Dawg" :email "dawg158@aol.com")
(message "Dear sir, you won the internet."))
I love the modern developer thought process. If you proclaim s-expression programming languages are confusing and too hard, you're a whiner and need to leave programming to the big dogs. If you proclaim that XML is confusing and too hard, you're an enlightened realist. Yes it is abused, but it works and the structure is relatively unambiguous, especially compared to the thousands of arbitrarily different pseudo-hierarchical INI/text formats on any UNIX machine. It's funny because you can see where they started out with a simple value pair, then suddenly oh crap, we need simple structures, tack it on. Except everyone does it differently. I don't like parsers, I don't like reading your crappy file format that's allegedly better than everyone else's. I hate your tab-delimited hierarchical hack of CSV and I hate your special version of INI. I hate Makefiles, they are ugly and the syntax is stupid. I hate working on parsers written by programmers that think they're too smart to use XML.
Yeah, SOAP is crap, but ANT is not that bad, it's an improvement over Make if you just suck it up and learn something different. XML is verbose and it's slow to parse and DTDs are stupid. But for every SOAP monstrosity there's a logical, clean Hibernate HBM file.
This is not directed to Jeff, but to some of the commenters. But seriously, the XML in that YAML comparison was practically transparent to me, I don't see what the big deal is. And I work with XML file formats that would make your hair fall out if that bothers you. And the email example, it may look good onscreen but I seriously do not want to ever have to maintain the black magic in Sendmail or try to write a functionally correct mailfile parser. The devil is in the details, which is exactly what is wrong with "simple" file formats.
Erik on May 12, 2008 09:21 AM
> You say it's not worth coding unless you can code it by hand.
I don't think I've seen anybody say this. The problem is that XML was started as being "handcoded", and even today "XML-aware" tools mostly aren't much better than handcoding.
> What about images?
Depends on the images, SVG are easily hand-coded. But most images are fairly opaque binary formats with quality data-specific editors and deal with concepts that aren't and never have been hand-coded, for most.
> You can't code MySQL rows in a text editor.
Er... yes you can, it's called SQL. Or, when dealing with import/export, "excel", "CSV" or even "JSON" (the Django web framework has a data import/export to/from databases and its default serialization format is json)
> That's because they're optimized for their domain, which is what I contend most data should be.
And which XML is (document markup), except the vast majority of the uses of XML are far outside the domain it was initially created for.
> If for nothing else, XML gives you an encoding standard - We aren't all American.
Yes, and that's indeed important. I agree.
> If for nothing else, XML gives you *one* parser for all these 'silly' files you keep coming across.
Actually, no it doesn't. It merely gives you half the parser (the structure), but it doesn't give you the syntactic parsing. In fact, JSON and YAML parsers give you much more on the parsing side as they also give you syntax (e.g. JSON has numbers, strings, arrays and maps/dicts/hashes and they're usually translated as such in your language of choice by the parsers)
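To illustrate the "half the parser" point, here is a small sketch using Python's standard library (the quote data is invented for the example). The JSON parser hands back typed values, while the XML parser hands back text that the application still has to interpret:

```python
import json
import xml.etree.ElementTree as ET

# Invented sample data, the same record in both formats.
JSON_DOC = '{"symbol": "DIS", "price": 34.5, "tags": ["media", "nyse"]}'
XML_DOC = "<quote><symbol>DIS</symbol><price>34.5</price></quote>"

from_json = json.loads(JSON_DOC)
from_xml = ET.fromstring(XML_DOC)

json_price = from_json["price"]         # already a float
json_tags = from_json["tags"]           # already a list
xml_price = from_xml.findtext("price")  # still a string; converting it is up to you
```

With XML, knowing that `price` is a number lives in a schema or in the calling code, not in the parse result.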
> Like most creations by developers (e.g. C++, Assembly, Lisp, etc.), it is a human factors disaster.
Would you be kind enough as to explain what part of Lisp was a disaster? And why?
> Why do I want to spend my day writing custom parsers
Fail, both alternatives suggested here have parsers for pretty much every popular and less popular language out there.
Masklinn on May 12, 2008 09:30 AM
The thing about XML is that every language has a solid and well tested parser for it. Sure, it's not hard to parse other formats, but XML is nowadays built in anything, so it's a bit like a "reverse chicken-egg problem". That also means that for interop, you only need 1 parser in your application. If I would need one YAML Parser, one XML Parser, one .ini Parser in my Application, then I am creating redundancy in a certain way.
With current Disk Sizes, the extra space for storage should not be an issue (and if it is, there is surely a good way to store the xml file compressed), and for the Web, i'd rather see everyone turn on gz compression on their web servers first, before they switch to another format.
In short: XML is not the best tool, but it is one tool that works really well in most situations.
Michael on May 12, 2008 09:33 AM
If you are writing your own XML parser, then you are doing something very wrong.
XML was built with totally different design goals than JSON/YAML and unfortunately people have brutally misused XML as anyone that has used JavaEE pre Java 5 knows.
Gudmundur on May 12, 2008 09:35 AM
XML may give you structure to a file, but everybody chooses a different implementation. I think that is Jeff's issue with it.
is it a:
<car>Mazda</car>
<vehicle>Mazda</vehicle>
Same DATA, different MARKUP.
In HTML, a <P> is a paragraph. The markup has meaning. In XML, the markup relates to the data it contains, anyone can define the markup as they see fit!
You could have 2 files with the same data in them, but represented differently in markup.
Some benefits:
With XSD (Schemas), you can validate it pretty easily, I think that is a benefit. There are tools (XSD.EXE) to create .net class files from the schema although some of the classes it generates are not very readable, but it does give you the magic xml.serialize and xml.deserialize with one line of code. Also, transformations from XML to HTML with XSLT, good for the web.
I don't think there is anything wrong with flat files or CSV files or even INI files. If you have a system that transfers fixed-length flat files between applications, I don't see that much benefit in moving those files to XML formatted files.
Sure, there are parsers to parse the XML, but you still have to write code that uses the parser to get the DATA out of the XML.
It does lead to some of this nonsense code, as posted by Clinton:
XmlNode xn = this.reply.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild;
If anything, I think XML is overused. Which is better INI file or XML configuration file?
Both can be parsed
Both are human readable
Parse time should be about the same if small size
Seem like the same to me.
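A rough side-by-side of the comparison above, sketched in Python with the standard library. The `database` section and its keys are invented for the example; the point is only that reading one setting takes about the same effort either way:

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical settings, written once as INI and once as XML.
INI_TEXT = """\
[database]
host = localhost
port = 5432
"""

XML_TEXT = """\
<config>
  <database host="localhost" port="5432"/>
</config>
"""


def port_from_ini(text: str) -> int:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint("database", "port")


def port_from_xml(text: str) -> int:
    root = ET.fromstring(text)
    return int(root.find("database").get("port"))
```

Both functions return the same integer; the difference only starts to matter once the settings become nested.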
Jon Raynor on May 12, 2008 09:35 AM
I think that it is indeed true that XML was never meant to be read. It was meant to be easy to parse. And with parsing, more verbosity is ALWAYS better, because it allows for explicit clarification. Besides, anything that is not aesthetically pleasing doesn't have to be outputted by the parser.
anon on May 12, 2008 09:35 AM
> Yeah, SOAP is crap, but ANT is not that bad
That's probably why even Ant's creator said that, in hindsight, using XML was not such a good idea (http://weblogs.java.net/blog/duncan/archive/2003/06/ant_dotnext.html)
> it's an improvement over Make
Pretty much everything is an improvement over make (and, really, Make isn't that bad until you have to go and use auto*).
> But for every SOAP monstrosity there's a logical, clean Hibernate HBM file.
Which have been advantageously replaced by annotations that are cleaner, clearer and simpler to read. And not separated from what they're talking about.
Masklinn on May 12, 2008 09:37 AM
Markup was removed on previous comment
markup : car data: Mazda \car
markup : vehicle data : Mazda \vehicle
But, hopefully you got the example.
YAML might be less verbose, however the parsers are picky enough that you can't write YAML by hand. The future is configuration as code.
Leon Brocard on May 12, 2008 09:40 AM
@Masklinn got it right. XML is the reductio ad absurdum proof that the Wisdom of the Masses is false. There is no There, There. It's just text. Parsers are only a *requirement* to extract the data, so that your Custom Built program can do its thing. As Fabian Pascal said:
The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged — semantics — and once they do that, there is no need to repeat the tags in each and every record/document being transmitted. Any agreed-upon delimited format will do, and the criterion here is efficiency, on which XML fares rather poorly...
BuggyFunBunny on May 12, 2008 09:41 AM
I have to agree - I'm no fan of XML, and never have been. I can appreciate that it's a clever way to express any kind of information, and separate it from any kind of presentation, but it's just hard to use.
Especially when you add the complexity of XML namespaces. Writing a concise XSD file that people will find useful is a nightmare, and that's if you have tools to help. What I want is a simpler version of SQL Server express, that doesn't need an install particularly. Text files might not be it at all. In fact, look at the way Outlook stores local data in pst files.
But hey, I have to say that a simple data design, when applied unsullied, is my preferred method of storing configuration. Not the default Settings providers, you understand, but plain old serialisation.
Neil Barnwell on May 12, 2008 09:42 AM
From my experience with config/data files, the majority of the time processing them is figuring out how to parse them. Escape sequences are a pain, picking delimiters and agreeing on a format are time consuming operations. XML is a real win here because it gets rid of these problems and allows you to quickly get config/data files running.
The other big win is it's generally well understood at this point. Even for people who don't deeply understand XML it's pretty easy to understand at a glance how the data is structured. This may have more to do with the amount of people who have done at least minimal HTML rather than intuitiveness.
I completely agree this can be overkill at times but IMHO it's an acceptable default solution.
Jared Parsons on May 12, 2008 09:50 AM
I think the heavy use of XML highlights a problem with programmers today. Namely, they don't understand or cannot apply basic principles from compiler design theory.
If people understood how easy it is to create a good parser, they would be more willing to make up and use other document formats. With XML, you can find a pre-built parser in just about every computer language. However, if you make up a file format that makes sense for your situation, then there is no pre-built parser for that. As in the memo file example from Jeff's article, it would be difficult to find a pre-built parser for that format.
For people who don't understand compiler design theory, creating a parser is a monumental, error-prone task involving lots of deeply nested if statements. It is much easier to just use XML. For people who understand compiler design theory, creating a parser is as simple as breaking up the file into tokens, and then writing a grammar to read the tokens.
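As a sketch of the "tokens, then a grammar" approach, here is a tiny parser for the s-expression memo posted earlier in the comments, written in Python. It is an illustration under simplifying assumptions (no escape sequences inside strings, everything returned as strings or nested lists), not a complete Lisp reader:

```python
import re

# One token is a paren, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(text):
    return TOKEN.findall(text)


def parse(tokens):
    """Recursive descent for the grammar: expr := atom | '(' expr* ')'."""
    pos = 0

    def expr():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(expr())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume the closing paren
            return items
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        if tok.startswith('"'):
            return tok[1:-1]  # strip the surrounding quotes
        return tok

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return tree


MEMO = '''
(memo
  (date :year 2008 :month 02 :day 14)
  (to :name "Dawg" :email "dawg158@aol.com")
  (message "Dear sir, you won the internet."))
'''

tree = parse(tokenize(MEMO))
```

The result is a nested list whose first element is `"memo"`, so `tree[1]` is the date entry and `tree[3]` is the message. Real formats need more care around escaping and error reporting, which is where the earlier "devil is in the details" objection comes in.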
I agree with Jeff, but I can understand why everyone is using XML.
Jon Snyder on May 12, 2008 09:54 AM
It always amazes me that the same people who love XML are the ones that deride Lisp for all its parens.
Peter Eddy on May 12, 2008 10:03 AM
Seems like a non-topic to me, there is no one "be all and end all format " that neatly meets the needs of all constraints that it may ever be adopted within, so this entire topic seems moot to me. Actually I'm not even interested in discussing why XML is so widely adopted like it is today, that's another pointless discussion.
What really interests me is why you feel that XML is used sometimes inappropriately. XML is always chosen for a reason. Even if it is simply an economic issue that is a valid reason. Even if it is simply because "programmer x only knows XML" (which I really doubt is ever a real world case) that is a valid reason to use XML. Although real-world is never this simple, there are always several reasons for any decision both explicitly and implicitly chosen.
Always design with all important constraints considered. If your design requires that people need to quickly and easily read the contents of remotely transmitted messages, then investigate commonly remotely transmitted messages. Don't re-invent the wheel. Find the solution with the fewest deficiencies possible from your constraints. A good architect doesn't just review one small aspect of a solution without looking at the solution's design from a macro-level and also without looking at the solution's compliance to its requirements. Sometimes perhaps XML implementations are simple idioms though and maybe in those cases things can be micro-reviewed.
Shan on May 12, 2008 10:09 AM
I suppose if you don't like XML you could spend a decade inventing your own syntax, writing libraries for it in every language and inventing a whole host of auxiliary tools and specs to work with your proprietary syntax. This would be an enormous, extraordinary waste of time but it might make you feel warm and fuzzy on the inside.
ocean on May 12, 2008 10:11 AM
An XML configuration file just seems like taking a simple INI file and making it as confusing as possible.
I just completed a small project that uses INI files. The application requires that the user manipulates the configuration files and INIs seemed like the easiest way to make that happen.
I used Todd Davis's code from CodeProject: http://www.codeproject.com/KB/files/VbNetClassIniFile.aspx
There is also an Open Source INI library on SourceForge called Nini: http://nini.sourceforge.net/
Stewart Schatz on May 12, 2008 10:21 AM
"I'll call these 'data' scenarios, as opposed to the 'document' scenarios for which XML was originally intended."
That quote still has me scratching my head. I have been under the impression for the past 10 years that the entire purpose of XML was to define the structure of data used and generated by disparate systems so that those disparate systems can work together. If that's not the primary intention of XML, then what, exactly, is?
...or did I misunderstand the quote?
cranley on May 12, 2008 10:29 AM
XML is celebrating its 10th birthday and you decide to spoil the party. :)
XML has never been a silver bullet - it has some negatives like its many alternatives. The issue is with developers who adopt it inappropriately.
I happen to love XML (and XSLT), but feel I know where and when to use it.
I like pointy brackets too, but I also like curry for breakfast.
Shane Porter on May 12, 2008 10:38 AM
XML has its place but I still use .ini files for storing settings. They are smaller than XML files and parsing them is easy. A lot of people see me writing apps which use ini files these days and are shocked I am not using XML. Sadly when I explain they cannot get around their mindset that XML files are the answer to all problems.
Also I am glad you mentioned YAML as it is awesome :)
Morgan on May 12, 2008 10:39 AM
I completely agree with Jeff.
I'm not totally against the use of XML. But as Jeff pointed out, its use can be avoided in certain places like configuration files, quick databases etc. XML is very taxing when it needs to be used with the mobile devices where the data transfer is painfully slow and pricey.
For example, when implementing XML centric protocols like XMPP on mobiles, it takes more than 800-900 bytes to send a small text message across. Same is the case for web services. Average amount of data transferred while executing a remote method is anything between 2 to 3 kilobytes. That's really insane as the 70-80% of the payload is noise.
XML not only demands more bandwidth, but it's CPU (and in turn battery) demanding when parsing on low power devices.
Nikhil Belsare on May 12, 2008 10:40 AM
What is with all this XML bashing? Maybe you guys are all from the old generation of programmers, who are used to using unstructured documents. XML has many advantages:
-data validation
-arbitrary data structures
-easy to manipulate
Good. Freaking. Lord.
An idiot with the best hammer available will still build a crappy house.
Get past the tools already.
Tom on May 12, 2008 10:47 AM
I'd still rather store things in XML than the Windows Registry.
Maybe we can get back some day to using INI for storing program configurations.
I mostly just use XML for RSS feeds... might not be the intended usage, but it works pretty well. I always found parsing the things a pain though.
Kris on May 12, 2008 10:57 AM
XML is a compromise between readability of information and efficient storage of data, and, like all compromises, can't be perfect on every aspect.
As you say, it's also the foundation for the industry-standard SOAP.
And, for readability, there are always alternative ways of visualizing XML, like viewers, etc.
I have no great love for XML. And there's definitely some horrible misuses of it, SOAP among them. But outside of the world of Unix it's largely replacing binary stores, and for that I'm happy.
I don't think XML is the best default choice, but I'd much rather have developers start with a text format and only move to a binary if really necessary. And if XML is the only text option they'll consider, bring on the XML. (Which text format is a bigger question: CSV is very useful when you're dealing with lots of similar-formed tabular data, YAML is great for a simple things, and binary is still the fastest and smallest. I haven't used JSON yet, so omitting that out of ignorance rather than merit.)
Tangent: The "Less efficient!" argument against all human-readable formats annoys me. Does it really matter in your application? If it does, go ahead and pick something else. But stop using that as the central point of any argument, as if it's always worth trading ease-of-use for efficiency.
Steven Fisher on May 12, 2008 11:24 AM
Ack. I edited my comment above before posting and it looks like I was saying binary is a text format alternative to binary. I just meant to say that text formats are great, pick the right one, and binary formats still have their place.
Steven Fisher on May 12, 2008 11:27 AM
Yeah, I agree. Next time I'm just going to save my data in compressed binary or deep into a database or force it into flat property files like almost everyone did before XML. This XML stuff sucks.
I don't understand why anyone would actually deal with raw XML, it's not really a human-readable format...
Not that I'd wait for a generic XML editor that does everything for everyone--but if you are a programmer supplying a system and you ever tell users to bring up their editor and edit the XML file, you're doing it wrong.
Bill on May 12, 2008 11:27 AM
@Nikhil Belsare: That's a great rationale for only using gopher on handhelds!
Brianary on May 12, 2008 11:29 AM
Can't say I disagree with this post more.
The reason XML is so great for everything is that it is ubiquitous. There’s something to be said for not reinventing the wheel with every application. There’s something to be said for using built-in .NET or Java libraries (Dude, have you seen what cake it now is with Linq2Xml?). Sure it’s inefficient; sure it’s verbose and ugly, but I know how to do it with a small amount of code and with few errors. Why would I now learn the quirks of YAML or JSON? I’ll let you spend the time learning and struggling with the optimal platform to create websites with; I’ll spend the time making a cool website that is slightly inefficient.
I actually use JSON about 95% of the time and XML only when I absolutely must, usually because of backwards compatibility.
I certainly use it if I'm communicating between client and server in an AJAX situation. Not only is it less data to transmit, but it takes less energy to parse because it can be interpreted as Javascript directly, modulo security concerns.
That said, it doesn't really matter that much in the end. It's about what you can build not how you build it.
Jesse Farmer on May 12, 2008 11:34 AM
I'm sooo sick of these "we hate XML cos that's wot all the clever dicks on slashdot say" rants. And I've got news for all you Ruby / YAML folk too:
1) YAML sucks. It's really, really poor. The only reason it exists is to attempt to replace XML, which kind of makes it a non-thing in its own right.
2) Ruby is as slow as a dog, and ain't getting faster any time soon. ROR has reached both its bullshit and performance threshold.
You people are so blind to what XML has achieved, and Derek Denny-Brown displays this perfectly...
"...In fact, I think it is safe to say that there is more usage of XML for 'data' scenarios than for 'document' scenarios, today..."
Erm... have you ever heard of the INTERNET which uses this stuff called HTML which is, well, to all intents and purposes.... XML?!
If you don't like the markup/data ratio in your XML documents USE SMALLER ELEMENT NAMES!!! How hard does that sound?!
All this bleating about padding out a few text files is completely stupid - modern hard drives are enormous compared to the size of these files! What makes this matter even more amusing, is that it's the Ruby crowd trotting out this argument more often than not. Ruby is so incredibly slow that I can't help but deride any Ruby fanatic moaning about "bloat" in XML.
And finally, the biggest, dumbest buzz-word 2.0 compliant whine of all - YOU SHOULD ALL USE JSON COS IT'S GREAT! What?! You want me to build a distributed application that communicates over a network via strings of code which get evaled on the client?! Does this sound like a security disaster waiting to happen, or what?!
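On the eval worry: JSON does not have to be evaluated as code. A dedicated JSON parser accepts only data literals and rejects anything else. A minimal sketch in Python (the thread is about JavaScript, where the same idea applies; the payloads below are invented for the example):

```python
import json


def parse_untrusted(payload: str):
    """Parse with a real JSON parser instead of evaluating the string as code."""
    return json.loads(payload)


good = parse_untrusted('{"symbol": "DIS", "price": 34.5}')

# Something that would run as code under eval() is simply a parse error here.
try:
    parse_untrusted('__import__("os").getcwd()')
    rejected = False
except json.JSONDecodeError:
    rejected = True
```

So the security problem comes from using eval as a shortcut, not from the format itself.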
Robin on May 12, 2008 11:36 AM
One interesting thing about XML happened on the Java\Java EE side. Back in 2000-2005, almost every framework and spec started using XML for configuration. The main reasons were: 1 - it's human readable; 2 - it's tooling friendly; 3 - it's portable.
Unfortunately, the (ab)use of XML made Java\Java EE development complex.
Fortunately, two things happened that saved the day:
1 - Rails came in with its convention over configuration, dynamic language and other stuff, and it shook the very foundation of Java\Java EE development.
2 - Annotations were added to the language.
Nowadays, almost every new framework and spec relies on Annotations + Convention over Configuration.
Rubem Azenha on May 12, 2008 11:48 AM
I have been saying this for a long time. I always thought why couldn't we just use the java properties file or something less verbose and error prone.
Bobby on May 12, 2008 11:50 AM
Lua programmers laugh at this entire discussion.
poppafuze on May 12, 2008 11:56 AM
I agree with the pro-Lisp commenters. There are some really good Common Lisp packages for generating XML. xml-emitter for example. Needless to say (to anyone who knows Lisp at least) it is a million times clearer and easier to revise data in Lisp's syntax than it is in XML.
Robert Levy on May 12, 2008 12:00 PM
.NET programmers laugh at you Lua programmers, and wish you the best of luck in your job search.
Anonymous on May 12, 2008 12:01 PM
All true, but then John's comment (expressing utter indifference) is correct too. The badness of XML is more of an insult than an actual problem, or, to put it another way, if XML is your biggest problem then you're doing well.
My theory is that XML's insulting badness is the reason for its success; since it's practically impossible to write a correct XML parser from scratch, everybody uses existing libraries. Result: actual working standardization.
Andrew McG on May 12, 2008 12:04 PM
Max said:
“High-traffic sites serve tens of thousands of RSS feeds, formatted in XML, every day… shouldn't we be using a data format that's as thin as possible? Shouldn't the common symbols in a data file be encoded and compressed within the file itself?”
I say… shouldn’t the compression issue be tackled at a lower level? Can’t this problem be resolved by e.g. gzip-encoding all XML content before it is served?
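A quick way to get a feel for how much the repeated tags cost after compression is to gzip a sample feed. The sketch below uses Python's gzip module on an invented feed of 200 near-identical items; real feeds have more varied text, so the ratio here is an optimistic illustration, not a benchmark:

```python
import gzip

# Hypothetical feed: 200 near-identical <item> elements (made-up URLs).
items = "".join(
    f"<item><title>Post {i}</title><link>http://example.com/{i}</link></item>"
    for i in range(200)
)
feed = f"<rss><channel>{items}</channel></rss>".encode("utf-8")

compressed = gzip.compress(feed)
ratio = len(compressed) / len(feed)   # fraction of the original size
restored = gzip.decompress(compressed)
```

Because the tag names repeat on every item, they are exactly the kind of redundancy a general-purpose compressor removes.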
dwardu on May 12, 2008 12:05 PM
I am in total agreement with Robin and those others defending XML. This is the most ridiculous argument that has ever come about in the IT industry – and it’s just painfully tiring – and it really separates those who have had their eyes open during the evolution of the technology and those that, well, slept through it or simply weren’t there. XML is as simple as you want it to be and as complex as you may need it to be – don’t confuse or punish XML because of how people have applied it, e.g. SOAP. That would be akin to reading Mein Kampf (in German) and concluding that German is a horrible and evil language.
I’m not going to reiterate all of the very strong points made pertaining to the benefits of XML above - XSD, XSL(T), DOM, Namespacing etc. I do have a couple additional points to make however.
If, as an example, JSON or YAML or whatever, were to take hold as the industry standard today, in 10 years, I can guarantee you that we’ll see the same silly arguments against it because it will have matured to a full blown multipurpose markup language for data packaging with all the potential complexities found in XML – why? - Because it needs them for the times they are needed! And do we really want to start over again and go through the evolutionary pains with another general-purpose specification strategy – I don’t unless you can show me some *very* strong arguments.
And for those who say XML is not human-readable - hmmm... that truly scares me and I’m not sure I’d want those people writing code on any of my projects. There are things in our industry far more complicated than a few angle-brackets. Latin is not human-readable if you don’t know Latin – so learn it or get out of Rome.
I want a single standardized syntax to represent data – period. And I want to use that same syntax to describe it and that same syntax to transform it.
Audaxis
"code comments"
Brrr, I really can't understand why Visual Studio is using these horrible xml-comments.
Sure, they are automatically generated by typing /// but after that they waste 2 lines of code for some <summary>-crap.
Javadoc's @-thingys are a lot better even if I can't remember the exact syntax atm.
crazy ivan on May 12, 2008 12:18 PM
The real complaint here is that your favorite platform doesn't ship with a lightweight XML interface for humans. Stop using notepad to view large XML files. You wouldn't use a hex editor to read your plaintext, would you?
calcnerd256 on May 12, 2008 12:23 PM
Thank you Jeff!
I work in the financial services industry. Recently, I had to deal with one of *THE* biggest vendors in the industry, providing mutual fund data. They FTP us fund holdings and other information in the form of XML files - each file is about 2 GB - that's right, 2 GB of XML per file, per day! Drove me crazy trying to understand how supposedly professional programmers from this firm could come up with such a monstrosity.
I hope somebody from that firm is reading this blog.
But then again, if they were into reading such articles, they wouldn't have designed such a system in the first place, would they?
Jeff,
I'm the managing editor of xml.com, so it's perhaps not surprising that I may have a different viewpoint on XML. However, I think there are a few things to keep in mind with your post.
1) XML started in the document arena, and is in general at its best in the document arena. Given that there are a large number of semi-structured documents out there that can be rendered via XSLT into dozens of potential output formats when expressed in XML, can be searched via XQuery and can otherwise be manipulated (and validated), it's worth understanding that even many XML gurus have repeatedly raised questions about whether XML should be so heavily used in the data domain.
2) SOAP was created initially as a way of getting around the restriction of passing binary content into and out of port 80, and its structure is considered ugly and ungainly even by many XML advocates. SOAP's also not used in an XML pipeline - it's there primarily as a temporary format for serialization from and marshalling to binary objects for invocation of RPCs, and the fact that XML was used in its expression was unfortunate, at best.
Note that a similar argument can be made for RDF triples, which can be mind-numbing when expressed in XML format. Significantly, RDF can be thought of as hypernormalization of data and why XML was used to encode it is something that has mystified a lot of people - significantly, most contemporary RDF applications prefer to use Turtle notation.
3) XML as user interface - this is a little more complex, and can be argued on both sides. XML is actually pretty good at modeling UI - there's a very nice separation of concerns that when done properly works well in separating structure from presentation. The problem that I see there may actually have more to do with the presence of JavaScript (or other scripting languages) that attempt to work within a given UI DOM to make actions work (thus building rather hellacious interdependencies) rather than being defined to provide functional behaviors to XML-declared components (such as via XBL bindings).
This is more a question of user education than lack of suitability of the tools. I saw this same mindset at work with Visual Basic back in the 1990s, where, because building and designing components was harder, people would create very fragile applications because those were the easiest to write.
I'll have more to say on this point in my own blog post shortly.
Kurt Cagle on May 12, 2008 12:35 PM
@dwardu:
"I say… shouldn’t the compression issue be tackled at a lower level? Can’t this problem be resolved by e.g. gzip-encoding all XML content before it is served?"
To a certain extent, it can be. Apache's mod_deflate (http://httpd.apache.org/docs/2.0/mod/mod_deflate.html) will compress content before sending it to the client, but that's as good as you can get. XML feeds are going to get served to RSS clients via HTTP, so you can't use a compression technique that the client isn't prepared to accept. That means it's mod_deflate or nothing (AFAIK), and I still feel like it's a band-aid solution.
Max on May 12, 2008 12:41 PM
I have to deal with data files, that are basically just flat data (think of a simple "select * from table"). It bothers me every time a customer sends us an XML file... CSV is perfect for that thing.
N on May 12, 2008 01:06 PM
> Erm... have you ever heard of the INTERNET which uses this stuff called HTML which is, well, to all intents and purposes.... XML?!
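For flat, table-shaped data like this, the header row already carries everything the tags would. A small sketch with Python's csv module (the columns and values are invented for the example):

```python
import csv
import io

# Hypothetical export of a flat table: one header row, one line per record.
CSV_TEXT = "id,name,price\n1,DIS,34.5\n2,IBM,120.0\n"

# DictReader uses the header row as field names, so each row becomes a dict.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
names = [row["name"] for row in rows]
```

Note that every value comes back as a string, so CSV shares XML's "you still have to interpret the types" problem; it just carries much less markup per row.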
Fuck no it's not. Had the web been xml, with all it entails, it would never have taken off.
Oh, and some people tried to XMLify the web, with XHTML1.0, XHTML1.1 and a tentative XHTML2 spec.
Last time I checked, they failed epically and the bleeding edge moved to an actually feasible revision of HTML instead.
Masklinn on May 12, 2008 01:25 P