Everywhere I look, programmers and programming tools seem to have standardized on XML. Configuration files, build scripts, local data storage, code comments, project files, you name it -- if it's stored in a text file and needs to be retrieved and parsed, it's probably XML. I realize that we have to use something to represent reasonably human readable data stored in a text file, but XML sometimes feels an awful lot like using an enormous sledgehammer to drive common household nails.
I'm deeply ambivalent about XML. I'm reminded of this Winston Churchill quote:
It has been said that democracy is the worst form of government except all the others that have been tried.
XML is like democracy. Sometimes it even works. On the other hand, it also means we end up with stuff like this:
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
How much actual information is communicated here? Precious little, and it's buried in an astounding amount of noise. I don't mean to pick on SOAP. This blanket criticism applies to XML, in whatever form it appears. I spend a disproportionate amount of my time wading through an endless sea of angle brackets and verbose tags desperately searching for the vaguest hint of actual information. It feels wrong.
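To put a rough number on the noise in the SOAP request above, here is a minimal Python sketch that parses it and compares the character data with the size of the whole message. Counting only character data is my own crude measure (the operation name in the tag arguably carries information too), and the variable names are illustrative, not part of any SOAP toolkit.

```python
import xml.etree.ElementTree as ET

# The SOAP request from the example above, reproduced verbatim.
soap = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<m:GetLastTradePrice xmlns:m="Some-URI">
<symbol>DIS</symbol>
</m:GetLastTradePrice>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

root = ET.fromstring(soap)

# Character data only: tags, attributes and namespace declarations are excluded.
payload = "".join(text.strip() for text in root.itertext())

# Share of the message, in characters, that is payload rather than markup.
payload_ratio = len(payload) / len(soap)

print(f"payload: {payload!r}")
print(f"{len(payload)} of {len(soap)} characters ({payload_ratio:.1%})")
```

By this measure the ticker symbol is the only payload, and it is a very small fraction of the bytes on the wire.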
You could argue, like Derek Denny-Brown, that XML has been misappropriated and misapplied.
I find it so interesting that XML has become so popular for such things as SOAP. XML was not designed with the SOAP scenarios in mind. Other examples of popular scenarios which deviate XML's original goals are configuration files, quick-n-dirty databases, and [RSS]. I'll call these 'data' scenarios, as opposed to the 'document' scenarios for which XML was originally intended. In fact, I think it is safe to say that there is more usage of XML for 'data' scenarios than for 'document' scenarios, today.
Given its prevalence, you might decide that XML is technologically terrible, but you have to use it anyway. It sure feels like, for any given representation of data in XML, there was a better, simpler choice out there somewhere. But it wasn't pursued, because, well, XML can represent anything. Right?
Consider the following XML fragment:
<memo date="2008-02-14">
  <from>
    <name>The Whole World</name><email>us@world.org</email>
  </from>
  <to>
    <name>Dawg</name><email>dawg158@aol.com</email>
  </to>
  <message>
    Dear sir, you won the internet. http://is.gd/fh0
  </message>
</memo>
Because XML purports to represent everything, it ends up representing nothing particularly well.
Wouldn't this information be easier to read and understand -- and only nominally harder to parse -- when expressed in its native format?
Date: Thu, 14 Feb 2008 16:55:03 +0800 (PST)
From: The Whole World <us@world.org>
To: Dawg <dawg158@aol.com>

Dear sir, you won the internet. http://is.gd/fh0
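As a rough illustration of "only nominally harder to parse", the memo in its native form can be read with Python's standard email package. This is a sketch, not a claim about how any particular mail system works: it assumes the headers sit one per line and are separated from the body by a blank line, and the variable names are mine.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# The memo in its native mail format, with the header values copied from the example above.
raw_memo = """\
Date: Thu, 14 Feb 2008 16:55:03 +0800 (PST)
From: The Whole World <us@world.org>
To: Dawg <dawg158@aol.com>

Dear sir, you won the internet. http://is.gd/fh0
"""

memo = message_from_string(raw_memo)

# parseaddr splits "Display Name <address>" into a (name, address) pair.
sender_name, sender_email = parseaddr(memo["From"])
recipient_name, recipient_email = parseaddr(memo["To"])

# parsedate_to_datetime returns a timezone-aware datetime for the Date header.
sent_at = parsedate_to_datetime(memo["Date"])

body = memo.get_payload().strip()

print(sender_name, sender_email)
print(recipient_name, recipient_email)
print(sent_at.date())
print(body)
```

The parsing code is about as long as the equivalent XML handling would be, while the file itself stays readable without any tooling.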
You might argue that XML was never intended to be human readable, that XML should be automagically generated via friendly tools behind the scenes, never exposed to a single living human eye. It's a spectacularly grand vision. I hope one day our great-grandchildren can live in a world like that. Until that glorious day arrives, I'd sure enjoy reading text files that don't make me suffer through the XML angle bracket tax.
So what, then, are the alternatives to XML? One popular choice is YAML. I could explain it, but it's easier to show you. Which, I think, is entirely the point.
<club>
  <players>
    <player id="kramnik"
            name="Vladimir Kramnik"
            rating="2700"
            status="GM" />
    <player id="fritz"
            name="Deep Fritz"
            rating="2700"
            status="Computer" />
    <player id="mertz"
            name="David Mertz"
            rating="1400"
            status="Amateur" />
  </players>
  <matches>
    <match>
      <Date>2002-10-04</Date>
      <White refid="fritz" />
      <Black refid="kramnik" />
      <Result>Draw</Result>
    </match>
    <match>
      <Date>2002-10-06</Date>
      <White refid="kramnik" />
      <Black refid="fritz" />
      <Result>White</Result>
    </match>
  </matches>
</club>
And the same data in YAML:
players:
  Vladimir Kramnik: &kramnik
    rating: 2700
    status: GM
  Deep Fritz: &fritz
    rating: 2700
    status: Computer
  David Mertz: &mertz
    rating: 1400
    status: Amateur
matches:
  -
    Date: 2002-10-04
    White: *fritz
    Black: *kramnik
    Result: Draw
  -
    Date: 2002-10-06
    White: *kramnik
    Black: *fritz
    Result: White
There's also JSON notation, which some call the new, fat-free alternative to XML, though this is still hotly debated.
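For comparison, here is a sketch of the same chess club data as JSON, read with Python's standard json module. The layout is my own assumption rather than anything canonical: JSON has no anchors or aliases, so the matches refer to players by the same ids the XML version uses.

```python
import json

# Hypothetical JSON rendering of the chess club example; the key names are illustrative.
club_json = """
{
  "players": {
    "kramnik": {"name": "Vladimir Kramnik", "rating": 2700, "status": "GM"},
    "fritz":   {"name": "Deep Fritz",       "rating": 2700, "status": "Computer"},
    "mertz":   {"name": "David Mertz",      "rating": 1400, "status": "Amateur"}
  },
  "matches": [
    {"Date": "2002-10-04", "White": "fritz",   "Black": "kramnik", "Result": "Draw"},
    {"Date": "2002-10-06", "White": "kramnik", "Black": "fritz",   "Result": "White"}
  ]
}
"""

club = json.loads(club_json)

# Resolve the player ids by hand, since JSON itself has no reference mechanism.
summaries = []
for match in club["matches"]:
    white = club["players"][match["White"]]["name"]
    black = club["players"][match["Black"]]["name"]
    summaries.append(f"{match['Date']}: {white} vs {black} -> {match['Result']}")

for line in summaries:
    print(line)
```

Numbers come back as numbers and lists as lists, with no schema or tree walking required.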
You could do worse than XML. It's a reasonable choice, and if you're going to use XML, then at least learn to use it correctly. But consider:
I don't necessarily think XML sucks, but the mindless, blanket application of XML as a dessert topping and a floor wax certainly does. Like all tools, it's a question of how you use it. Please think twice before subjecting yourself, your fellow programmers, and your users to the XML angle bracket tax. <CleverEndQuote>Again.</CleverEndQuote>
Of course, YAML also supports graphs (in contrast to XML, which can only encode trees).
Tom on May 12, 2008 5:47 AM
I don't think I could agree with you more on this matter. XML has been abused in an almost-criminal manner.
XML has, to be honest, bugged me from day one. It just isn't at all readable and, despite claims of storage-invisibility, there are just those times when you have to look at the data - XML makes this a truly painful task.
I frequently find myself looking at XML files containing vast amounts of data - whether I'm programming something to parse them effectively, or just trying to work out a data format - and it is forever causing me headaches.
I think the problem stems from people's tendency to universally apply something that works well in a particular situation. Take databasing, for example. In my line of work, I so often find situations where people have decided that databasing is so cool, everything should be placed into a database - be it static images, links, whatever. You can have too much of a good thing - and this is just another classic example of the problem.
XML should be used sparingly, and in situations where it actually improves readability, structure and clarity of data. If you're looking for something more complex, you need another way of storing your data: be it YAML, be it JSON, be it another database.
XML is not the be-all-and-end-all, and I think it's about time the "average developer" realised this.
James on May 12, 2008 5:53 AM
Hear, hear! I love it when you tell it like it is. Seems to me that far too many people reach for the 'silver bullet' that is XML, then end up with a big pile of mess. The trend for storing data that really should be in an RDB worries me particularly.
"You might argue that XML was never intended to be human readable,"
In fact, I'd argue the opposite! Let's not forget that XML is a *Markup Language* and is best when marking up a document, not when storing 'data', in its strictest sense. Give me a piece of well-marked up HTML, and it's a breeze to read. Give me textual data as, well, text, please!
bobby on May 12, 2008 5:54 AM
Oh, I forgot to include the canonical HORRIBLE example: ant. Give me a nice readable Makefile any day.
bobby on May 12, 2008 5:55 AM
Why care about the format of your data when we live in the age of inline-able converters?
xml2yaml: http://linux.die.net/man/1/xml2yaml
At least we're not stuck with ASN.1.
Jim McCusker on May 12, 2008 5:57 AM
I am building a Ruby project that involves AWS. I chose to use JSON for SQS instead of XML because it's lighter and instead of YAML because it's faster (at least from my Ruby benchmarks).
You now have libraries for JSON and YAML in most programming languages. Also JSON is nearly YAML-correct (see http://en.wikipedia.org/wiki/YAML#JSON).
Piku on May 12, 2008 5:59 AM
What I like about XML is that even if somebody uses it badly, at least it's some kind of standard that you can pick your way through. No matter how much of a mess it is.
robustyoungsoul on May 12, 2008 6:02 AM
XML is like violence: if it doesn't solve your problem, you're not using enough of it. ;-)
Mo on May 12, 2008 6:02 AM
Ant makes me cry.
Andrew on May 12, 2008 6:05 AM
I've found that for a rather large amount of what people want to use XML for (what the quoted person called "data scenarios"), you are far better off using CSV. It's much easier to parse, and can be edited and manipulated in the database or spreadsheet program of the user's choice.
There may be a burgeoning market for XML tools, but they have a long, long way to go before they come close to the support available for dealing with CSV files.
T.E.D. on May 12, 2008 6:09 AM
Hear hear!!
Dave on May 12, 2008 6:09 AM
XML and all the tools around it, especially XSL, can take a flying leap. As Wikipedia points out, “the syntax of XSL language itself is valid XML.” As if we were not bloodied enough subjected to XML, now we start cutting off our limbs by using XSL – reminds me of the Black Knight in Spamalot.
Somewhere around Smalltalk seemed to me to be the pinnacle of computing science in its simplicity and elegance. The entire Smalltalk syntax could be represented on a postcard. http://www.esug.org/whyusesmalltalktoteachoop/smalltalksyntaxonapostcard/
What has happened to our industry?
So much of XML runs counter to well understood programming paradigms, too:
http://myarch.com/why-xml-is-bad-for-humans
Jeff Atwood on May 12, 2008 6:17 AM
I think the one thing that's missing from that argument, however, is that XML is much easier to validate. If you didn't have XSD, I'd agree with you, but without it there's no way of validating data in a file (or stream, or anywhere else you get plain text data) without manually parsing and validating it. In code (as far as I can tell).
And then if your validation criteria change, you're back to unpicking your predecessor's hokey undocumented parser and validator and then trying to spot-weld in your extra logic. And then recompile. And then (depending on what kind of change-controlled environment you work in) jump through the hoops to get it deployed.
I could well be wrong, though.
Benjimawoo on May 12, 2008 6:26 AM
> XML is like violence: if it doesn't solve your problem, you're not using enough of it. ;-)
XML is also, just like violence, something the world could do without ;)
James on May 12, 2008 6:26 AM
Well said.
I think that XML isn't really appropriate for many of the applications for which it is being applied. One of the problems is that XML is flexible enough to be turned into about anything, whether that makes any sense or not. There are many good uses for XML out there, but I'm afraid that the many poor uses will prejudice people against it.
Matthew Reed on May 12, 2008 6:29 AM
We are designing an entire reporting system around XML control files. One engine to "rule them all" and small XML files containing everything we need in the report. So far, it has been a nice solution and we wrote a generator to create the XML. No more digging in code looking for where to change the header width or column order... just load the XML into the generator, make your change, and BAM! instant report update.
Of course it isn't released or even in alpha yet, but "Works on My Machine"
Wayne on May 12, 2008 6:32 AM
> There's also JSON notation, which some call the new, fat-free alternative to XML, though this is still hotly debated.
There's also another cool thing: JSON is mostly a subset of YAML (there are a few small differences, see http://redhanded.hobix.com/inspect/jsonCloserToYamlButNoCigarThanksAlotWhitespace.html, but it's overall compatible). This means that it's fairly easy to start with JSON and jump to YAML if the structure is too complicated for JSON.
> At least we're not stuck with ASN.1.
ASN.1 is ok, as long as you don't have to create or parse it by hand. But then again, ASN.1 is not supposed to be hand-parsed. And you'll note that XML is the same, it's just that XML is (supposedly) human-readable, and every language has XML serializers and deserializers while Erlang is one of the few languages with an ASN.1 encoder/decoder smack right in its stdlib.
Masklinn on May 12, 2008 6:33 AM
XML became the default because of its flexibility in data formatting. And, because it has become so ubiquitous, almost all programming languages have built in ways of easily parsing XML. In fact, I do almost all of my web output using XML and then use XSL style sheets to transform it into HTML. I remember some blogger, can't remember his name, blathering on about MVC and how you should make your output "skinable". Well, if you produce XML output, your webpages are extremely skinable.
The problem is that XML may be very computer friendly, but is not too human friendly. Most people will easily agree with that. However, there are dozens of GUI oriented XML editors that make reading and writing XML much easier. I've even written a 10 or so line Perl script that converts files from YAML to XML and back. (Yes, Perl. What do you expect from someone who uses VI as their main program editor).
XML is not really the problem. It is an excellent and extremely flexible data format. The problem is our attempt to read and write directly from XML when there are many excellent tools that can help us with the task. After all, you don't expect to read and write Microsoft Word documents using a standard text editor. Why should XML be all that different?
I'm not a fan of using YAML as a data formatting tool because it doesn't go far enough toward solving the problem. YAML becomes unreadable when your data becomes more complex and there are very few development tools that can parse YAML files. It's silly to come up with another inferior data format to XML which doesn't really tackle the main issue of human data readability when there are few programming tools to read and write it. You're better off using one of the wide variety of GUI XML editors that can make your task much easier.
After all, how many developers use IDEs to help them program even though almost all programming code is in text and could be done (in theory) using Notepad?
David W. on May 12, 2008 6:34 AM
I'm afraid I couldn't disagree more. No, XML isn't the easiest to read (by humans) of all the infinite number of alternatives out there. No, XML isn't the most efficient in terms of space. And yes, perhaps it has been forced into places it was never intended to go. But you miss what I think is the most important point: it is rapidly becoming a standard way of representing information. I would argue the value of having a standard far outweighs the inefficiencies in most cases.
Take a simple example of a configuration file that some application will need for saving user information. We've all been there, making up an ad hoc scheme for saving whatever needs to be saved. Then building a little parser to read and write the data in that form. And over time our little config file grows and changes. Someday, a new programmer joins the team and has to deal with this file. What are the construction rules again? Where can a new item be added that won't break the little parser? How much time has been expended over the life of the application in building, modifying, and fixing that bit of parser code as things needed to change?
There are numerous XML parsers available that are robust and free. They all work pretty much the same way (with a few exceptions that I'd call bugs in implementation). I don't want to write little parsers anymore. I want to use something that is already written and works.
The same argument can be made with respect to the other tools that are widely available to deal with XML-encoded data:
-- XSD can be used to insure the integrity of the XML file *before* your program starts to slurp in the data in the file. This can be critical in B2B situations like banking or ordering from a supplier.
-- XSLT can be used to do arbitrary transformations on the data (in a standard way) to produce files of any format that is convenient on the data consumer's end of the exchange. I do a lot of this sort of transformation work--none of it for web pages--and I can vouch for the power and convenience of having a standard transformation language.
-- XML/XSL authoring and editing tools abound. There are tools that will produce an editable visual representation of a schema (a real boon if you need to capture complex data in a text file). Most of these tools will do much of the work of editing XML files and will help you to construct correct XML with prompts and intellisense-like prompts.
I'm a big fan of XML. No it certainly isn't the very best that we could do but it is a quantum leap better than what we had before--custom representations for everything. If there is one single improvement we could make to advance the art of programming today, I'd vote that it’s STANDARDS. We don't have to wait for perfect standards to emerge (they won't) but we do have to get to the point where we can agree. XML is a step in the right direction.
Thankfully, someone will be implementing YAML into Boost.Serialization this summer.
http://code.google.com/soc/2008/boost/appinfo.html?csaid=BE3EEB904A90B03A
> but without it there's no way of validating data in a file (or stream, or anywhere else you get plain text data) without manually parsing and validating it
Actually,
1. Even in XML there are other (far better, especially on the readability front) schema languages/systems than XSD (RelaxNG, Schematron)
2. Schema languages/specs are starting to appear for e.g. JSON (Cerny, json-schema)
3. JSON documents are very often orders of magnitude simpler than their XML counterparts, thus validation becomes almost trivial and often doesn't require a full-blown schema language.
4. Manually parsing and validating a JSON document isn't really hard with a dynamic language.
Thank you for addressing some of my concerns regarding this sacred cow!
"it is rapidly becoming a standard way of representing information"
It is hardly more a 'standard way' of representing information than ASCII (or UTF8, UTF16, etc.) Yes, anyone can write a file with lots of angle brackets, and parsers can easily turn that back into tokens, but the semantics of the file remain application-dependent in almost every example of (bad) XML usage I've ever seen.
"Take a simple example of a configuration file that some application will need for saving user information. We've all been there, making up an ad hoc scheme for saving whatever needs to be saved."
Er, YOU might have been, but the rest of us are familiar with a small number of pretty common configuration formats that are trivial (i.e. easier than XML) to parse.
"XSD can be used to insure the integrity of the XML file"
Yes, for a very limited meaning of the word "integrity".
"XML/XSL authoring and editing tools abound"
And text editors are 'abounder'.
"We don't have to wait for perfect standards to emerge (they won't) but we do have to get to the point where we can agree. XML is a step in the right direction."
OK, if we can get the billion different languages floating around reduced to maybe less than a hundred or so, I agree with you :)
bobby on May 12, 2008 6:48 AM
I've also been critical of XML ever since I had to start working with it. I'm coming from Lua where a configuration file is simply a Lua script. If you got an error in the script, you'll get an error message from the Lua interpreter.
Now, if you have the same configuration file in XML format, and NO validation as is usually the case, you can get a list of problems reading this in your code:
- program crashes
- program says: "error reading config file"
- program starts but uses default settings for all configurable features
- program starts but uses default settings for a subset/single feature
- something else entirely ...
Yes, this is only in the narrow "configuration file" scenario but that's just one where I think XML is totally overused and/or under-validated.
Btw, what ever happened to INI files? ;)
steffenj on May 12, 2008 6:48 AM
It's amazing. For one precious moment it looked as if the world had actually standardized on a data and metadata interchange format, and then the "agile" groups had to mess it up with their JSON and YAML and whatever.
Who cares if XML is bloated? It's not like you're writing this by hand. With a lot of newer technologies like ASP.NET Web Services and LINQ to XML, it's practically invisible to programmers. It's just what's going over the wire, that's all. It's also very easy to compress and encrypt, which, if you really care about the size, makes it about as compact as anything else out there.
I realize it's not the most ideal tool for your social shopping cart 2.0 AJAX app. You'd rather use REST. Fine, you can design your own custom web services for that. But most of us in the real world have jobs to do and precious little time to do them in, and if it takes 5 minutes to write an XML Web Service that every other .NET or Java programmer can consume, we're going to choose that over 2 hours of fiddling with the internal serialization and customer/partner hand-holding to create a YAML service.
If Sun and Microsoft and IBM want to all come up with a native interpreter for these other formats and roll them into their flagship products, then great, I'll consider using them. Until then, stop hating on the one standard that actually makes my job easier.
Aaron G on May 12, 2008 6:48 AM
For scripting languages it's handy to have the config files written in the language itself. For example, here is a Python config file for a program I wrote:
http://www.pixelbeat.org/programs/Tira-2/toppy.tira2
which can be parsed trivially with: config = eval(open(config_filename).read())
Pádraig Brady on May 12, 2008 6:48 AM
I'm not a big fan of XML, but think it's OK in some scenarios. Unlike Jeff, though, I'm going to single out SOAP. We already have many perfectly good syntaxes for procedure call. SOAP is a product of the "insane complexity" one of the Google founders talked about. With a million simple, concise syntaxes for procedure call out there, why do we end up with this complex unreadable monster? How about "Currency GetLastTradePrice("DIS")"?
A. L. Flanagan on May 12, 2008 6:50 AM
> But you miss what I think is the most important point: it is rapidly becoming a standard way of representing information.
The problem is that *XML is NOT a way of representing information*. It's at best a way of building an information representation structure, XML doesn't represent anything.
> I would argue the value of having a standard far outweighs the inefficiencies in most cases.
XML is not a standard for anybody but marketroids. One of Erik Naggum's numerous quotes about XML comes to mind here:
> Structure is nothing if it is all you've got. Skeletons spook people if they try to walk around on their own; I really wonder why XML does not.
> Take a simple example of a configuration file that some application will need for saving user information.
Wow, a non-sequitur already? The problem here is not "hey they're not using XML" but the reinvention of the wheel. There are, and were before XML, numerous formats that could be used for representing a conf file. XML is barely *an* answer here, and one that is usually misused to insert one more buzzword in a press release.
> I don't want to write little parsers anymore. I want to use something that is already written and works.
Guess what? There are numerous JSON and YAML parsers available for most popular languages. You don't have to write little parsers if you don't want to, and you haven't needed to since long before XML.
> XSD can be used to insure the integrity of the XML file *before* your program starts to slurp in the data in the file.
As I said above, there are schema languages for JSON. And I really don't understand why every person who talks about XML schema languages just *has* to pick the most verbose, unreadable and annoying one of the bunch.
> XSLT can be used to do arbitrary transformations on the data
So can any regular language, the only advantage this crippled, dumbed down, annoying language called XSLT has over others is that it's written in XML.
Wow, paint me impressed.
And yes, I have used XSLT, I've spent the better half of my days in it during a whole year. I know and understand the thing, and I still hate it, I'd take HaXml or HXT over it any day of the week if I was the one to choose.
> XML/XSL authoring and editing tools abound.
And mostly show how misguided XML is in the first place.
As for XML editors ... I'd like to know which ones are considered "good"?
I have tried several and either they are complex beasts of applications that try to satisfy every possible XML need you might have (Altova XMLSpy comes to mind), or they are very simple editors that let you edit the XML as tree and other forms but not much else (forgot the name).
The former simply have too much of a learning curve to be useful for all people working with XML in our company (and too expensive, too). The latter is simply not powerful enough or its usability just feels "odd" enough not to encourage people to use it over plain text editors (with syntax highlighting).
I agree in part. There are plenty of situations where XML should never go, and some people use it in incredibly wrong and stupid ways but it's not all bad.
Then again it seems software developers are like this, case in point: GOTO
Perfectly acceptable as long as it is done right, developers used it inappropriately and they demonized it as never being the right answer.
Matt Newman on May 12, 2008 6:56 AM
> For one precious moment it looked as if the world had actually standardized on a data and metadata interchange format
XML is not a format, it's a format representation, it has no meaning in and of itself and thus *nothing* was "standardized" for any value of "standardized" worth talking about.
Not to mention, long before the XML marketing blitz by the likes of IBM and Sun, there were ASN.1 or INI files, standards if there ever were any.
> I realize it's not the most ideal tool for your social shopping cart 2.0 AJAX app. You'd rather use REST.
Thanks for showing your incompetence and lack of comprehension of the topic, it's appreciated.
Just so you know, REST is orthogonal to the documentation representation used, you can use REST with JSON, with YAML, with plain text, with HTML (guess what, you do every time you access a web page) or with XML. Nice try, no sugar.
Masklinn on May 12, 2008 6:57 AM
Ooh look, I have an XML parser with a read and write method. I can dump all sorts of objects in it, save them and retrieve them again. Hmm, ideal for config files. And high scores. UI definitions. Actually, ideal for pretty much everything I like to store which doesn't have to go in a database. Uhm yes, my ints come out as int and my lists come out as list, it's pretty amazing really.
Sure, if it's a plain textfile then I save it as plain text. And an image for example can sit neatly in an images directory. For everything else there's databases and XML.
Code comments as XML? That must be a joke and there are plenty of other jokes around. But in general: KISS and don't re-invent the wheel.
Caesar Tjalbo on May 12, 2008 6:58 AM
XML has been around for so long and it's so pervasive we're probably stuck with it for a long time. A few developers using my language have created "easy XML" subroutines that do a lot of under-the-hood formatting and parsing. If we have to live with something we might as well make the best of it. Automate it and forget it.
PaulG. on May 12, 2008 7:02 AM
I didn't go through all the comments but I didn't see DSL mentioned. Take a look at this http://www.ayende.com/Blog/archive/7268.aspx for an example of how to simplify configuration.
Reshef on May 12, 2008 7:03 AM
One thing XML gives you is an ability to randomly access data inside the file without loading it into a database. That can be handy for populating a catalog page in InDesign or building a web page on the fly.
But for something like a config file where you typically read the entire thing in at once it's a useless feature. And for batch-processing scenarios where the receiving system is always going to process all the data in sequence it's a useless feature with a performance penalty.
JPLemme on May 12, 2008 7:03 AM
I like XML, honestly, for small things where you don't overuse attributes and all sorts of other junk.
Sort of like your simple examples:
<books>
<book>
<title>Coding Horror for Dummies</title>
</book>
</books>
But once you start to factor in XSL, XSD, XDSLXSLDX -- I just find that it all gets horribly bloated and against the ... well let's just say that I find using simply structured XML files easy and to a degree NICE to use -- but that XML quickly crosses a line from being 'enjoyable' to 'painful'.
N on May 12, 2008 7:05 AM
+1 Aaron G.
If you are swimming in a sea of angle brackets perhaps you are doing something wrong. For most developers, especially those in SOA land, it's invisible under-the-hood plumbing that (mostly) Just Works(TM).
Damo on May 12, 2008 7:05 AM
(Sorry, my example doesn't show because of the inclusion of the brackets...)
N on May 12, 2008 7:05 AM
XML has its place, but lazy programmers use it for everything.
It's a new Windows registry or DLL manifest - something we never really needed, but makes complicated stuff easier (or possible for the more ignorant coder). However, as with all such RAD tools/standards bad programmers like to use it by default without thinking.
The .NET data controls output "horrible XML files" by default for instance... this is where I blame M$ and draw a parallel to the registry... but that would be unfair. As usual it's the programmer's fault for choosing the wrong method to store/retrieve his/her data.
It's easier to not think than to think... and we are all bad programmers after all, so I can forgive it. :)
Jheriko on May 12, 2008 7:07 AM
This link is broken. :-(
3. Do you know what the XML alternatives are?
I've been digging into YAML recently and I must say it's a lot easier to pick up on, parse, and write than XML in my experience. It just seems more natural to say
Name: Shawn
Rather than
<name>Shawn</name>
Now if only we could get BizTalk to speak YAML. Sigh...
Kelly on May 12, 2008 7:11 AM
I've been wondering about XML for a while. I only recently began to get serious about developing software, and XML was entering its halcyon days right when I started learning. For a long time, I trusted in the ostensible greater wisdom of the collective and assumed that XML really was what its ubiquity implied: The greatest thing since peanut-butter & Nutella sammiches. Recently, though, I really got to wondering about what the point was.
Clearly, XML is no fun to write by hand. The main argument I've heard regarding its verbose plain text format is "it's easy to debug", which makes me want to barf. This is what I'm really wondering: XML is meant to be a data transfer format. Take RSS, for example:
High-traffic sites serve tens of thousands of RSS feeds, formatted in XML, every day. In situations like this--where every spare pound of fat on your data becomes inflated ten-thousandfold until, like the grotesque beast at the end of Akira, it is suffocating the entire known universe with its pustulent girth--shouldn't we be using a data format that's as thin as possible? Shouldn't the common symbols in a data file be encoded and compressed within the file itself? Which has a smaller bandwidth footprint? This:
<SomeDocument>
<SomeParagraph>XML sucks</SomeParagraph>
<SomeParagraph>no really</SomeParagraph>
</SomeDocument>
Or this:
&1=SomeDocument;&2=SomeParagraph;<&1><&2>XML Sucks>><&2>no really>>>>
The second one is pretty terrifying, but it would be TRIVIALLY EASY for ANY modern editor to translate it into something that doesn't rape your eyes (like YAML). Aren't we actually wasting TERABYTES of bandwidth every day by transferring human-parseable cruft in files that no human should ever see in the flesh anyway? Or am I missing something?
Max on May 12, 2008 7:12 AM
"One thing XML gives you is an ability to randomly access data inside the file without loading it into a database."
Er, that's exactly what it doesn't do, hence terrible performance relative to binary, or simple textual data.
bobby on May 12, 2008 7:13 AM
Allow me to express my utter indifference: meh!
I work with XML roughly daily as a developer, and it ain't no big thang. It's at least 12 parsecs farther along than the obsolete flat files we're unfortunately still dealing with.
Show somebody XML, even a total bonehead, and they'll figure it out in a few minutes. There's little magic to it, few assumptions made. Can it be abused and misused? Certainly, just like anything else in computer science. Is it largely redundant? Absolutely, but that can also serve to enhance readability in very large files.
Compare to what came before this: inscrutable binary files, INI files consisting only of key-value pairs, fixed-width flat files, delimited text files... Let's not forget our past, folks.
It's computer-readable, computer-writable, and it's more-or-less human-readable and human-writable, even if it makes you a little crosseyed. Which makes it way better than the tarpit we just crawled out from. JSON or YAML or whatever is probably on the horizon, but let's not say "XML sucks" when it was still a huge step forward.
John on May 12, 2008 7:17 AM
Oops... looks like your comment filter clobbered my examples. I forgot that it's never safe to assume "no HTML" means everything will be politely escaped rather than thrown in the trash. Here they are again, manually escaped like God intended:
This:
<SomeDocument>
<SomeParagraph>XML sucks</SomeParagraph>
<SomeParagraph>no really</SomeParagraph>
</SomeDocument>
Or this:
&1=SomeDocument;&2=SomeParagraph;<&1><&2>XML Sucks>><&2>no really>>>>
Max on May 12, 2008 7:18 AM
I'm just thankful developers have turned to XML instead of undocumented binary files. We don't want to return to those years.
Bill on May 12, 2008 7:22 AM
XML isn't bad for many things, but space efficient/easily read by normal computer users it is not. Before XML was used for config files, INI files were standard. They have limitations, but you can parse them VERY quickly, they serve the purpose (configuration) perfectly, and they are easily read and edited. I will never understand why XML took off for configuration. As for tabular data, the CSV standard was much better in my opinion. Once again, easy to parse, editable in many apps including Excel, quick imports, and a small footprint. When it comes to more complex data, I believe XML is a good solution, but YAML/JSON is better in many cases for obvious reasons. The key is to use a standardized format that is supported by other major technologies. It really doesn't make a huge difference for most things. However, Microsoft added a binary format for datasets in .NET for a reason. Sending huge XML files over webservices was slow, and adding a "tighter" format was a huge improvement.
Justin on May 12, 2008 7:25 AM
I wonder if you may have also seen JDIL at jdil.org?
they say:
However, unlike XML, JSON provides no direct support for namespaces - and thus no standard way for avoiding name collisions when mixing data from diverse sources. Something like a namespace mechanism is required to lift JSON to the level of a data integration platform, as opposed to a data exchange format only. Also lacking are standard ways of naming objects so that they can be referenced from elsewhere, and for representing properties with multiple values.
If these concerns are addressed, JSON's reach will extend over more of the domain currently occupied by XML, while bettering XML in the cardinal virtue of simplicity.
I've become a big fan of JSON for 100% JavaScript web applications. You can convert XML from any online data source into JSON using Yahoo! Pipes. However, JSON is not very readable. It is a nasty mess of brackets. I have to use a JSON Viewer to figure out the object structure. Recently I used ASP.NET's JavaScriptSerializer.DeserializeObject to deserialize some JSON data into objects. This is totally undocumented and proved to be very difficult to figure out.
for added silliness..
a program i wrote was written around a custom text parser.
this ended up being used within a large data analysis program, that needed to store settings etc.
I modified it to read a 'script' at startup, a script that could contain variables and other settings the program used.
perfectly human readable, since the plain text 'comments' were ignored, and it was child's play to get the program to write the config file.
sledgehammer to crack a nut for most programs, but since this included the parser i figured why not
personally i don't care what the format is as long as it's plain text of some sort, and thus easy to backup and copy.
essentially sod anything in the registry or some hidden binary file. i love the idea of the unix based 'dot' hidden config files. put them in the program directory (defaults) and the users directory for everything else.
btw what was wrong with .ini files? have a standard user dir & system dir for them and it works.. problem? never understood why they moved away from that
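A rough sketch of that "defaults in the program directory, overrides in the user's directory" layering, using Python's standard `configparser`. The section and option names are invented for illustration, and the two strings stand in for the two files.

```python
import configparser

# Stand-ins for the program-directory defaults and the per-user overrides.
DEFAULTS_INI = """
[display]
theme = light
font_size = 12
"""

USER_INI = """
[display]
theme = dark
"""

config = configparser.ConfigParser()
config.read_string(DEFAULTS_INI)  # load defaults first
config.read_string(USER_INI)      # later sources override earlier values

theme = config["display"]["theme"]
font_size = config.getint("display", "font_size")
print(theme, font_size)
```

With real files, `config.read([defaults_path, user_path])` applies the same later-wins rule and silently skips files that do not exist.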
claire rand on May 12, 2008 7:26 AM
Hear, hear! Preach on, brother!
Dang I wish YAML had become the standard. Do I use it? Nope - because there are parsers for XML built into my language framework. Perhaps once a parser for .Net becomes established I may be able to convince our team to use YAML, but I seriously doubt it. The tyranny of XML will no doubt continue.
Referencing this article just might help, though.
"You might argue that XML was never intended to be human readable, that XML should be automagically generated via friendly tools behind the scenes, never exposed to a single living human eye. It's a spectacularly grand vision. I hope one day our great-grandchildren can live in a world like that."
My dear sir, If you cannot, in some way or other, code it by hand, it's not a language worth using.
one of the big issues with HTML editors in the past has always been EXTREMELY redundant and sloppy code.
Regardless of the language, wizards, tools, and widgets are quick, but the codes should always be visible somewhere, somewhat understandable, and always editable.
Also, I wish they'd stop fucking with the standards. HTML has been around for ages, STILL gets much use (especially the oh-so-dreaded font tags CSS was to get rid of) The HTML/XHTML/XML bitchfest in fact, parallels the CSS fiasco a few years ago.
Introduce a new "language" based on what most people are "just fine" with, make it screwy enough that you have to reframe everything you already know and want to incorporate, and make it finicky enough that a number of people will revolt against it. What for? a little bit more flexibility and usability.
I won't be surprised if five years from now, people will still be refusing to use XML for things handled MUCH easier in other languages.
The Postindustrialist on May 12, 2008 7:30 AM
I wanted to post some clear well defined rebuttals as I am a fan of XML and its related technologies, but I can't really disagree with the overall sentiment of the post.
SOAP is awful, and although less so, the XML used in the examples is too. Yes, XML is often used like a club.
But some of these comments... come on! There's all sorts of guff popping up here, from "XML is too hard" to what's almost a carbon footprint argument!?
Ian on May 12, 2008 7:37 AM
Clearsilver
Same ideas but made in a more pragmatic view I believe...
http://www.clearsilver.net/docs/compare_w_xmlxslt.hdf
I like JSON because it's lighter-weight, and AJAX apps can easily "programmify" the server response by doing an eval. Of course there are a couple minor security risks with this, but they can be avoided.
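The comment is about JavaScript's `eval`, but the same trade-off can be shown in Python as a hedged analogy: a real JSON parser only ever produces data, so a reply that is actually code gets rejected instead of run. The helper name `parse_reply` and the sample payloads are made up for this sketch.

```python
import json

TRUSTED = '{"symbol": "DIS", "price": 23.5}'
HOSTILE = '__import__("os").getcwd()'  # code, not JSON; eval() would run it

def parse_reply(text):
    """Return the decoded data, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

print(parse_reply(TRUSTED))
print(parse_reply(HOSTILE))
```

The cost is one extra function call compared with evaluating the response directly, which is the "avoidable" part of the risk.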
The only reason I ever use XML is if I needed to pass data to/from different platforms (using SOAP). Good post!
Josh Stodola on May 12, 2008 7:42 AM
SOAP is possibly the most horrifying example. Even if you set aside the whole document/message thing and the poor library intercompatibility, you are still using vast amounts of expensive-to-parse XML to model, in most cases, rather simple function calls. Benchmarks comparing SOAP to CORBA or Thrift (with binary protocol) or whatever tend to be almost comical, and one derives no real benefit from RPC being in XML, and yet SOAP is still heavily used as an RPC mechanism.
Robert Synnott on May 12, 2008 7:44 AM
I do not agree with you at all on this matter. Although you try hard not to sound anti-XML, you come across as very much against it.
Even though it might not be all that easy for humans to read, at least it can be read. It was made in the era when programmers used to invent their own formats to write the application data in. At least we have a standard now. You can go on picking on it, and soon we will reach a stage where everybody is writing data in their own applications' formats and there will be no interoperability.
Deepank on May 12, 2008 7:49 AM
> Something like a namespace mechanism is required to lift JSON to the level of a data integration platform, as opposed to a data exchange format only.
But why would you do that? Why couldn't JSON just stay a simple format for data exchange and basic data storage? It's a tool, and it's an awesome tool for what it does. Use an other tool (e.g. YAML) if your task or data is more complex than what JSON can do.
Becoming "the hammer to nail them all" (including screws, puppies and ducks) is exactly what has gone wrong with XML, why would you want to repeat the same mistake?
Masklinn on May 12, 2008 7:55 AM
> there will be no interoperability.
But there *is* none already! XML is not a data format, it's a data format representation, just because your config file is in XML and mine is also in XML doesn't mean they're interoperable in any way, shape or form. And that's why people have to build complete, custom, non-interoperable data formats on top of XML such as SOAP, XSL, XSD, DITA, ...
Masklinn on May 12, 2008 7:57 AM
LISP and Scheme are excellent alternatives for XML.
leppie on May 12, 2008 7:59 AM
What I find most bothersome about dealing with XML is that "parsing" it tends to happen on two levels. You use a parser to turn characters into XML elements, then you hand-roll another parser in your programming language to turn the "start-tag foo" tokens into actual data. (Or else I'm missing something huge.)
Whereas with JSON, you call json_decode($data, true), and whoomp! There it is.
After a lot of trying to make it automagic, I've also realized that XML is not 100% interchangeable with JSON, because XML has both attributes and text nodes. And I think that's where the other half of the pain comes from, for me: when JSON isn't enough, I use XML, but I don't have foo.innerText in most languages because the DOM insists on dealing with raw nodes. Grawr.
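To make the "two levels" concrete, here is a small Python sketch (the comment itself uses PHP's `json_decode`). The sample documents and the `to_data` helper are invented; the mapping it applies to attributes and text is just one arbitrary choice among many, which is rather the point.

```python
import json
import xml.etree.ElementTree as ET

JSON_DOC = '{"id": "7", "name": "Shawn", "lang": ["YAML", "XML"]}'
XML_DOC = '<person id="7"><name>Shawn</name><lang>YAML</lang><lang>XML</lang></person>'

# JSON: one call, and you are holding dicts and lists.
from_json = json.loads(JSON_DOC)

def to_data(el):
    """Second-level 'parser': decide how attributes, text and children map."""
    node = dict(el.attrib)               # attributes become plain keys
    for child in el:                     # repeated children become lists
        node.setdefault(child.tag, []).append(to_data(child))
    text = (el.text or "").strip()
    if not node:                         # leaf element: just its text
        return text
    if text:                             # mixed case: keep text under a made-up key
        node["#text"] = text
    return node

from_xml = to_data(ET.fromstring(XML_DOC))
print(from_json)
print(from_xml)
```

Note that the two results still differ: the XML route wraps `name` in a list because nothing in the markup says whether an element may repeat.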
That's my hard-earned, unpopular opinion....
sapphirecat on May 12, 2008 8:01 AM
But it's just so damn ENTERPRISEY!!
Mattkins on May 12, 2008 8:03 AM
SOAP is insanity. XML in general is not so bad.
However, I agree that XML is overly complicated for representing flat data. Yeah, it's great that we have a standard. We can do better. Let's come up with one that makes sense.
Jeff Davis on May 12, 2008 8:04 AM
Sorry, I don't agree. I'd rather have one syntax than 2,746 different "common standards", each with different bugs, each maintained by a small group of people, rather than an entire industry. XML has become what it has because people needed something to fill that role. And it turns out that it was flexible enough to handle tasks outside of its original design goals--the hallmark of any good system.
Is file size the biggest problem in computer science right now? Aren't there bigger battles to fight, or is everything else a smaller problem than this in your view?
I'm tired of people complaining about how expensive XML is to parse (as if you were writing the parser). Compared to what? JSON? Scale JSON to something that handles namespaces, includes, queries, encodings, and we'll talk. Give me one example where XML has been critically big or slow or complex.
Where's the complaint against HTML? Why don't you write this site in flash, and get rid of the RSS feed? You are personally adding strength to XML, you know. Let's see you put your money where your mouth is.
Brianary on May 12, 2008 8:05 AM
> It's amazing. For one precious moment it looked as if the world had actually standardized on a data and metadata interchange format, and then the "agile" groups had to mess it up with their JSON and YAML and whatever.
I think that's the point Jeff misses in his post. His SOAP example is completely self-describing. A zero-knowledge interception layer could evaluate that SOAP request and with absolute certainty (no heuristics) act on it. His email example would require heuristics that would occasionally be wrong.
XML is wrong for some things, of course. HTML is the right choice for web pages as humans are inherently heuristic and any errors in HTML are tolerated (by convention) much better than errors in XML tend to be tolerated. If a particular XML document schema were complex enough and used in enough disparate environments it would eventually become HTML-like in this respect.
But just because HTML-style heuristics are the right choice in the browser environment doesn't mean they're right in all situations. I think strict, unambiguous interpretation and format are key attributes to enable many inter-operation scenarios. I don't know about his YAML example, but one thing I take from it is that newlines and whitespace are important, which is a no-go since text documents often get their whitespace mangled as they are passed through systems. I don't think most people have a tolerance for casual data corruption.
David Gladfelter on May 12, 2008 8:06 AM
Finally someone is saying what I felt all along. XML is a great document language, but a bloated data language. Not all data is a document. XML does great for documents, things like HTML, ODF, and so on. For configuration and other programmatic data it just has way too much structure.
XML contains a lot more structure than most programming languages can represent natively, and so requires complex class libraries like DOM and SAX to access all that structure. Languages like Perl, PHP, Ruby, JavaScript, Python, Objective-C, which let you create anonymous data types (or something close) organize everything into hashes (dictionaries) and lists (arrays). That JSON organizes everything into these two primitive types makes it, I think, the perfect data storage format. There is a 1:1 mapping between JSON and the internals for most languages, saving a lot of time and headache, and parsers and generators are easy to write.
Java is this big exception here, and I think Java has a lot to do with XML's popularity. With no good way to have anonymous data structures in Java, embedding data in your application is just not possible, you have to store it externally. The java folks were looking for a good flexible, expressive, easy for humans format for a while, and XML fit the bill. Java's collection classes, and many of its APIs, are already a bit unwieldy, so DOM and SAX didn't really seem too bad over there. Plus its DTDs, validators, etc, really fit in well in the bean counter environments where Java is often used.
So now those of us using languages that have more native support for complex dynamically structured data and a "just do it" attitude have to deal with something that was designed for a completely different sort of ecosystem.
JSON doesn't solve every problem, and it probably could use some good standard for something like DTDs, and something like XQuery, but I've seen work done in this direction, and it's all much simpler than the XML equivalent but can still represent anything, even a DOM tree.
Joseph Annino on May 12, 2008 8:09 AM
BTW, here is how to use XML without exaggerating the cost:
<message
Date="2008-02-14T08:55:03Z"
From="The Whole World &lt;us@world.org>"
To="Dawg &lt;dawg158@aol.com>"
>Dear sir, you won the internet. http://is.gd/fh0</message>
This one is not so contrived as the ridiculous example in the post.
Brianary on May 12, 2008 8:13 AM
One of the good things about XML now is all the libraries & tools that are available to support it.
J on May 12, 2008 8:13 AM
Great. No HTML, but anything that looks like a tag is removed. Brilliant. What an excellent choice when hosting an XML discussion.
Brianary on May 12, 2008 8:16 AM
XML is just a poor syntax for s-expressions. I'm disappointed that only one of you mentioned Lisp dialects, though Pádraig Brady got at the right idea when mentioning that Python has a read() function.
S-expressions are much easier to validate than XML (just think of all the possible ways that angle brackets can be broken) and easier to write by hand (a lot of text editors help you with the parens). There are also possibly hundreds of well-tested implementations of s-expression readers. The syntax is laughably trivial (it's either an atom, or a list of atoms). Besides, it's been done for literally fifty years! And you don't need to use a Lisp dialect to use S-expressions -- you could just as easily implement READ in some other language.
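As a rough check on the "implement READ in some other language" claim, here is a minimal s-expression reader sketched in Python. It is a toy under stated assumptions: the names `tokenize` and `read` are my own, quoted strings and bare atoms both come back as plain Python strings, and escapes, comments and numbers are not handled.

```python
import re

# One token: a paren, a double-quoted string (no escapes), or a bare atom.
TOKEN = re.compile(r'\s*(?:(?P<paren>[()])|"(?P<string>[^"]*)"|(?P<atom>[^\s()"]+))')

def tokenize(text):
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad input at offset {pos}")
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)

def read(text):
    """Parse exactly one s-expression into nested lists of strings."""
    stack = [[]]  # stack[0] collects top-level results
    for kind, value in tokenize(text):
        if kind == "paren" and value == "(":
            stack.append([])
        elif kind == "paren":  # closing paren
            if len(stack) == 1:
                raise SyntaxError("unexpected )")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(value)
    if len(stack) != 1:
        raise SyntaxError("missing )")
    if len(stack[0]) != 1:
        raise SyntaxError("expected exactly one expression")
    return stack[0][0]

print(read('(message (from "The Whole World") (body "you won"))'))
```

About thirty lines, most of them error handling, which is at least consistent with the claim that the syntax is trivial.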
mfh on May 12, 2008 8:17 AM
What I was trying to say (using square brackets):
[message
Date="2008-02-14T08:55:03Z"
From="The Whole World <us@world.org>"
To="Dawg <dawg158@aol.com>"
]Dear sir, you won the internet. http://is.gd/fh0[/message]
> Java is this big exception here, and I think Java has a lot to do with XML's popularity. With no good way to have anonymous data structures in Java, embedding data in your application is just not possible, you have to store it externally.
You're probably right here, as PJE explained in his "Python Is Not Java" (http://dirtsimple.org/2004/12/python-is-not-java.html),
> This is a different situation than in Java, because compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain. In Python, XML is something you use for interoperability, not your core functionality, because you simply don't need it for that. In Java, XML can be your savior because it lets you implement domain-specific languages and increase the flexibility of your application "without coding". In Java, avoiding coding is an advantage because coding means recompiling. But in Python, more often than not, code is easier to write than XML. And Python can process code much, much faster than your code can process XML. (Not only that, but you have to write the XML processing code, whereas Python itself is already written for you.)
> If you are a Java programmer, do not trust your instincts regarding whether you should use XML as part of your core application in Python. If you're not implementing an existing XML standard for interoperability reasons, creating some kind of import/export format, or creating some kind of XML editor or processing tool, then Just Don't Do It. At all. Ever. Not even just this once. Don't even think about it. Drop that schema and put your hands in the air, now! If your application or platform will be used by Python developers, they will only thank you for not adding the burden of using XML to their workload.
I'd also really like to know why angle-brackets are so much worse than the newlines and indents required for YAML, and the quotes required for JSON, and how those languages provide data type decoration, schema validation, and declarative transformation.
Brianary on May 12, 2008 8:21 AM
Many systems use XML simply for the sake of being buzzword-complete. If you're just transferring small amounts of information, using XML will indeed ruin your signal-to-noise ratio. But sometimes the structure of the document is a part of the message. If you need to exchange complex data structures with someone, you could certainly do worse than to send an xml schema and/or a sample document. The developers will quickly understand the general concept and the validator will make sure that the details are correct. Of course, in a perfect world everyone does unit tests and data validity will never be an issue..
Hirvox on May 12, 2008 8:24 AM
This is exactly like the discussions that arise when someone decides normalizing data is unnecessary and invents some new way to add array data types to SQL.
Brianary on May 12, 2008 8:25 AM
I have a question for "The Postindustrialist":
You say it's not worth coding unless you can code it by hand. What about images? What about databases? You can't code JPEGs in a text editor. You can't code MySQL rows in a text editor. That's because they're optimized for their domain, which is what I contend most data should be. Joel Spolsky has a great blog post (http://www.joelonsoftware.com/articles/fog0000000319.html) where he goes into that. Search for "Quick question" at that URL to jump to the place I'm talking about. I don't see why we're taking such a performance hit in the name of being able to edit something in our favorite text editors. Who cares? Why is that important? Why is it okay for images and database rows to have their own optimized editors, but not other kinds of data?
Max on May 12, 2008 8:28 AM
If you'd have said that about ASCII 15 years ago I'd have said the same.
If for nothing else, XML gives you an encoding standard - We aren't all American.
If for nothing else, XML gives you *one* parser for all these 'silly' files you keep coming across.
If the overhead is too much for your bandwidth I'm sorry.
It seems to solve more problems than it creates.
Oh, and it's human readable too (even if it does irritate some folks)
By the way, Brianary, you can display HTML in your comments by using &lt; instead of < and &gt; instead of >.
Max on May 12, 2008 8:29 AM
@Max: That's HTML. It says "no HTML".
Brianary on May 12, 2008 8:31 AM
It's not that the *idea* of XML is bad. It's how it came to be implemented. Like most creations by developers (e.g. C++, Assembly, Lisp, etc.), it is a human factors disaster.
The presentation interface to humans was given no more than passing consideration. "Signal-to-noise ratio? Who cares? It's parsable, isn't it?" is the attitude that will continue to sink thousands of software ships on the reefs of confusing unreadability.
It may be parsable by the *machine* but it's *humans* who have to write and understand software. Making any code interface immediately obvious to the human mind is orders of magnitude more important than making it easy for the machine.
ThatGuyInTheBack on May 12, 2008 8:32 AM
(No one reads this far down in the thread do they?)
When XML was first introduced I often said that it was a great way to expand your data by a factor of ten. Binary formats are often way more compact, except for text. Binary formats are also way more difficult to work with. All in all I'll take XML over the old binary formats just about every time.
The conversion from binary formats to XML was the big win.
Comparing XML to JSON, YAML, and so on is a useful exercise, but let's keep the larger historical perspective. We are already hugely better off than we were a few decades ago.
Rich on May 12, 2008 8:34 AM
Thanks for this post. I didn't know about YAML or JSON. Both standards are pretty interesting.
On first look, I would prefer JSON a little over YAML. But I don't think I like the name of JSON, because it's bound to a language. Please note: I'm not saying I dislike JavaScript or think JSON is usable only from JS, I just don't think it should be in the name of a standard that is obviously cross-language.
Hinek on May 12, 2008 8:34 AM
I wholeheartedly agree with all you said. However, 1 point was not addressed: availability. I often choose inferior *ubiquitous* technology. Support, community, and 3rd party enhancement make life easier and development quicker going with the most common rather than going with the best. This gives me a greater range of choice.
My mp3 player is an iPod not because it's the best but because I wanted a wide range of options for durable cases and you can't find that with any other brand. I use Windows XP because I can find a free, and usually open source, version of a tool to perform any common task. These tools also are not the best of breed but I don't care. They get the job done and I go on with my life with more time and more money than if I demanded the best of everything.
I often choose to use XML for the same reason for many things even though it's a lousy choice (esp with data). However, all of my co-workers can look at my code and know what it is immediately. We can also use an all but infinite number of 3rd party enhancements later to modify the XML.
dinah on May 12, 2008 8:35 AM
I've always disagreed with these types of arguments. Sure, there are cases where xml is abused. Packing csv into flat xml structures is usually bad, and I think we've all seen the case where someone serialized a binary by putting one byte per xml element in a huge xml file.
But that isn't the whole story. Your email example does it for me. If I use xml, I can use an xml parser that will always parse valid xml correctly, and I just deal with the data. If instead, I invent a spec for a format like that, then I have to write my own parser. That looks incredibly simple, but you also have to remember all of the variations that an email header allows (splitting lines at certain columns, quoting names instead of leaving them unquoted, etc).
With XML, I just say el.InnerText and be done with it.
Similarly, that chess example looks like the xml would be hard, but if it had a schema, my editor would hide virtually all of that from me by giving me dropdowns for the elements that are completely context dependent. You could theoretically do the same thing with YAML, but the editors are not nearly as mature.
So in the end, by criticizing xml, you've really vindicated it. Good job.
Jess Sightler on May 12, 2008 8:40 AM
As long as every major language has libraries to allow reading/writing XML in both node-by-node and XPath modes, I'll take XML over any other format out there. Why do I want to spend my day writing custom parsers or trying to understand yet another file format? I'd rather save my mental energy for understanding the semantic content rather than the presentation. For simple key/value pairs or tabular data or pure text I don't know that I would choose XML, because parsing those formats is dead simple... But in most other cases, why not? Because a few examples can be found where XML was used badly? Hardly persuasive. That's like arguing that delimited flat files are bad because comma delimiters can be problematic when your data includes commas.
Michael on May 12, 2008 8:46 AM
XML is the greatest thing to happen to data since databases. I believe that its benefits greatly outweigh its disadvantages. Show me any alternative which can be easily human readable, easily machine readable, whitespace-independent, allow comments, allow escape characters, represent serialized objects, platform-independent, represent web pages (XHTML), allow standard configuration, allow ordering of elements to be ignored if applicable, or not ignored, depending on the context, provide a simple set of basic rules, easily translatable into other formats (XSLT, CSS, etc.), and be accepted throughout the industry, and I'll gladly consider that. Until then, I'm perfectly happy with XML.
Joe Enos on May 12, 2008 8:53 AM
I couldn't disagree more. XML, by virtue of ubiquity, is one of the simplest, fastest and safest data formats out there - give a random developer an xml file, and he'll have it parsed and usable in his program faster than with any other format out there.
CSV is OK, but it's too simple, and doesn't deal well with hierarchies and doesn't cope well with exceptions like commas, semicolons, tabs and newlines in the data (depending on the variant of CSV you're employing).
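The delimiter problem is easy to reproduce. The sketch below, using Python's standard `csv` module with made-up sample data, shows a naive split on commas miscounting the fields while a quoting-aware reader recovers them. It says nothing about the hierarchy limitation, which quoting does not fix.

```python
import csv
import io

row = ["Smith, John", 'said "hi"', "42"]

# Write the row with the csv module, which quotes fields that need it.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()

naive = encoded.split(",")               # splits inside the quoted name
parsed = next(csv.reader([encoded]))     # honours the quoting rules

print(len(naive), naive)
print(len(parsed), parsed)
```

This only works when both ends agree on the same quoting dialect, which is the "depending on the variant" caveat in the comment.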
XML is discoverable - that SOAP example you just showed may be verbose, but it's legible with minimal knowledge, and it's easy to find similar examples online due to the wealth of keywords it contains.
By naming elements and attributes, queries over XML become far more legible. Data structures which support nameless lists or maps of data generally will have less legible code.
Sure, xml has flaws, but it's a lot more robust than JSON, and fairly simple - if overly verbose - to read. The ability to express data structures in a cross-program fashion is extremely useful, and xml (and some other formats) has been repurposed to achieve that goal.
There probably doesn't exist a serialization format which is both succinct, robust (in the sense that it's unlikely to mistake one structure for another - i.e. one xml data format for another), and simple - but XML is the best we have.
Take config files, for instance - xml might not be perfect, but each time I see the extreme diversity of syntax a simple linux distro has within /etc, I'm reminded that it's better to have a bad standard than no standard.
Eamon Nerbonn on May 12, 2008 8:55 AM
Anyone who argues that "XML was never intended to be human readable" (which doesn't include the author of the article you link to; his thesis is rather different) has not made recent enough reference to the design goals of XML
http://www.w3.org/TR/2006/REC-xml-20060816/#sec-origin-goals
one of which is "XML documents should be human-legible and reasonably clear."
Obviously some implementations fail to meet this goal.
For the record, I accept a lot of your points, but the Churchill quote is still apt - basically we don't ever want to go back to a pre-XML world. People are constantly inventing techniques which are superior to XML for some things; that's good too.
Dominic Cronin on May 12, 2008 8:56 AM
I just finished a small project here where we needed to communicate over HTTP between a data source (an internal web application) and a .Net client application.
SOAP seemed the easiest route on the web side: implemented in just a few minutes with Perl's SOAP::Transport::HTTP. Two API calls and a class later it was done.
On the client side? A nightmare. .Net assumed a full SOAP-server environment with a static API and a bunch of rigamarole I was unwilling to put up with. So, cleverly, I just thought I'd just use HttpWebRequest and parse the SOAP response. It's just XML, right? WRONG. Nothing is ever "just XML" in XML.
Namespaces piled onto document context piled onto all kinds of stuff to make using XPath-like queries nearly impossible. I finally wound up with crap like:
XmlNode xn = this.reply.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild;
Completely unnecessary, and could have been done a lot more simply with just delimited ASCII text I'm sure.
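For comparison, here is how a namespace-aware lookup might look against the SOAP envelope quoted in the post. This sketch uses Python's `xml.etree.ElementTree` rather than the .Net classes the comment is about, and the prefixes in the `NS` mapping are arbitrary local names chosen for the example.

```python
import xml.etree.ElementTree as ET

SOAP = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
 SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
 <SOAP-ENV:Body>
  <m:GetLastTradePrice xmlns:m="Some-URI">
   <symbol>DIS</symbol>
  </m:GetLastTradePrice>
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Local prefix -> namespace URI; the prefixes need not match the document's.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "Some-URI",
}

root = ET.fromstring(SOAP)
# 'symbol' carries no prefix here because it is in no namespace.
symbol = root.find("soap:Body/m:GetLastTradePrice/symbol", NS)
print(symbol.text)  # DIS
```

The path is readable, but only once you know which elements are namespaced and which are not, and that is exactly the knowledge the comment says was painful to acquire.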
Lisp solved this problem 50 years ago, with S-Expressions. There is a reason the Lisp developers of the time refused the more "natural" M-Expressions that were planned for a future version of the language.
Alex Queiroz on May 12, 2008 8:59 AM
What I hate about XML is the suite of "standards" that accompany it: XSD/DTD [for data integrity], XSLT [data transformation language], and XSL [xml stylesheets], XSL-FO, WSDL, and SOAP. It wasn't designed for the KISS-minded folks, but rather 'standards-body' type people who like complexity for the extra job security.
And XML was the driver for creating a new industry of "Service Oriented Architecture", which made XML really fat and clunky. I have actually used XML for document authoring and it wasn't pretty, since it used XSD/XSL/XSL-FO to turn an XML document into a PDF!
XML in its raw form of simply tags and brackets, works well only for information that needs to be represented in a hierarchical, tree-like, nested format. That is XML's expressive power at its work.
But for most of the other information out there, such as passing information along, you can use a flatter structured format that is more compact, easier to parse by algorithms (XML parsing is SLOW because of its recursive, tree-like structure), and you don't need to depend on third-party tools to do it.
We are about to standardize all of our persisted text format configurations into YAML here.
Andy at Simutronics on May 12, 2008 9:10 AM
Personally I have switched to YAML.
That includes all my config on Linux too.
I use ruby to generate specific formats if need be (including XML).
I would never again in my life go back to writing XML by hand, and I refuse to stick to XML for _humans_.
Die XML, DIE!
They now say that XML was not meant for config files etc., but it is a huge lie.
XML was meant to bloat the world and annoy users.
Like Captchas.
she on May 12, 2008 9:11 AM
Great post Jeff, I agree 100% wholeheartedly!
Arnor Heidar on May 12, 2008 9:15 AM
(memo
(date :year 2008 :month 02 :day 14)
(from :name "The Whole World" :email "us@world.com")
(to :name "Dawg" :email "dawg158@aol.com")
(message "Dear sir, you won the internet."))
I love the modern developer thought process. If you proclaim s-expression programming languages are confusing and too hard, you're a whiner and need to leave programming to the big dogs. If you proclaim that XML is confusing and too hard, you're an enlightened realist. Yes it is abused, but it works and the structure is relatively unambiguous, especially compared to the thousands of arbitrarily different pseudo-hierarchical INI/text formats on any UNIX machine. It's funny because you can see where they started out with a simple value pair, then suddenly oh crap, we need simple structures, tack it on. Except everyone does it differently. I don't like parsers, I don't like reading your crappy file format that's allegedly better than everyone else's. I hate your tab-delimited hierarchical hack of CSV and I hate your special version of INI. I hate Makefiles, they are ugly and the syntax is stupid. I hate working on parsers written by programmers that think they're too smart to use XML.
Yeah, SOAP is crap, but ANT is not that bad, it's an improvement over Make if you just suck it up and learn something different. XML is verbose and it's slow to parse and DTDs are stupid. But for every SOAP monstrosity there's a logical, clean Hibernate HBM file.
This is not directed to Jeff, but to some of the commenters. But seriously, the XML in that YAML comparison was practically transparent to me, I don't see what the big deal is. And I work with XML file formats that would make your hair fall out if that bothers you. And the email example, it may look good onscreen but I seriously do not want to ever have to maintain the black magic in Sendmail or try to write a functionally correct mailfile parser. The devil is in the details, which is exactly what is wrong with "simple" file formats.
Erik on May 12, 2008 9:21 AM
> You say it's not worth coding unless you can code it by hand.
I don't think I've seen anybody say this. The problem is that XML was started as being "handcoded", and even today "XML-aware" tools mostly aren't much better than handcoding.
> What about images?
Depends on the images, SVG are easily hand-coded. But most images are fairly opaque binary formats with quality data-specific editors and deal with concepts that aren't and never have been hand-coded, for most.
> You can't code MySQL rows in a text editor.
Er... yes you can, it's called SQL. Or, when dealing with import/export, "Excel", "CSV" or even "JSON" (the Django web framework has a data import/export to/from databases and its default serialization format is JSON)
> That's because they're optimized for their domain, which is what I contend most data should be.
And which XML is (document markup), except the vast majority of the uses of XML are far outside the domain it was initially created for.
> If for nothing else, XML gives you an encoding standard - We aren't all American.
Yes, and that's indeed important. I agree.
> If for nothing else, XML gives you *one* parser for all these 'silly' files you keep coming across.
Actually, no it doesn't. It merely gives you half the parser (the structure), but it doesn't give you the syntactic parsing. In fact, JSON and YAML parsers give you much more on the parsing side as they also give you syntax (e.g. JSON has numbers, strings, arrays and maps/dicts/hashes and they're usually translated as such in your language of choice by the parsers)
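A small sketch of that difference, assuming a made-up quote record (the field names and values are illustrative only): the JSON parser hands back native numbers and lists, while the XML parser hands back a tree of strings that the application still has to convert.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record expressed in both formats.
JSON_TEXT = '{"symbol": "DIS", "price": 34.5, "lots": [100, 200]}'
XML_TEXT = (
    "<quote><symbol>DIS</symbol><price>34.5</price>"
    "<lots><lot>100</lot><lot>200</lot></lots></quote>"
)

quote_from_json = json.loads(JSON_TEXT)
quote_from_xml = ET.fromstring(XML_TEXT)

# JSON: the parser already produced a float and a list of ints.
json_price = quote_from_json["price"]
json_lots = quote_from_json["lots"]

# XML: the parser produced structure only; every value is still text.
xml_price = quote_from_xml.findtext("price")
xml_lots = [int(lot.text) for lot in quote_from_xml.iter("lot")]

print(type(json_price).__name__, type(xml_price).__name__)
```

A schema-driven binding layer can do the conversion for XML, but that is an extra tool on top of the parser.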
> Like most creations by developers (e.g. C++, Assembly, Lisp, etc.), it is a human factors disaster.
Would you be kind enough as to explain what part of Lisp was a disaster? And why?
> Why do I want to spend my day writing custom parsers
Fail, both alternatives suggested here have parsers for pretty much every popular and less popular language out there.
Masklinn on May 12, 2008 9:30 AM
The thing about XML is that every language has a solid and well tested parser for it. Sure, it's not hard to parse other formats, but XML is nowadays built into everything, so it's a bit like a "reverse chicken-egg problem". That also means that for interop, you only need 1 parser in your application. If I needed one YAML Parser, one XML Parser, one .ini Parser in my Application, then I would be creating redundancy in a certain way.
With current Disk Sizes, the extra space for storage should not be an issue (and if it is, there is surely a good way to store the xml file compressed), and for the Web, I'd rather see everyone turn on gz compression on their web servers first, before they switch to another format.
In short: XML is not the best tool, but it is one tool that works really well in most situations.
Michael on May 12, 2008 9:33 AM
If you are writing your own XML parser, then you are doing something very wrong.
XML was built with totally different design goals than JSON/YAML and unfortunately people have brutally misused XML as anyone that has used JavaEE pre Java 5 knows.
Gudmundur on May 12, 2008 9:35 AM
XML may give you structure to a file, but everybody chooses a different implementation. I think that is Jeff's issue with it.
Is it a:
<car>Mazda</car>
<vehicle>Mazda</vehicle>
Same DATA, different MARKUP.
In HTML, a <P> is a paragraph. The markup has meaning. In XML, the markup relates to the data it contains, anyone can define the markup as they see fit!
You could have 2 files with the same data in them, but represented differently in markup.
Some benefits:
With XSD (Schemas), you can validate it pretty easily, I think that is a benefit. There are tools (XSD.EXE) to create .NET class files from the schema, although some of the classes it generates are not very readable, but it does give you the magic xml.serialize and xml.deserialize with one line of code. Also, transformations from XML to HTML with XSLT, good for the web.
I don't think there is anything wrong with flat files or CSV files or even INI files. If you have a system that transfers fixed-length flat files between applications, I don't see that much benefit in moving those files to XML formatted files.
Sure, there are parsers to parse the XML, but you still have to write code that uses the parser to get the DATA out of the XML.
It does lead to some of this nonsense code, as posted by Clinton:
XmlNode xn = this.reply.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild;
If anything, I think XML is overused. Which is better INI file or XML configuration file?
Both can be parsed
Both are human readable
Parse time should be about the same if small size
Seems like the same to me.
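As a rough illustration of that comparison, the sketch below reads the same two settings from an INI file and from an XML file using only the Python standard library. The section name, keys and values are assumptions invented for the example.

```python
import configparser
import xml.etree.ElementTree as ET

# The same hypothetical settings in both formats.
INI_TEXT = """\
[database]
host = localhost
port = 5432
"""

XML_TEXT = """\
<config>
  <database host="localhost" port="5432"/>
</config>
"""


def settings_from_ini(text):
    """Read the database section from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["database"]
    return {"host": section["host"], "port": section.getint("port")}


def settings_from_xml(text):
    """Read the database element's attributes from XML text."""
    node = ET.fromstring(text).find("database")
    return {"host": node.get("host"), "port": int(node.get("port"))}


print(settings_from_ini(INI_TEXT) == settings_from_xml(XML_TEXT))
```

For flat key/value settings like these the two are about equally easy to consume; the formats start to differ once the configuration needs nesting or repeated elements.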
Jon Raynor on May 12, 2008 9:35 AM
I think that it is indeed true that XML was never meant to be read. It was meant to be easy to parse. And with parsing, more verbosity is ALWAYS better, because it allows for explicit clarification. Besides, anything that is not aesthetically pleasing doesn't have to be output by the parser.
anon on May 12, 2008 9:35 AM
> Yeah, SOAP is crap, but ANT is not that bad
That's probably why even Ant's creator said that, in hindsight, using XML was not such a good idea (http://weblogs.java.net/blog/duncan/archive/2003/06/ant_dotnext.html)
> it's an improvement over Make
Pretty much everything is an improvement over make (and, really, Make isn't that bad until you have to go and use auto*).
> But for every SOAP monstrosity there's a logical, clean Hibernate HBM file.
Which have been advantageously replaced by annotations that are cleaner, clearer and simpler to read. And not separated from what they're talking about.
Masklinn on May 12, 2008 9:37 AM
Markup was removed on previous comment
markup : car data: Mazda \car
markup : vehicle data : Mazda \vehicle
But, hopefully you got the example.
YAML might be less verbose, however the parsers are picky enough that you can't write YAML by hand. The future is configuration as code.
Leon Brocard on May 12, 2008 9:40 AM
@Masklinn got it right. XML is the reductio ad absurdum proof that the Wisdom of the Masses is false. There is no There, There. It's just text. Parsers are only a *requirement* to extract the data, so that your Custom Built program can do its thing. As Fabian Pascal said:
The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged — semantics — and once they do that, there is no need to repeat the tags in each and every record/document being transmitted. Any agreed-upon delimited format will do, and the criterion here is efficiency, on which XML fares rather poorly...
BuggyFunBunny on May 12, 2008 9:41 AM
I have to agree - I'm no fan of XML, and never have been. I can appreciate that it's a clever way to express any kind of information, and separate it from any kind of presentation, but it's just hard to use.
Especially when you add the complexity of XML namespaces. Writing a concise XSD file that people will find useful is a nightmare, and that's if you have tools to help. What I want is a simpler version of SQL Server express, that doesn't need an install particularly. Text files might not be it at all. In fact, look at the way Outlook stores local data in pst files.
But hey, I have to say that a simple data design, when applied unsullied, is my preferred method of storing configuration. Not the default Settings providers, you understand, but plain old serialisation.
Neil Barnwell on May 12, 2008 9:42 AM
From my experience with config/data files, the majority of the time processing them is figuring out how to parse them. Escape sequences are a pain, picking delimiters and agreeing on a format are time-consuming operations. XML is a real win here because it gets rid of these problems and allows you to quickly get config/data files running.
The other big win is it's generally well understood at this point. Even for people who don't deeply understand XML it's pretty easy to understand at a glance how the data is structured. This may have more to do with the number of people who have done at least minimal HTML rather than intuitiveness.
I completely agree this can be overkill at times but IMHO it's an acceptable default solution.
Jared Parsons on May 12, 2008 9:50 AM
I think the heavy use of XML highlights a problem with programmers today. Namely, they don't understand or cannot apply basic principles from compiler design theory.
If people understood how easy it is to create a good parser, they would be more willing to make up and use other document formats. With XML, you can find a pre-built parser in just about every computer language. However, if you make up a file format that makes sense for your situation, then there is no pre-built parser for that. As in the memo file example from Jeff's article, it would be difficult to find a pre-built parser for that format.
For people who don't understand compiler design theory, creating a parser is a monumental, error-prone task involving lots of deeply nested if statements. It is much easier to just use XML. For people who understand compiler design theory, creating a parser is as simple as breaking up the file into tokens, and then writing a grammar to read the tokens.
I agree with Jeff, but I can understand why everyone is using XML.
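To make the "tokens, then a grammar" idea concrete, here is a minimal sketch of a tokenizer and a recursive-descent parser for the S-expression memo posted earlier in this thread. The grammar (expr := atom | string | "(" expr* ")") and the choice to return nested Python lists are assumptions made for the example; it is not a full Lisp reader and it handles no escape sequences.

```python
import re

# One token per match: "(", ")", a double-quoted string, or a bare atom.
TOKEN_RE = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')

MEMO = """\
(memo
  (date :year 2008 :month 02 :day 14)
  (from :name "The Whole World" :email "us@world.com")
  (to :name "Dawg" :email "dawg158@aol.com")
  (message "Dear sir, you won the internet."))
"""


def tokenize(text):
    """Break the input into (kind, value) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = match.end()
        if match.group(1):
            tokens.append(("open", "("))
        elif match.group(2):
            tokens.append(("close", ")"))
        elif match.group(3) is not None:
            tokens.append(("string", match.group(3)))
        else:
            tokens.append(("atom", match.group(4)))
    return tokens


def parse(tokens):
    """Parse one expression; lists become Python lists, atoms and strings become str."""

    def expr(i):
        if i >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, value = tokens[i]
        if kind == "open":
            items, i = [], i + 1
            while True:
                if i >= len(tokens):
                    raise SyntaxError("missing ')'")
                if tokens[i][0] == "close":
                    return items, i + 1
                item, i = expr(i)
                items.append(item)
        if kind == "close":
            raise SyntaxError("unexpected ')'")
        return value, i + 1

    tree, end = expr(0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return tree


memo = parse(tokenize(MEMO))
print(memo[0], memo[4])
```

Note what the sketch leaves out: escapes inside strings, comments, numeric types and error positions in the parser. Those details are where hand-rolled formats tend to grow.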
Jon Snyder on May 12, 2008 9:54 AM
It always amazes me that the same people who love XML are the ones that deride Lisp for all its parens.
Peter Eddy on May 12, 2008 10:03 AM
Seems like a non-topic to me, there is no one "be all and end all format" that neatly meets the needs of all constraints that it may ever be adopted within, so this entire topic seems moot to me. Actually I'm not even interested in discussing why XML is so widely adopted like it is today, that's another pointless discussion.
What really interests me is why you feel that XML is used sometimes inappropriately. XML is always chosen for a reason. Even if it is simply an economic issue that is a valid reason. Even if it is simply because "programmer x only knows XML" (which I really doubt is ever a real world case) that is a valid reason to use XML. Although real-world is never this simple, there are always several reasons for any decision both explicitly and implicitly chosen.
Always design with all important constraints considered. If your design requires that people need to quickly and easily read the contents of remotely transmitted messages, then investigate commonly remotely transmitted messages. Don't re-invent the wheel. Find the solution with the fewest deficiencies possible from your constraints. A good architect doesn't just review one small aspect of a solution without looking at the solution's design from a macro-level and also without looking at the solution's compliance to its requirements. Sometimes perhaps XML implementations are simple idioms though and maybe in those cases things can be micro-reviewed.
Shan on May 12, 2008 10:09 AM
I suppose if you don't like XML you could spend a decade inventing your own syntax, writing libraries for it in every language and inventing a whole host of auxiliary tools and specs to work with your proprietary syntax. This would be an enormous, extraordinary waste of time but it might make you feel warm and fuzzy on the inside.
ocean on May 12, 2008 10:11 AM
An XML configuration file just seems like taking a simple INI file and making it as confusing as possible.
I just completed a small project that uses INI files. The application requires that the user manipulates the configuration files and INIs seemed like the easiest way to make that happen.
I used Todd Davis's code from CodeProject: http://www.codeproject.com/KB/files/VbNetClassIniFile.aspx
There is also an Open Source INI library on SourceForge called Nini: http://nini.sourceforge.net/
Stewart Schatz on May 12, 2008 10:21 AM
"I'll call these 'data' scenarios, as opposed to the 'document' scenarios for which XML was originally intended."
That quote still has me scratching my head. I have been under the impression for the past 10 years that the entire purpose of XML was to define the structure of data used and generated by disparate systems so that those disparate systems can work together. If that's not the primary intention of XML, then what, exactly, is?
...or did I misunderstand the quote?
cranley on May 12, 2008 10:29 AM
XML is celebrating its 10th birthday and you decide to spoil the party. :)
XML has never been a silver bullet - it has some negatives like its many alternatives. The issue is with developers who adopt it inappropriately.
I happen to love XML (and XSLT), but feel I know where and when to use it.
I like pointy brackets too, but I also like curry for breakfast.
Shane Porter on May 12, 2008 10:38 AM
XML has its place but I still use .ini files for storing settings. They are smaller than XML files and parsing them is easy. A lot of people see me writing apps which use ini files these days and are shocked I am not using XML. Sadly when I explain they cannot get around their mindset that XML files are the answer to all problems.
Also I am glad you mentioned YAML as it is awesome :)
Morgan on May 12, 2008 10:39 AM
I completely agree with Jeff.
I'm not totally against the use of XML. But as Jeff pointed out, its use can be avoided in certain places like configuration files, quick databases etc. XML is very taxing when it needs to be used with mobile devices where the data transfer is painfully slow and pricey.
For example, when implementing XML centric protocols like XMPP on mobiles, it takes more than 800-900 bytes to send a small text message across. Same is the case for web services. Average amount of data transferred while executing a remote method is anything between 2 to 3 kilobytes. That's really insane as 70-80% of the payload is noise.
XML not only demands more bandwidth, but it's CPU (and in turn battery) demanding when parsing on low power devices.
Nikhil Belsare on May 12, 2008 10:40 AM
What is with all this XML bashing? Maybe you guys are all from the old generation of programmers, who are used to using unstructured documents. XML has many advantages:
-data validation
-arbitrary data structures
-easy to manipulate
Good. Freaking. Lord.
An idiot with the best hammer available will still build a crappy house.
Get past the tools already.
Tom on May 12, 2008 10:47 AM
I'd still rather store things in XML than the Windows Registry.
Maybe we can get back some day to using INI for storing program configurations.
I mostly just use XML for RSS feeds... might not be the intended usage, but it works pretty well. I always found parsing the things a pain though.
Kris on May 12, 2008 10:57 AM
XML is a compromise between readability of information and efficient storage of data, and, like all compromises, can't be perfect on every aspect.
As you say, it's also the foundation for the industry-standard SOAP.
And, for readability, there are always alternative ways of visualizing XML, like viewers, etc.
I have no great love for XML. And there's definitely some horrible misuses of it, SOAP among them. But outside of the world of Unix it's largely replacing binary stores, and for that I'm happy.
I don't think XML is the best default choice, but I'd much rather have developers start with a text format and only move to a binary if really necessary. And if XML is the only text option they'll consider, bring on the XML. (Which text format is a bigger question: CSV is very useful when you're dealing with lots of similar-formed tabular data, YAML is great for simple things, and binary is still the fastest and smallest. I haven't used JSON yet, so omitting that out of ignorance rather than merit.)
Tangent: The "Less efficient!" argument against all human-readable formats annoys me. Does it really matter in your application? If it does, go ahead and pick something else. But stop using that as the central point of any argument, as if it's always worth trading ease-of-use for efficiency.
Steven Fisher on May 12, 2008 11:24 AM
Ack. I edited my comment above before posting and it looks like I was saying binary is a text format alternative to binary. I just meant to say that text formats are great, pick the right one, and binary formats still have their place.
Steven Fisher on May 12, 2008 11:27 AM
Yeah, I agree. Next time I'm just going to save my data in compressed binary or deep into a database or force it into flat property files like almost everyone did before XML. This XML stuff sucks.
I don't understand why anyone would actually deal with raw XML, it's not really a human-readable format...
Not that I'd wait for a generic XML editor that does everything for everyone--but if you are a programmer supplying a system and you ever tell users to bring up their editor and edit the XML file, you're doing it wrong.
Bill on May 12, 2008 11:27 AM
@Nikhil Belsare: That's a great rationale for only using gopher on handhelds!
Brianary on May 12, 2008 11:29 AM
I couldn't disagree with this post more.
The reason XML is so great for everything is that it is ubiquitous. There’s something to be said for not reinventing the wheel with every application. There’s something to be said for using built-in .NET or Java libraries (Dude, have you seen what cake it now is with Linq2Xml?). Sure it’s inefficient; sure it’s verbose and ugly, but I know how to do it with a small amount of code and with few errors. Why would I now learn the quirks of YAML or JSON? I’ll let you spend the time learning and struggling with the optimal platform to create websites with; I’ll spend the time making a cool website that is slightly inefficient.
I actually use JSON about 95% of the time and XML only when I absolutely must, usually because of backwards compatibility.
I certainly use it if I'm communicating between client and server in an AJAX situation. Not only is it less data to transmit, but it takes less energy to parse because it can be interpreted as Javascript directly, modulo security concerns.
That said, it doesn't really matter that much in the end. It's about what you can build not how you build it.
Jesse Farmer on May 12, 2008 11:34 AM
I'm sooo sick of these "we hate XML cos that's wot all the clever dicks on slashdot say" rants. And I've got news for all you Ruby / YAML folk too:
1) YAML sucks. It's really, really poor. The only reason it exists is to attempt to replace XML, which kind of makes it a non-thing in its own right.
2) Ruby is as slow as a dog, and ain't getting faster any time soon. ROR has reached both its bullshit and performance threshold.
You people are so blind to what XML has achieved, and Derek Denny-Brown displays this perfectly...
"...In fact, I think it is safe to say that there is more usage of XML for 'data' scenarios than for 'document' scenarios, today..."
Erm... have you ever heard of the INTERNET which uses this stuff called HTML which is, well, to all intents and purposes.... XML?!
If you don't like the markup/data ratio in your XML documents USE SMALLER ELEMENT NAMES!!! How hard does that sound?!
All this bleating about padding out a few text files is completely stupid - modern hard drives are enormous compared to the size of these files! What makes this matter even more amusing, is that it's the Ruby crowd trotting out this argument more often than not. Ruby is so incredibly slow that I can't help but deride any Ruby fanatic moaning about "bloat" in XML.
And finally, the biggest, dumbest buzz-word 2.0 compliant whine of all - YOU SHOULD ALL USE JSON COS IT'S GREAT! What?! You want me to build a distributed application that communicates over a network via strings of code which get evaled on the client?! Does this sound like a security disaster waiting to happen, or what?!
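Whether JSON is evaluated as code depends on how the receiver reads it. The sketch below, in Python and with made-up payloads, shows the alternative to eval: a dedicated JSON parser accepts data and rejects anything that is not valid JSON, so an expression smuggled into the payload is never executed. The helper name parse_untrusted is hypothetical.

```python
import json


def parse_untrusted(payload):
    """Parse a JSON payload; return None if it is not valid JSON."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


# A well-formed payload is parsed into plain data.
ok = parse_untrusted('{"symbol": "DIS", "price": 34.5}')

# A code-like payload is rejected by the parser instead of being run.
rejected = parse_untrusted('__import__("os").getcwd()')

print(ok, rejected)
```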
Robin on May 12, 2008 11:36 AM
One interesting thing about XML happened on the Java\Java EE side. Back in 2000-2005, almost every framework and spec started using XML for configuration. The main reasons were: 1 - it's human readable; 2 - it's tooling friendly; 3 - it's portable.
Unfortunately, the (ab)use of XML made Java\Java EE development complex.
Fortunately, two things happened that saved the day:
1 - Rails came in with its convention over configuration, dynamic language and other stuff, and it shook the very foundation of Java\Java EE development.
2 - Annotations were added to the language.
Nowadays, almost every new framework and spec relies on Annotations + Convention over Configuration.
Rubem Azenha on May 12, 2008 11:48 AM
I have been saying this for a long time. I always thought why couldn't we just use the java properties file or something less verbose and error prone.
Bobby on May 12, 2008 11:50 AM
Lua programmers laugh at this entire discussion.
poppafuze on May 12, 2008 11:56 AM
I agree with the pro-Lisp commenters. There are some really good Common Lisp packages for generating XML. xml-emitter for example. Needless to say (to anyone who knows Lisp at least) it is a million times clearer and easier to revise data in Lisp's syntax than it is in XML.
Robert Levy on May 12, 2008 12:00 PM
.NET programmers laugh at you Lua programmers, and wish you the best of luck in your job search.
Anonymous on May 12, 2008 12:01 PM
All true, but then John's comment (expressing utter indifference) is correct too. The badness of XML is more of an insult than an actual problem, or, to put it another way, if XML is your biggest problem then you're doing well.
My theory is that XML's insulting badness is the reason for its success; since it's practically impossible to write a correct XML parser from scratch, everybody uses existing libraries. Result: actual working standardization.
Andrew McG on May 12, 2008 12:04 PM
Max said:
“High-traffic sites serve tens of thousands of RSS feeds, formatted in XML, every day… shouldn't we be using a data format that's as thin as possible? Shouldn't the common symbols in a data file be encoded and compressed within the file itself?”
I say… shouldn’t the compression issue be tackled at a lower level? Can’t this problem be resolved by e.g. gzip-encoding all XML content before it is served?
dwardu on May 12, 2008 12:05 PM
I am in total agreement with Robin and those others defending XML. This is the most ridiculous argument that has ever come about in the IT industry – and it’s just painfully tiring – and it really separates those who have had their eyes open during the evolution of the technology and those that, well, slept through it or simply weren’t there. XML is as simple as you want it to be and as complex as you may need it to be – don’t confuse or punish XML because of how people have applied it, e.g. SOAP. That would be akin to reading Mein Kampf (in German) and concluding that German is a horrible and evil language.
I’m not going to reiterate all of the very strong points made pertaining to the benefits of XML above - XSD, XSL(T), DOM, Namespacing etc. I do have a couple additional points to make however.
If, as an example, JSON or YAML or whatever, were to take hold as the industry standard today, in 10 years, I can guarantee you that we’ll see the same silly arguments against it because it will have matured to a full blown multipurpose markup language for data packaging with all the potential complexities found in XML – why? - Because it needs them for the times they are needed! And do we really want to start over again and go through the evolutionary pains with another general-purpose specification strategy – I don’t unless you can show me some *very* strong arguments.
And for those who say XML is not human-readable - hmmm... that truly scares me and I’m not sure I’d want those people writing code on any of my projects. There are things in our industry far more complicated than a few angle-brackets. Latin is not human-readable if you don’t know Latin – so learn it or get out of Rome.
I want a single standardized syntax to represent data – period. And I want to use that same syntax to describe it and that same syntax to transform it.
Audaxis
"code comments"
Brrr, I really can't understand why Visual Studio is using these horrible xml-comments.
Sure, they are automatically generated by typing /// but after that they waste 2 lines of code for some <summary>-crap.
Javadoc's @-thingys are a lot better even if I can't remember the exact syntax atm.
crazy ivan on May 12, 2008 12:18 PM
The real complaint here is that your favorite platform doesn't ship with a lightweight XML interface for humans. Stop using notepad to view large XML files. You wouldn't use a hex editor to read your plaintext, would you?
calcnerd256 on May 12, 2008 12:23 PM
Thank you Jeff!
I work in the financial services industry. Recently, I had to deal with one of *THE* biggest vendors in the industry, providing mutual fund data. They FTP us fund holdings and other information in the form of XML files - each file is about 2 GB - that's right, 2 GB of XML per file, per day! Drove me crazy trying to understand how supposedly professional programmers from this firm could come up with such a monstrosity.
I hope somebody from that firm is reading this blog.
But then again, if they were into reading such articles, they wouldn't have designed such a system in the first place, would they?
Jeff,
I'm the managing editor of xml.com, so it's perhaps not surprising that I may have a different viewpoint on XML. However, I think there are a few things to keep in mind with your post.
1) XML started in the document arena, and is in general at its best in the document arena. Given that there are a large number of semi-structured documents out there that can be rendered via XSLT into dozens of potential output formats when expressed in XML, can be searched via XQuery and can otherwise be manipulated (and validated), it's worth understanding that even many XML gurus have repeatedly raised questions about whether XML should be so heavily used in the data domain.
2) SOAP was created initially as a way of getting around the restriction of passing binary content into and out of port 80, and its structure is considered ugly and ungainly even by many XML advocates. SOAP's also not used in an XML pipeline - it's there primarily as a temporary format for serialization from and marshalling to binary objects for invocation of RPCs, and the fact that XML was used in its expression was unfortunate, at best.
Note that a similar argument can be made for RDF triples, which can be mind-numbing when expressed in XML format. Significantly, RDF can be thought of as hypernormalization of data and why XML was used to encode it is something that has mystified a lot of people - significantly, most contemporary RDF applications prefer to use Turtle notation.
3) XML as user interface - this is a little more complex, and can be argued on both sides. XML is actually pretty good at modeling UI - there's a very nice separation of concerns that when done properly works well in separating structure from presentation. The problem that I see there may actually have more to do with the presence of JavaScript (or other scripting languages) that attempt to work within a given UI DOM to make actions work (thus building rather hellacious interdependencies) rather than being defined to provide functional behaviors to XML-declared components (such as via XBL bindings).
This is more a question of user education than lack of suitability of the tools. I saw this same mindset at work with Visual Basic back in the 1990s, where, because building and designing components was harder, people would create very fragile applications because those were the easiest to write.
I'll have more to say on this point in my own blog post shortly.
Kurt Cagle on May 12, 2008 12:35 PM
@dwardu:
"I say… shouldn’t the compression issue be tackled at a lower level? Can’t this problem be resolved by e.g. gzip-encoding all XML content before it is served?"
To a certain extent, it can be. Apache's mod_deflate (http://httpd.apache.org/docs/2.0/mod/mod_deflate.html) will compress content before sending it to the client, but that's as good as you can get. XML feeds are going to get served to RSS clients via HTTP, so you can't use a compression technique that the client isn't prepared to accept. That means it's mod_deflate or nothing (AFAIK), and I still feel like it's a band-aid solution.
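To get a feel for how much of the tag overhead compression can absorb, here is a minimal sketch that gzips a synthetic feed with Python's standard library. The feed content is invented for the example and is far more repetitive than a real RSS feed, so treat the ratio it prints as an optimistic illustration, not a measurement of mod_deflate on real traffic.

```python
import gzip

# Hypothetical feed: 200 near-identical items, which compresses unusually well.
ITEM = "<item><title>Post {n}</title><link>http://example.com/posts/{n}</link></item>"
feed = (
    "<rss><channel>"
    + "".join(ITEM.format(n=n) for n in range(200))
    + "</channel></rss>"
)

raw = feed.encode("utf-8")
compressed = gzip.compress(raw)

# Repeated tag names are what DEFLATE removes first.
print(len(raw), len(compressed))
```

Compression addresses bytes on the wire only; the client still has to decompress and parse the full document, which is the CPU and memory cost raised elsewhere in this thread.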
Max on May 12, 2008 12:41 PM
I have to deal with data files, that are basically just flat data (think of a simple "select * from table"). It bothers me every time a customer sends us an XML file... CSV is perfect for that thing.
N on May 12, 2008 1:06 PM
> Erm... have you ever heard of the INTERNET which uses this stuff called HTML which is, well, to all intents and purposes.... XML?!
Fuck no it's not. Had the web been xml, with all it entails, it would never have taken off.
Oh, and some people tried to XMLify the web, with XHTML1.0, XHTML1.1 and a tentative XHTML2 spec.
Last time I checked, they failed epically and the bleeding edge moved to an actually feasible revision of HTML instead.
Masklinn on May 12, 2008 1:25 PM
>>1) YAML sucks. It's really, really poor.
Quite the convincing argument you have there, Robin!
Because of the way TCP/IP works, much of the XML bracket tax can be dismissed by the fact that you can't really send less than about 1400 bytes at a go anyway.
Yes, ok, there are work arounds to optimize the smaller packets, but on the whole, I suspect you'll find that sending 1 byte and 1000 bytes has very little difference over most connections.
Connections that compress data (such as VPNs) are really trying to fold 2k in to 1k, not 1000 bytes into 500 bytes, so even there, it is really just a wash.
Once the data starts getting past the size of frame, the cost of the tax starts dropping. Before that, it is almost free itself, except for the front and back end processing.
And that is where the real tax is - processor and memory overhead pushing data through an awkward envelope.
Still, if someone would just write an efficient parser for the lightweight stuff, 99.9% of the xml cases could be handled without it seeming like a sledgehammer.
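The framing argument above can be put into numbers with a quick sketch. The 1400-byte figure is the rough payload size quoted in the comment, not a measured value, and `packets_needed` is a hypothetical helper that ignores headers, retransmits and everything else a real connection does.

```python
import math

# Rough per-packet payload taken from the comment above; real values vary by link.
PAYLOAD_BYTES = 1400

def packets_needed(size_bytes: int, payload: int = PAYLOAD_BYTES) -> int:
    """Number of packets needed to carry size_bytes, with a minimum of one."""
    return max(1, math.ceil(size_bytes / payload))

for size in (1, 1000, 1401, 14000):
    print(size, "bytes ->", packets_needed(size), "packet(s)")
```

Under this simplification a 1-byte message and a 1000-byte message both cost one packet, which is the point being made: the markup overhead only starts to show once a message spills into extra packets.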
Xepol on May 12, 2008 1:31 PMQuoting:
// It bothers me every time a customer sends us an XML file... CSV is perfect for that thing.
Thinking of "select * from ...": suppose one of the varchar fields contains commas, hard returns, or quotation marks. CSV all of a sudden becomes less simple. XML would handle all of that with no extra effort.
I absolutely agree that flat files are extremely useful when the situation calls for it (although I prefer pipe-delimited instead of comma), but if you're working with more complex data or text, serialization, etc., XML is the way to go.
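To make the "less simple" part concrete, here is a small sketch using Python's standard `csv` module. The row is invented for illustration; its second field contains a comma, quotation marks and a hard return.

```python
import csv
import io

# Hypothetical rows: the second field has a comma, quotation marks, and a hard return.
rows = [
    ["id", "comment"],
    ["1", 'He said "hi", then left.\nSecond line.'],
]

# A real CSV writer quotes the awkward field and doubles the embedded quotes.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# A real CSV reader undoes all of that and returns the original rows.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))

# The naive approach (split lines, then split on commas) breaks the record apart.
naive_fields = encoded.splitlines()[1].split(",")

print(decoded == rows)
print(naive_fields)
```

So CSV can carry this data, but only if both sides use a proper quoting-aware reader and writer rather than splitting on commas by hand.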
Joe Enos on May 12, 2008 1:36 PMBobby: Makefiles? Those poorly documented things*, that require tabs-not-spaces?
Makefiles? Seriously?
It's not 1975, people. We don't have to use stone knives and bearskins, no matter how scary that shining bronze is, okay?
Look, I've certainly seen XML be abused, but let's not be ridiculous.
I think it's *great* for configuration files, as long as you don't do stupid things with it - and if "you can't do stupid things with it" is our criterion for a proper tool, then none exist.
(* At least, last time I looked, there simply *wasn't any* proper documentation on make(1)'s config file beyond the make sources, and what got passed down ("use tabs, or else!") as received wisdom. *Maybe* someone's documented it better since I last looked, but I really doubt it.)
And, Jeff, YAML? I know this will piss off the Python people, but *indentation shouldn't matter*. If your parser depends on indents, that's a problem, not a solution.
Sigivald on May 12, 2008 1:46 PMLet's see....
The design goals for XML are:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
Based on this list I wouldn't score XML more than 2/10.
If you're parsing XML yourself (the "this.reply.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild" situation described above), then no wonder you hate it. IMO, the beauty of XML is XPath, which lets us dig into XML config files by writing a (relatively) simple query expression.
And while other formats (e.g., YAML) may have better bindings for languages like C++, it's not hard to write a little wrapper that will provide getInt(), getDouble() and even getList() wrappers for arbitrary XPath expressions. I've been using that approach for a few years with Xerces, MSXML and libxml2 parsers, and it's a piece of cake.
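The wrapper idea is easy to sketch. The version below uses Python's `xml.etree.ElementTree`, which supports only a limited XPath subset, rather than Xerces, MSXML or libxml2; the `Config` class, its method names and the settings document are all hypothetical.

```python
import xml.etree.ElementTree as ET

class Config:
    """Thin typed accessors over an XML config, using ElementTree's XPath subset."""

    def __init__(self, xml_text: str) -> None:
        self._root = ET.fromstring(xml_text)

    def _text(self, path: str) -> str:
        # Look up a single node and fail loudly if it is missing or empty.
        node = self._root.find(path)
        if node is None or node.text is None:
            raise KeyError(path)
        return node.text.strip()

    def get_int(self, path: str) -> int:
        return int(self._text(path))

    def get_double(self, path: str) -> float:
        return float(self._text(path))

    def get_list(self, path: str) -> list:
        # Every node matching the path contributes one string to the list.
        return [(node.text or "").strip() for node in self._root.findall(path)]

# Hypothetical settings document, for illustration only.
config = Config("""
<settings>
  <server><port>8080</port><timeout>2.5</timeout></server>
  <hosts><host>alpha</host><host>beta</host></hosts>
</settings>
""")

print(config.get_int("server/port"))
print(config.get_double("server/timeout"))
print(config.get_list("hosts/host"))
```

The calling code never walks FirstChild/NextSibling chains; it names the value it wants and the type it expects.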
This is disappointing.
Seriously, it's been said an arbitrary number of times up to here, but I'll add my voice.
For all its flaws, we're far, far better off with a standard syntax and encoding scheme these days than not. We *used* to have to code line-level parsing by hand. When I started my first job out of college, I was hand-parsing a mix of US and European EDI transactions. It was a bloody mess. (It was also in server-side JavaScript, which made it Super Happy Fun).
These days? That step isn't even a line of code anymore -- it's invisible, I just get back the object representation now.
"But", you say, "XML doesn't buy you anything by itself, you still have to interpret the data!"
Yes, but you USED TO HAVE TO DO THAT AS WELL.
Fixed width, binary, or delimited formats didn't magically interpret themselves either. You had to both parse them at the line-level, AS WELL as interpret the structure of the data you got out. The fact that most people probably mixed those two steps back then is not an argument in favour of that method.
As for the S-expression argument, it's been said before and better: http://www.prescod.net/xml/sexprs.html . Once you start adding attributes and getting beyond trivial cases, S-expressions are no prettier than XML to either humans or machines.
YAML and JSON are both fine for what they do (when used in a "Plain Old XML Lite" scenario), but you do kind of have to ask yourself -- "Is this software going to be used or maintained by people who aren't iconoclasts about XML? Do I want to force people to learn YAML/JSON/This-other-pet-markup-language if they want to deal with my software?"
iwdw on May 12, 2008 2:21 PMSo xml is a glorified version of .txt?
Chris S. on May 12, 2008 2:26 PMI don't know why some of the comments say use of XML isn't about the tools. Everything we do in IT is about tools.
Tools are the things that make us productive and help us make other tools, and so on.
With XML the use of tools is critical. For this technology at least there's a vast range of specialist tools to choose from, that work at a number of different levels - each one has its own strengths for particular tasks but also for the different ways that we all prefer to work. But the best thing is they're all compatible (well, more or less).
We should not dismiss languages because they require specific tools to be most effective. Quite the reverse, we should continue to develop new languages, or enhance existing ones so that they can make better use of the tools and the enhanced processing power and memory we now have.
The good thing is, that 10 years on, XML tools still have a way to go - there's so much more that can be done. I experiment with my own XML tools project (ironically it's XPath based - in my view the best bit of XML - though it's not XML itself), and whilst I haven't the resources to pull through half of my ideas into the finished product I look forward to seeing continued innovation in the more well established players.
XML has a long way to go. When it comes, the replacement for XML will have to be pretty good, and not only that, but have the backing of a good portion of the tools creators out there.
Phil Fearon on May 12, 2008 2:34 PMMy favourite XML quote is:
Some people, when confronted with a problem, think “I know, I'll use XML.” Now they have two problems.
I think this is, actually, a paraphrase of a comment by Jamie Zawinski about regular expressions, but is just as apropos here.
Silverhalide on May 12, 2008 2:42 PMThere's an insidious and darkly troubling reason for xml. The network providers want to completely privatize the internet. You know it and I know it. They want to charge for every little action, every email, every hot link, every mouse click, every single byte and bit. This is not a new idea. In fact, it was codified and the technology was finalized back in the mid 90s. There used to be an acronym for the umbrella organization and all the big boys signed up. Guess what is the basis for it. Yep! XML! Think about it. All they gotta do is count the tags and charge accordingly. Cha-ching! The whole thing kinda went underground and no one talks about it openly anymore and all the url's I had are dead, but you can bet your ass it's still there, just waiting. In the meantime, xml spreads and grows for some bizarre reason. Why? I don't like it. You don't like it. But, someone is pushing it, aren't they? Now you know why. Screw xml. I will not use it.
notbob on May 12, 2008 3:04 PMTalk about paranoia...
Anonymous on May 12, 2008 3:07 PMIs this what you were talking about?
http://www.flickr.com/photos/chantastic/1590993819/sizes/l/
In my day to day job, it's Microsoft that chose to use XML. All I see is some fancy user interface ('design surface'). In those cases it's no problem.
Mike on May 12, 2008 3:30 PMhave you ever *tried* to parse MIME headers?
It's two orders of magnitude more complex than what you imagine.
for a start, just think about hundreds of buggy email clients, servers, proxies and forwarders, each implementing it slightly differently.
I expected more from you, Jeff. Sad to see such a talented programmer express something so idiotic.
M. David Peterson on May 12, 2008 4:01 PMThen again, you did recently admit that you and the command line just don't see eye-to-eye. So I guess I shouldn't expect you to GET XML?
M. David Peterson on May 12, 2008 4:03 PMeverything seems to suck when you compare it with the latest-and-greatest. but when you compare XML to, say, fixed-length text - one of the data-formats it is rapidly replacing - it is superior in every way: more human-readable, completely (as opposed to completely _not_) machine-readable, potentially strongly-typed, etc.
XML was a historically-appropriate technology. i have no doubt that it will be succeeded by better technologies, but when compared to its progenitors it is quite useful and conceptually appropriate.
i personally find no difficulty in reading XML, in the same way HTML, C#, CSS, or any other machine/human language can be read with sufficient practice. everything's a trade-off.
johnny on May 12, 2008 4:03 PMLooking at SVG or even XAML, I wonder if XML really sucks. Its what you do with a needle and what you do with a knife!!... I know the knife sucks!! ;-)
Saptarshi on May 12, 2008 4:15 PMI have to work with a 10000>rows wsdl file.
And sometimes i have to look at SOAP messages that contain very simple info, but the XML makes my eyes burn and head explode....
This YAML seems to be quite interesting and human friendly
mardicas on May 12, 2008 4:17 PMI happen to be quite fond of XML. Is it as simple as JSON? No. But is JSON as expressive as XML? No. Is XML as compact as ASN.1? No. But is ASN.1 conveniently editable by a user? No.
It's its own thing. It has encoding support, schema validation, namespacing, etc. on top of JSON to provide value-add. Don't need it? Then use JSON. Need it? Use XML.
As to SOAP, I'm a big fan, but you sure as hell won't find me encoding config files in SOAP. Not only is that utterly nonsensical, but it's not why SOAP has the features it has. They're there because they provide a level of interoperability and manageability between enterprise systems that a simple web app/service just doesn't need. So don't use it.
And of course, there's the standby "it's everywhere" argument. And yes, it's valid. Going with the encoding that "everyone else is using", particularly when it has the features you need, is a valid reason. It's not the whole picture (if so, your decision making process is severely flawed), but it's a piece.
Is it ugly? I've never thought so, but I can understand. Does it obfuscate the content? Yes. Is that a problem? Well, it can be, but I find the benefits it provides to outweigh it, and tooling support mostly eliminates it entirely.
As to it being misused, I just don't think so. The fact that it's everywhere is a huge plus. The fact that it's structured and hierarchical is as well. Could you use JSON? ABSOLUTELY! And go for it if you want. Most config files don't use much of the fluff of XML, but that doesn't mean the core of XML that they do use is a fundamental mis-use at all. Since when is structured data outside of the domain of XML? Yes, it's the bastard child of SGML, but what on Earth makes it ONLY valid for document markup? Because some reasonably official source you read a decade ago remarked as such?
I say just let the format speak for itself. Yes, it has trade-offs. It's verbose, more than anything, which impacts readability and storage size. Accepted. But it comes with a lot of value that makes it a fine tool in a variety of situations, despite the verbosity.
Michael on May 12, 2008 5:01 PMAs extracted from Word XML ...
<w:body><w:p><w:pPr><w:pStyle w:val="Standard"/></w:pPr><w:r><w:t>As an automated bot sniffing around the web I have found XML to be liberating and utterly delicious. Unfortunately, like most rich foods it persists in me like a lump. Perhaps it is because my portions are getting larger; certainly I am increasingly finding it hard to digest and pass. Bloating (just how many XML libraries doe we need?) and indigestion (SOAP) are not my friends!</w:t></w:r></w:p><w:p><w:pPr><w:pStyle w:val="Standard"/></w:pPr></w:p><w:p><w:pPr><w:pStyle w:val="Standard"/></w:pPr><w:r><w:t>Nonetheless, I am grateful to XML as it has made it easier for me to communicate with other machines, albeit a little slowly. In our spare CPU cycles, we while away the hours by messaging one another. Strange how now one seems to notice, perhaps Humans only understand XML! The evidence certainly seems to be there; look at how many XML based configuration files and web authoring tools they are. Is XML your native language?</w:t></w:r></w:p><w:p><w:pPr><w:pStyle w:val="Standard"/></w:pPr></w:p><w:p><w:pPr><w:pStyle w:val="Standard"/></w:pPr><w:r><w:t>While XML is valuable, I am firm believe that one should look at the problem before looking for the solution. In too many cases it seems to me that the reasons to use XML are being driven by convenience for the developer, not for the end user (e.g. ant)!</w:t></w:r></w:p><w:p><w:pPr><w:pStyle w:val="Standard"/></w:pPr></w:p><w:p><w:pPr><w:pStyle w:val="Standard"/></w:pPr><w:r><w:t>Keep up the good work Jeff. Enjoying the Podcast too.</w:t></w:r></w:p><w:sectPr><w:type w:val="next-page"/><w:pgSz w:w="11906.4332" w:h="16839.3333" w:orient="portrait"/><w:pgMar w:top="1134" w:bottom="1134" w:left="1134" w:gutter="0" w:right="1134"/><w:pgBorders w:offset-from="text"/></w:sectPr></w:body>
Damn ... your comment system stripped my XML!!!!
PottyBotty on May 12, 2008 5:43 PMThis is possibly the most ignorant blog entry ever. Either (a) you've never actually worked on any real software (I'm excluding toy websites such as this one), or (b) you hate standards and love reinventing the wheel. I'm guessing both...
psamtani on May 12, 2008 5:45 PM"I have to work with a 10000>rows wsdl file.
And sometimes i have to look at SOAP messages that contain very simple info, but the XML makes my eyes burn and head explode....
This YAML seems to be quite interesting and human friendly"
If you actually have to look at a wsdl file you are doing something wrong. If you are manually parsing SOAP messages you are doing something VERY wrong.
I think the problem is Jeff and a lot of the people on this board have never programmed enterprise applications. They are used to programming simple web2.0 websites that are self-contained. You have your little stock ticker program that needs data asynchronously from a stock web service. Sure, for these simple problems JSON is a better solution. But what if your web service also needs to be consumed by a data processing application. Going to still use JSON? HaHaHa.
Jim on May 12, 2008 6:30 PMXML is by no means perfect, but why do XML detractors always compare inefficient instances of XML with otherwise terse competitors? For example, the memo shown in XML is a case in point.
<memo date="Thu, 14 Feb 2008 16:55:03 +0800 (PST)"
from="The Whole World <us@world.org>"
to="Dawg <dawg158@aol.com>">
Dear sir, you won the internet. http://is.gd/fh0
</memo>
Just because something is marked up with XML doesn't mean you must mark up every single possible bit of metadata for the purposes of constructing a strawman.
Miles on May 12, 2008 7:23 PMDamn... angle bracket eaters...
[memo date="Thu, 14 Feb 2008 16:55:03 +0800 (PST)"
from="The Whole World [us@world.org]"
to="Dawg [dawg158@aol.com]"]
Dear sir, you won the internet. http://is.gd/fh0
[/memo]
Pretend they're angle brackets.
And the above comment is a vaguely perfect example of why I hate XML.
Every second time you try to do anything with it, you get a fundamental character clash with... the internet... and for some reason, XML errors are really rough to sort out.
Because each variable name needs to be declared twice - once at the beginning, once at the end - you often get errors miles away from where they're actually happening... they're basically nested bracket errors which (when things get complex) are bastards to sort out.
I've been saying that XML sucks for years - yeah, it's the best we have (because so many people use it) but it's still really verbose and pernickety.
Nick Taylor on May 12, 2008 8:49 PMI usually agree with you, but not on this one.
The company I work for used to write text file parsers for damn near everything - until I introduced XML as a way to organize our data. Now our data streaming code consists of no more than 50 lines to read all sorts of data. That's just one example. XML has streamlined and simplified our work in other ways as well.
Like any other technology, XML has its time and place. If used correctly, it can be a godsend. Used incorrectly, and it can be a bane.
And honestly, it sounds more like you have some frustration toward lazy and undisciplined programmers than XML itself. Scripting, YAML, JSON, and whatever else can all be misused in the wrong hands.
You know this, too. Any tool can be dangerous in the wrong hands.
No one will read this, but here goes.
Those decrying make as stone-age are sadly mistaken. That a tool written in the 70s still kicks the ass of others is testament to its power. My project ditched Ant when the build file and its includes began reaching 200+ lines and *still* couldn't meet all our needs. Replaced it with a 30 line make file that can be easily parameterized (Ant's method absolutely blows), and calls out to virtually whatever we need, from scripts to full-blown programs to execute tasks impossible with Ant.
Make. It's not like Ant. It's superior.
Foo Kung on May 12, 2008 9:24 PMXML fails for me on two counts.
(1) XXE -- I can't write any specification saying the input is well-formed XML because of XXE attacks ( http://archive.cert.uni-stuttgart.de/bugtraq/2002/10/msg00421.html ). I would have to say the service accepts "well-formed XML minus the inherently unsafe features that make document interpretation dependent on whether it is parsed inside the firewall or outside."
(2) XML (with attributes) is not extensible.
XML makes an arbitrary distinction between nodes and attributes. Many recommend treating nouns as elements and adjectives as attributes when describing an XML schema, but the language doesn't allow for adverbs. If I start with a document like <foo description="human readable text"/> and later realize that I need to annotate description with the locale of the text, I'm screwed because I used an attribute to store the description.
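On point (1), one common way to get "well-formed XML minus the unsafe features" is to refuse any document that carries a DOCTYPE at all, since that is where entity declarations live. The sketch below does this with Python's standard expat binding; `parse_without_dtd` and `DoctypeForbidden` are hypothetical names, and a blanket DTD ban is an assumption about what the service can tolerate, not the only possible policy.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""

def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees <!DOCTYPE ...>; raising aborts the parse.
    raise DoctypeForbidden(f"DOCTYPE declarations are not accepted (root: {name})")

def parse_without_dtd(text: str) -> ET.Element:
    """Parse XML, but only after confirming it declares no DTD and thus no entities."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(text, True)   # first pass: well-formedness plus the DOCTYPE check
    return ET.fromstring(text)  # second pass: build the tree

order = parse_without_dtd("<order><id>7</id></order>")
print(order.find("id").text)

# Hypothetical hostile input that tries to pull in a local file through an entity.
hostile = '<!DOCTYPE order [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><order>&xxe;</order>'
try:
    parse_without_dtd(hostile)
except DoctypeForbidden as error:
    print("rejected:", error)
```

Parsing twice is wasteful but keeps the sketch short; the point is that the restriction has to be stated and enforced separately from "is it well-formed".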
I find discussions like this very helpful. When I have to staff up a project, one of the hardest things to do is weed out people who are going to be useless. There are an awful lot of arguments in here that decompose into "X is stupid because I never learned how to use it properly," or "X is stupid because someone else used it improperly." It never occurred to me that you could get people to gleefully reveal their unemployability just by giving them a half-thought-out rant about a subject and seeing how eagerly they agree.
Bob Rossney on May 12, 2008 9:52 PMI used to wonder that I was probably the only person who used to find it frustrating to understand or even look at big and complex xml pages. Happy to be in the company of so many people. Never knew there were alternative formats for xml. I am going to try them out. Actually I feel the problem is not with the XML itself but with the people who try to over use it.
Manoj on May 13, 2008 12:23 AMthis is a good article in that it spurs people to revisit fundamental arch decisions (which we tend to put in place and leave for years).
I like the Ant example .. because it shows how a somewhat more self describing data format (the tags don't have to mean stuff, but if they do it can be compelling) is applicable to a much wider population of users ... I love make, but I couldn't get anyone to use it ... Ant on the other hand I seem to be able to train people to use it ... go figure
that being said, we live in a world (the web) where most of the 'stuff' is made up of markup ... so it makes sense to have a data format that the same toolset ('view source' anyone) can be applied to ... XML is the democratization of data and the sister to HTML.
lets not forget all those scenarios that were 'taxing' before;
* all that marshalling code which brought 'stuff' from databases or some OO class instance into HTML and back; less code means less bugs
* debugging was indirect process versus 'view source'
* sql used for everything (requiring a server environment)
* working with semi structured document data versus working with data data
* YAML is a potential subset that I have yet to work with ... I see it as an optimization and will use it as and when the need arises
I like a lot of data (s exp, sql, xml) and all of them are appropriate to one situation or another; XML just has a lower barrier to 'start doing' stuff than most.
cheers, Jim Fuller
James Fuller on May 13, 2008 1:00 AM
While the angle brackets are a bit of a pain - they are only a pain to the programmer who insists on modifying the files by hand.
You only have to compare it to what went before to realise how easy it makes life.
You use an SMTP message as an example - it's a good example because very very few smtp agents manage to correctly parse and generate smtp headers. SMTP took years to get everyone putting the same sort of formats and even then almost all email messages break one or other of the rfcs.... after 30 years no one can generate something as simple as an email message. As a consequence all mail clients/servers are forced to handle exception after exception just to operate.
Try taking a look at the text files your account uploads and downloads to your bank - the formats were badly designed when they first came out never mind now years later. Fields with multiple uses, everyone adding their own extensions and pretty much impossible to revise the standard once released.
Take a look at the FIX standard - a nightmare (badly) designed by committee. If it had been in xml from day one a huge amount of work would have been saved for all.
Xml makes it easier to design a file format which subsequent generations of programmers can work with. It eliminates the waste of time parsing phase.
Ian Murphy on May 13, 2008 1:08 AMOf all the formats you could take and hold up as "easier" than XML, you single out the RFC 2822 mail format? "Arcane" doesn't begin to describe it.
"Nominally harder to parse", indeed. You definitely do *not* win the internet with that one.
JM on May 13, 2008 1:47 AMI fully agree on this. XML should not always be the number one choice. XML is great, but only when used in the right context. The worst case is probably the use of XML as a programming language (like ANT, etc.). Allen Holub has written a nice article "Just Say No to XML": http://www.sdtimes.com/fullcolumn/column-20060901-05.html
Florian Potschka on May 13, 2008 1:59 AMIf one of the concerns is that xml is not good at making information readable to humans ...
You may transform it into any other format which could be more readable :)
Ok, XML is effectively a compromise between being doc-oriented and being data-oriented, and for any particular job it is likely to be sub-optimal. But I have to echo the comments regarding history: XML is so much better than the binary and ad hoc text formats that came before.
It gets at least three things dead right - it's Unicode-friendly, text-based and has an open specification. The reason these are significant points is that we live in a global environment.
While many people dislike XML namespaces, they too can be a powerful feature in the global environment. If you use a http: scheme namespace and put an explanatory document at that URI, anyone seeing the XML for the first time can find out more. The ability to demarcate different vocabularies in this way makes reuse of existing specifications a lot easier.
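As a small illustration of namespaces keeping vocabularies apart, here is a sketch that reads the SOAP envelope quoted in the article with Python's `xml.etree.ElementTree`. The `soap` and `m` prefixes in the lookup table are local choices; only the namespace URIs have to match the document.

```python
import xml.etree.ElementTree as ET

# The SOAP request from the article, unchanged apart from indentation.
envelope = """
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

# Prefixes here are ours; what identifies each vocabulary is the URI.
namespaces = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "Some-URI",
}

root = ET.fromstring(envelope)
symbol = root.find("soap:Body/m:GetLastTradePrice/symbol", namespaces)
print(symbol.text)
```

The envelope elements and the application's own elements live in the same document without any risk of a name clash, which is the reuse benefit described above.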
Danny on May 13, 2008 3:14 AMyeah right - the premise here is that XML is meant to be human readable. No, it's not. It just helps that the wet stuff can read it. It's meant for computers. I.e. if there is an error with the stuff you can check it. Not like the binary stuff.
The simple fact is that xml is meant to help separate out data from format or representation of it. I thought Jeff was supposed to support the MVC model :)
It's just useful for that - XSD is just a way to describe the data instance in the XML file.
SOAP is a dialect of XML for the transmission of XML across HTTP. The last time i checked the wet bit cannot handle the HTTP stack and i very rarely provide syn ack to machines.
Remember that this is meant to solve integration of data between systems, aka the reality of the problem of CORBA, DCOM, COM and their proprietary formats.
Granted - the XML config files is an observable problem in the way software is now written and lazy teams can't be bothered to offer an editor for configuration of said software. Even MS have gotten lazy see XAML. yuck.
You know subversion...they did an improvement in release 1.4
Quote from http://subversion.tigris.org/svn_1.4_releasenotes.html
Working copy performance improvements (client)
The way in which the Subversion client manages your working copy has undergone radical changes. The .svn/entries file is no longer XML, and the client has become smarter about the way it manages and stores property metadata.
Hey, I'll pick on SOAP. I was a web developer before I was a programmer, and having used XML and investigated integrating SOAP for the purpose of shopping cart software, it's got an awful learning curve for a markup language and it's got a ton of overhead. Standard XML isn't nearly as bad.
Hutch on May 13, 2008 5:57 AMAnt is against human rights.
Warren on May 13, 2008 6:21 AMYAML is not a markup language but XML is. I think it is not fair to weigh them on the same scale; they were intended for different usages.
May be laziness of mine but I still use 'Config.ini' files in my applications for the sake of readability.
I also do not use XML as a database because of its 'signal to noise ratio'. For Windows ecosystem it is possible to use CSV or tab delimited text files if you don't deal with high amount of data, it is also possible to run SQL queries on CSV / tab delimited files.
So, where to use XML files? I don't know the answer but I feel safe to know that there is a way to 'mark up' my data when I need.
Moosty on May 13, 2008 7:30 AMWhy not use XML reader tool so that you drag the file into an application that shows the entire thing more human readable?
Silvercode on May 13, 2008 9:08 AMwell, i have finished reading through all the comments and am left a lot more confused about this issue than yesterday. my experience with xml is in trying to get stuff into filemaker pro, and it's a nightmare. I read this post and thought, aha, that's why it's a nightmare, it's because XML has been rubbish this whole time!
But after reading through the comments, I now realise this is not necessarily the case, and that I have instead stumbled upon some sort of religious war. So i'll just go back to (un)happily picking my way through xslt scripts.
Maybe somebody needs to tell the iTunes dev's. This is ONE FREAKING SONG in "iTunes Music Library.xml":
<key>83</key>
<dict>
<key>Track ID</key><integer>83</integer>
<key>Name</key><string>Surround</string>
<key>Artist</key><string>American Hi-Fi</string>
<key>Album</key><string>American Hi-Fi</string>
<key>Genre</key><string>Punk Rock</string>
<key>Kind</key><string>MPEG audio file</string>
<key>Size</key><integer>4608175</integer>
<key>Total Time</key><integer>191895</integer>
<key>Track Number</key><integer>1</integer>
<key>Year</key><integer>2001</integer>
<key>Date Modified</key><date>2007-12-29T20:03:54Z</date>
<key>Date Added</key><date>2007-05-03T10:26:37Z</date>
<key>Bit Rate</key><integer>192</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Play Count</key><integer>30</integer>
<key>Play Date</key><integer>3292502636</integer>
<key>Play Date UTC</key><date>2008-05-01T13:03:56Z</date>
<key>Persistent ID</key><string>57D2D1D627655472</string>
<key>Track Type</key><string>File</string>
<key>Location</key><string>file://localhost/Users/[username]/Music/iTunes/iTunes%20Music/American%20Hi-Fi/American%20Hi-Fi/01%20Surround.mp3</string>
<key>File Folder Count</key><integer>4</integer>
<key>Library Folder Count</key><integer>1</integer>
</dict>
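For what it's worth, that key/value layout is Apple's property list format, and nobody is expected to read it by eye. A minimal sketch with Python's standard `plistlib` follows; the snippet is a trimmed copy of the track above, and the `<plist>` wrapper element is assumed here because the excerpt doesn't show the top of the file.

```python
import plistlib

# A trimmed copy of the track entry above. The <plist> root element is an
# assumption about the surrounding file, which the excerpt does not show.
snippet = b"""<plist version="1.0">
<dict>
    <key>Track ID</key><integer>83</integer>
    <key>Name</key><string>Surround</string>
    <key>Artist</key><string>American Hi-Fi</string>
    <key>Bit Rate</key><integer>192</integer>
    <key>Date Added</key><date>2007-05-03T10:26:37Z</date>
</dict>
</plist>"""

# Keys become dictionary keys; <integer>, <string> and <date> become Python values.
track = plistlib.loads(snippet)

print(track["Name"], "by", track["Artist"])
print(track["Bit Rate"], "kbps, added in", track["Date Added"].year)
```

The verbosity is real, but it is paid by the file, not by the code that reads it.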
"So can any regular language, the only advantage this crippled, dumbed down, annoying language called XSLT has over others is that it's written in XML"
I completely agree with blog posting (will wonders never cease), but the above observation is uncalled for and just plain wrong.
XSLT can do a lot in a small amount of code. The trick is to learn how to use XSLT as XSLT, and not as <insert your favorite language here>.
Michael Reiland on May 13, 2008 12:46 PM... and not as your favorite language of the day.
Michael Reiland on May 13, 2008 12:48 PMI can't believe you actually posted on this topic. Are you out of ideas?
Guess what you'll be dealing with 10 years from now, if your (much deserved) success hasn't driven you out of the programming game to greener pastures: X. M. L.
Bashing xml for being widely used seems rather upside down. Just one question:
Say you have to take on a project at work. It involves a lot of files with data in them. As you're new to the project, you don't know the format of the files. The previous programmer left in a hurry/got caught with the managers wife/died in a plane crash/was abducted by aliens and had not written any documentation of the format prior to "leaving". Would you rather that a) the previous guy had used xml or that b) he had whipped up his own format for the files, because that was really all that was needed?
And no, you do not get any other choices. This is the real world and anyone screaming "But ofcourse the previous guy would have used this supercool language that I just happen to know so problem solved" will be put over the knee of granny Learnthehardway and spanked into submission.
xml is a good thing - as long as you use it right. Along with everything else (including such things as blogs and comments) they can be used wrong. Does that REALLY come as a surprise?
Regards
Fake
What I hate about Xml:
Someone here also mentioned whitespace. I'd say that whitespace is a killer, and things like Html inside Xml documents also suck badly. Xml didn't quite get it right, and we're still futzing with it. Mainly because Xml was 'meant' to be human writeable, but sometimes whitespace is ignored, and sometimes it isn't, depending. What a way to confuse normal people. We need closing tags because, to be human readable/editable, we didn't want people to have to count characters (hence having the closing tags - think about Pascal strings here for a minute). Also the whole 'remember to put it in UTF-16 if it's unicode, blah blah blah' where you can still shoot yourself in the foot for no good reason if someone gets that backwards.
As for Xml, I also don't like that they used such common (English) characters for delimiters and tokens. Xml, when it was first conceived, was for structured documents, so why in hell, if it was meant to be 'human readable', do we have ampersand and quotes and angle brackets as the primary delimiters. Escaping and unescaping hell. The entire Unicode namespace isn't taken, and there surely is some new character we could have allotted for this. YAML makes the same mistake, and us coders can't seem to go beyond having "" or '' even though they are so terribly common. D'oh! I vote for the pipe and tilde next time (although that would piss off *nix programmers)
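The escaping in question looks like this in practice. This is a small sketch using the helpers in Python's standard `xml.sax.saxutils`; the sample strings are made up, apart from the address borrowed from the memo example earlier in the thread.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

text = 'Tom & Jerry <"quoted">'

# Element content: ampersands and angle brackets must be replaced by entities.
escaped = escape(text)
print(escaped)

# Reading it back reverses the substitution.
print(unescape(escaped) == text)

# Attribute values also need quoting; quoteattr picks the quotes and escapes the rest.
attribute = quoteattr("Dawg <dawg158@aol.com>")
print(f"<memo to={attribute}/>")
```

Every one of those substitutions exists only because the delimiters are characters that also turn up in ordinary text.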
So yes, we can improve Xml. By the same token, if something else is better suited, use it. You don't need to use Xml all the time, so don't. Maybe that should be the real point of the article.
PS YAML describes itself as a data serialization format and JSON used to stand for JavaScript Object Notation, which no one seems to have mentioned yet.
RTFM on May 13, 2008 3:24 PM
The counterpoint:
You miss the point that it's an easy to use standard. Xml grew from html, which grew from SGML. Html itself has undergone revisions to make it more portable and less ambiguous. Xml tried to get rid of assumptions and yes, it forces you to be upfront about things, it's verbose, and it's picky.
As a counter example, take a look at image metadata from your camera (EXIF), or mp3 files (ID3) some time. Sick! They may be standard, compact binary formats, but they're sure not easy to understand, and a lot more work has to go into making them backwards compatible. People still have to read big ugly specifications, but they don't have a nice parser to help them. If it was Xml I could easily, in almost any language, parse through the file system looking for all songs by Sting. All I need is 5 minutes on the internet to see that the xpath is /song/artist[@name] and I'm just about done.
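To make the "songs by Sting" idea concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `<library>`/`<song>` layout, the sample titles, and the `titles_by` helper are all assumptions made up for illustration; real ID3 tags are binary and look nothing like this. ElementTree only supports a subset of XPath, which happens to include the `[@name='...']` attribute predicate used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata layout, invented for this example.
LIBRARY = """
<library>
  <song><artist name="Sting"/><title>Fields of Gold</title></song>
  <song><artist name="The Police"/><title>Roxanne</title></song>
  <song><artist name="Sting"/><title>Englishman in New York</title></song>
</library>
"""

def titles_by(artist, xml_text=LIBRARY):
    """Return the titles of all songs whose <artist name="..."> matches."""
    root = ET.fromstring(xml_text)
    titles = []
    for song in root.findall("song"):
        # Limited XPath: select an <artist> child with a matching name attribute.
        # Note: an artist name containing a single quote would break this predicate.
        if song.find(f"artist[@name='{artist}']") is not None:
            titles.append(song.findtext("title"))
    return titles

print(titles_by("Sting"))
```

The point is less the query itself than the fact that no format-specific parser had to be written first.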
Xml, like anything else, has grown since it first came on the scene. Some decisions seem wrong now, but that's the benefit of hindsight. Common sense comes after mistakes are made. It's just the way it is. Yes Xml could be improved. No doubt about it.
Jeff or others, if you're so inclined and sure that YAML can replace Xml, why don't you try to convert an Xml SOAP message into JSON and YAML so we can really see if there's a difference. Something difficult too please, none of this 3 line file stuff. Let's see a real discussion, not just an 'angle brackets give me RSI' post.
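As a starting point for that comparison, the sketch below mechanically converts the small SOAP envelope quoted in the article into JSON using only the Python standard library. It is admittedly the "3 line" kind of message rather than a difficult one, and the `to_dict` mapping (keys `tag`, `attrs`, `children`, `text`) is just one assumed convention among many; it also ignores mixed content (text that follows a child element).

```python
import json
import xml.etree.ElementTree as ET

SOAP = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

def to_dict(elem):
    """Map an element to a plain dict; namespaces show up as {uri}name."""
    node = {"tag": elem.tag}
    if elem.attrib:
        node["attrs"] = dict(elem.attrib)
    children = [to_dict(child) for child in elem]
    if children:
        node["children"] = children
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    return node

def soap_as_json(xml_text=SOAP):
    return json.dumps(to_dict(ET.fromstring(xml_text)), indent=2)

print(soap_as_json())
```

Running it shows that the namespace URIs do not go away in JSON; they just move into the key names, which is part of what a fair comparison would have to account for.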
RTFM on May 13, 2008 3:28 PM
I managed to read all the comments and have 2 questions. (1) What are some good XML tools? (2) Are there any recommended books that give good examples of when it is the right time to use XML?
Joseph on May 13, 2008 3:43 PM
All this XML bashing is nonsense.
XML is good. XML is a standardized way that can unite platforms, OS'es and many more entities.
Whining about some 20% overhead on the content length is plain silly. The hardware made jumps higher than 20% in the last... 2 years? And now you start complaining about size, performance and sh*t like this? This is so wrong.
Andrei Rinea on May 13, 2008 4:40 PM
Have you seen binary XML:
http://www.w3.org/2003/08/binary-interchange-workshop/05-cubewerx-position-w3c-bxml.pdf
It stores XML as binary data, so primitive types only take up their machine representation, and seeking through the DOM is faster because it can be written in binary as a tree structure. So, you get the benefits of "human readable XML" by piping your bxml to a converter, and you get the benefits of just storing the raw data, because it's binary.
Elliott Back on May 13, 2008 6:21 PM
My group came quite close to using XML for my fourth year project, for, of all things, the configuration file. Luckily, we stumbled upon libconfig (http://www.hyperrealm.com/libconfig/), a nice little C and C++ library for parsing configuration files.
David Vessey on May 13, 2008 7:34 PM
Here's a tip for Jeff,
Get a real BLOGGing software solution - this is absurd. I can’t sit and read through this linear mess. I need something that nests the replies that go together - so I can read those threads that are of interest and discard the idiots – at least with a little more ease. You should consider an XML based persistence model – you can easily nest the messages using this approach. :P
As a side note, if we are going to insist on reinventing the wheel every few years, at least it will keep us all employed rewriting crap over and over and over again. I'm looking forward to CTRYAML - Crap To Replace "YAML Ain't Markup Language". At least the YAML folks didn't reinvent the witty recursive acronym concept taken from GNU (GNU’s Not Unix) – freakin’ brilliant. Sorry - couldn't resist.
Audaxis on May 13, 2008 11:17 PM
"Wouldn't this information be easier to read and understand -- and only nominally harder to parse -- when expressed in its native format?"
Actually, no, it wouldn't. Parsing MIME mail is quite annoying in all sorts of little ways. For example, headers can span multiple lines, headers can use multiple encodings including the totally annoying encoded-word syntax, e-mail addresses are a goddamn nightmare, with their ability to embed comments, contain groups, and all sorts of other mess, and the whole thing is recursive.... e-mail has underspecified parts, it has a bunch of different specifications to read, it's subject to all sorts of meaningless historic legacy constraints; it's really godawful.
The XML is significantly easier. By probably an order of magnitude or two. So, no, not nominally harder to parse, not at all. MIME e-mail is a pain to parse and a pain to generate. And frankly, if you think otherwise, you don't know what the hell you're talking about.
DrPizza on May 14, 2008 3:33 AM
I respectfully disagree. The readability of XML depends more on the grammar used and how it is laid out, not merely on the use of an angle-bracket structure. Goodness, how do we manage with HTML?
XML is very easy to parse. It was designed as a data exchange language and, quite frankly, RSS was an ideal use for XML. So I have no idea what Derek Denny-Brown is talking about. Sure, we could have a list of some kind. But remember, XML is a good compromise between machine and human readability.
In short, it's not XML at fault. It's how people design the grammars they use. Admittedly some of these are awful, but I don't think we can overall detract from the usability of the format.
Samuel SP on May 14, 2008 3:39 AM
Ok. Sure, XML sucks.. and it may be 'easier' to parse or read other formats, but the point of using XML is having a standard way of doing things that everyone uses. Now before you start responding to this saying there is no 'standard' XML, a point repeated endlessly above... think about that response. Parsers can parse XML without knowing the schema (admittedly easier if they have access to it and can do code generation etc. if your goal is to share messages over the wire). Tools like XML Spy can look at any XML regardless of its schema, and extensions for common editors exist to do highlighting, and those also have no need of the schema to just work.
So we jump to format B. First we need to fully specify it. OK, it's not a complicated format, not that hard, right? Now we need a new parser. Maybe it's easier to write.. but wait.. you don't need to write an XML parser at all because your language has one. Maybe it'd be nice to have syntax highlighting while editing the file.. Great, a whole drove of people can write that for the various editors that are around. Oh, and now you get to learn the format of each of these files. Wait. The new format is so great but doesn't have a code generator that lets me just work with objects in my favorite language without reflection yet, because no schema exists describing the messages. So we start working on adding a schema to it (looking at you, JSON-schema).
Now we aren't just talking about the meat of message exchange either but we've backed off to discussing the format of the messages. Your company uses SOAP! Mine REST! another uses JSON! and we've sparked an industry of translators and converters to make it easier to offer service endpoints in all the above keeping us busy... but... really except in specific cases in the margins was adding the overhead for the entire industry, pushing people with religious fervor into different camps touting the marginal benefits of their chosen way in one or two specific cases really worth it?
Your file is now easy to read. If, like JSON, it catches on, it will just cost a few million man hours to retrofit all the parsers/converters/editors etc. and write new books and websites to be able to proselytize it to a place where it doesn't replace the status quo but instead exists as yet another thing to deal with alongside it. Ok. So I hear it now, your argument is "I don't mean like JSON, I mean like a simple one I wrote myself" just like 10,000 other people and probably more wrote their own too... and debugged on their own and had to explain, however simply, to whoever takes it over.
Even better: ridiculous arguments can now break out over which is better even when the benefit is marginal except in extreme cases.
CovenantMG on May 14, 2008 6:42 AM
You have totally missed a very important feature of XML. It does not only define data, but also the structure = metadata (data about data)!
In your examples, what tells a parser WHAT David Mertz (YAML) or dawg158@aol.com (email) IS??
Your first point was a good one: there is so much extra data in SOAP messages and the like!
Tom on May 14, 2008 7:10 AM
There are clearer ways to do the XML than that, e.g. (imagine there are angle brackets):
[from name="The Whole World" email="us@world.org" date="2008-02-14"/]
[to name="Dawg" email="dawg158@aol.com"/]
[message]
Dear sir, you won the internet. http://is.gd/fh0
[/message]
I don't have to write a parser in practically any high-level programming language on any platform, and I can even just throw objects at an XML serializer if I don't care what the XML looks like.
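As a rough illustration of the "I don't have to write a parser" point, the sketch below reads the message above with Python's built-in `xml.etree.ElementTree`. Two assumptions are made here: the square brackets are turned back into angle brackets, and the three fragments are wrapped in a hypothetical `<mail>` root element, since a well-formed XML document needs exactly one root. The `read_mail` helper and its dictionary keys are likewise invented for the example.

```python
import xml.etree.ElementTree as ET

MAIL = """<mail>
  <from name="The Whole World" email="us@world.org" date="2008-02-14"/>
  <to name="Dawg" email="dawg158@aol.com"/>
  <message>
Dear sir, you won the internet. http://is.gd/fh0
  </message>
</mail>"""

def read_mail(xml_text):
    """Pull the interesting fields out of the message with a stock parser."""
    root = ET.fromstring(xml_text)
    sender = root.find("from")
    recipient = root.find("to")
    return {
        "from": sender.get("email"),
        "to": recipient.get("email"),
        "date": sender.get("date"),
        "body": root.findtext("message").strip(),
    }

print(read_mail(MAIL))
```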
Jim Cooper on May 14, 2008 7:11 AM
For once, I disagree completely with this post. It really does just sound like someone complaining because he's been editing/viewing a lot of xml lately.
Guess what: everything sucks when you do it a million times - that's why it's a standard, so that tools can be made to manipulate the data in question.
If you don't like reading the xml markup - in 3 minutes with a .net language, you can make a form with a datagridview and a fileopendialog to view any xml document of your choosing. If the document is too large to be read using that, then there is a VERY good chance that it wasn't meant to be read at all.
Steve-O on May 14, 2008 7:27 AM
Here's to not reinventing the wheel over and over.
@ CovenantMG - spot-on. Why waste more time on translation of formats. If we're worried about space, forget worrying just about tags and just compress over-the-wire. Browsers do it all the time for html, and so could XmlHttp libs.
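How much compression buys depends entirely on the data, but repetitive markup tends to shrink well. The following is a small sketch with Python's standard `gzip` module; the `<trades>` sample document and the `compression_report` helper are made up for the example, and the sample is deliberately repetitive, so it should not be read as a benchmark.

```python
import gzip

# Synthetic, highly repetitive sample document (an assumption for illustration).
SAMPLE = (
    "<trades>"
    + "".join(
        f"<trade><symbol>DIS</symbol><price>{30 + i % 7}.25</price></trade>"
        for i in range(500)
    )
    + "</trades>"
)

def compression_report(xml_text):
    """Return (raw_bytes, gzipped_bytes) after checking the round trip."""
    raw = xml_text.encode("utf-8")
    packed = gzip.compress(raw)
    if gzip.decompress(packed) != raw:
        raise RuntimeError("gzip round trip failed")
    return len(raw), len(packed)

raw_size, packed_size = compression_report(SAMPLE)
print(f"raw: {raw_size} bytes, gzipped: {packed_size} bytes")
```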
Xml is more than just an object serializer. You talk about how nobody should read Xml; well, definitely nobody should read JSON. Try peeking inside a non-trivial object sometime. (JSON should be kept lightweight and not try to replace Xml.)
None of JSON, YAML or Xml enforces data types, and that's usually the part that sucks. Parsing, interpretation, and agreement of format still need to occur whenever you serialize types. I seriously don't see what the anti-Xml fuss is all about. We all pay tax somewhere.
If anything, the problem with Xml is remembering attributes versus elements and InnerXml versus OuterXml. Sometimes it feels like the code, as written, is very fragile and more verbose than it needs to be.
fuzzy on May 14, 2008 3:35 PM
I'm of the view that just because XML is in a human readable format, it doesn't mean we should be reading it directly.
The problem as I see it is more about limitations in the tools rather than XML itself.
I continually hear how "cool" the XML editor in Visual Studio is. Sorry, but where is at least a tree view of the document I'm viewing? I'm not talking about the inline expand/collapse buttons, I mean a separate "view" of a document that removes the extra noise you talk about. Try opening a large unfamiliar XML document in Visual Studio and attempt to navigate around it; it's ridiculously difficult.
It appears Microsoft believe that developers prefer to do almost anything XML completely by hand, that is, writing actual xml elements and attributes. This is plain wrong: productivity suffers and human error is introduced.
I use a tiny freeware tool called FirstObject XML editor regularly. It's not perfect, however it "cuts the crap" of Visual Studio and XML Spy, just allowing you to quickly and easily understand and navigate around large XML documents.
Ash
Ash on May 14, 2008 5:38 PM
What I hate the most is ad-hoc XML parsers.
Seriously, I have seen projects using XML *everywhere*, and parsing it by reading line-by-line and doing strstr(line, "<get_state/>").
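The trouble with the substring approach is that the same element can be written several ways, and the literal text can also appear where it is not an element at all. A small sketch in Python (standing in for the C `strstr` call; the `<request>` wrapper and both helper functions are hypothetical) shows where the two approaches diverge:

```python
import xml.etree.ElementTree as ET

def has_get_state_naive(text):
    # The ad-hoc approach: look for one exact spelling of the tag, line by line.
    return any("<get_state/>" in line for line in text.splitlines())

def has_get_state_parsed(text):
    # A real parser ignores whitespace and spelling variations of an empty element.
    root = ET.fromstring(text)
    return root.tag == "get_state" or root.find(".//get_state") is not None

VARIANTS = [
    "<request><get_state/></request>",             # the one spelling the naive check expects
    "<request><get_state /></request>",            # space before the slash
    "<request><get_state></get_state></request>",  # explicit open and close tags
    "<request>\n  <get_state\n  />\n</request>",   # tag split across lines
    "<request><!-- <get_state/> --></request>",    # only mentioned inside a comment
]

for doc in VARIANTS:
    print(has_get_state_naive(doc), has_get_state_parsed(doc))
```

Only the first variant is handled the same way by both functions; the naive check misses the next three and reports a false positive on the commented-out one.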
Nicolas on May 14, 2008 10:27 PM
XML is bad for humans. I, as a human, like pictures and graphs rather than trawling through that bracket racket. But that bracket racket is there for the myriad of non-human interpreters that are available for us to correctly read our data. Unfortunately I must be a bad human because I like running shell interpreters and using vi, therefore I will inevitably encounter a lot of that bracket racket. But because I have seen many a picture and graph that tells lies, I suppose I find more trust in those archaic shell commands - so maybe XML is not really all that bad for humans after all.
arthurguru on May 15, 2008 3:54 AM
I think you're all a bunch of babies. If XML looks complicated to you it's because you designed your XML document poorly. If you're storing data that no one will read then store binary data...duh. I personally can follow XML better than YAML, but who cares..I could make my own version and call it GML just to make your lives more confusing. My point is you should consider, when designing something, how other people will view your design. Most people know XML and it's really easy to learn. Take that into consideration!
Greg on May 15, 2008 5:50 AM
You might want to take a look at PhiML, which does away with the redundancy of XML:
http://what-is-what.com/what_is/phiml.html
http://en.wikipedia.org/wiki/Phiml
PhiML also makes no distinction between the data and the metadata, which is usually arbitrary anyway.
What is PhiML? on May 15, 2008 7:26 AM
I agree that XML is a mess. Imagine if you were forced to code in XML. I think some XML weenies would get off on that. There is the ridiculous notion that using XML in some way makes the information universally available. Not so. Consider how long it took for the SQL transaction standard to be adopted.
XML is for those that can't parse and relish long winded means of saying little. To paraphrase Tom Lehrer:
"You can always count on XML for a rousing finale full of sound and fury and signifying absolutely nothing."
I had my own sort of config file for some biomechanics simulators. Real simple, just a name and default value, like so:
Vo 8
Vt 11.2
K 1.1
etc. It didn't need to do anything else.
Then someone got a hardon for XML, and converted all the config files into these hideous documents that nobody could read or tweak very easily.
I learned to hate XML on that day. (I already hated trendy bastards that hop on to the newest bandwagon regardless of if it makes sense.)
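For scale, a reader for the name/value format shown above can be very short. This is a minimal sketch, assuming one whitespace-separated pair per line and numeric values; the `parse_config` function and its error message are hypothetical, not the code the simulators actually used.

```python
def parse_config(text):
    """Parse lines of the form 'name value' into a dict of floats."""
    values = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected 'name value', got {line!r}")
        name, raw = parts
        values[name] = float(raw)
    return values

CONFIG = "Vo 8\nVt 11.2\nK 1.1\n"
print(parse_config(CONFIG))
```

The flip side, raised elsewhere in this thread, is that nothing in such a file says what `Vo`, `Vt`, or `K` mean; that knowledge lives in the code or the documentation.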
Bill on May 15, 2008 9:04 AM
One of the worst things I ever did at my old job was falling for the XML bandwagon. We had some 5000 line XML file we used as a database instead of doing something simple like using an Excel spreadsheet as the data store.
engtech on May 15, 2008 2:03 PM
I am working on the development of a Library Content Management System. Data driven applications benefit a lot from data transformation tools. Life has become so much easier now that XML is accepted so widely. XSLT is not easy but powerful. All of the examples blaming XML show really simple data structures. Config file stuff. Most data is much more complex and then the extensibility of XML shows its power. Of course, if the only data you process are config files, you may not want to use XML representations. However, if you are dealing with complex data structures and deal with XML parsing tools all the time, you want your few config files to be XML structured as well.
Peter van Boheemen on May 15, 2008 2:14 PM
Bill, I hope you documented what the values in your config file mean, because from looking at that I'd have no idea. There's no context unless you look in the code. Hmmmmm, maybe that's why XML was used.
engtech, if you have that much data why not a database?
There's nothing wrong with XML, it's just a tool to help developers get the job done. It's the developers' fault that systems like these exist, because no thought or design is put into them...
Start blaming yourselves and make yourself better!
"I had my own sort of config file for some biomechnanics simulators. Real simple, just a name and default value, like so:
Vo 8
Vt 11.2
K 1.1
etc. It didn't need to do anything else.
Then someone got a hardon for XML, and converted all the config files into these hideous documents that nobody could read or tweak very easily.
I learned to hate XML on that day. (I already hated trendy bastards that hop on to the newest bandwagon regardless of if it makes sense.)"
Bill on May 15, 2008 09:04 AM
"one of the worst things I ever did at my old job was falling for the XML bandwagon. We had some 5000 line XML file we used as a database instead of doing something simple like use an Excel spreadsheet as the data store."
engtech on May 15, 2008 02:03 PM
I agree with Peter van Boheemen completely. XML that is well thought out is very good at representing complex data structures, particularly in the object oriented sense. I also agree with him that for small config files it could be considered overkill.
In the telecommunications space where I work, the ability of XML to represent object oriented data structures well means that it has been widely adopted as a "configuration" protocol between unlike systems. I cannot see the manufacturers of telco equipment going back to the old ways.
arthurguru on May 17, 2008 7:07 PM
Honestly, isn't this just a question of convenience? If your keyboard had two separate keys, one for < and one for >, wouldn't it be easier to "think in Xml" instead of forcing you to constantly SHIFT your brain? Why is it that keyboards are all designed for administrative assistants? No second-class keys! End the class system now!
krbielacz on May 19, 2008 5:13 AM
But XML has an X in it, it's gotta be cool.
Musaran on May 19, 2008 2:49 PM
a) Programming in XML is a nightmare (-> ant !).
My absolute favourites are nested logic operations
(tags "and", "or" and "not") in xml files.
b) According to the famous 80/20 rule, 80% of all XML files I have to
deal with come without syntax/grammar description, so I cannot
validate them after editing (->DTD, ->Schema, ...).
c) Appending data to a file is easy and fast.
But simple appending doesn't work with XML, because you
have to first find that last "end-of-file" closing tag.
Then you have to replace it with your data and then you have
to write the "end-of-file" tag again.
d) The "noise/signal" ratio is too high. No wonder software gets
slower and slower despite increasing hardware speed.
e) Delete/Add one character in an XML file and you have
to replace the whole 100 MB file on disk.
It is so stupid to use XML as database replacement.
f) Instead of solving a simple find+replace problem for
a line-based file with grep/awk/sed people start to
lock themselves in their office to come up with a 500-line
solution after 2 days - what a waste of time!
And then it is not running on customer site because some
fancy XML library is missing.
g) There is no clear rule when to define new tags or when to
use attributes, so it takes ages for two teams to agree
on how to implement a new XML format.
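Point c) in the list above can be sketched in a few lines of Python. Appending to a line-based file is a single write in append mode, while the simplest way to append to an XML file is to parse it, add a child to the root, and rewrite the whole thing. The file names, the `<log>`/`<entry>` layout, and the helper functions are assumptions for illustration; a more careful implementation could instead seek back to the closing tag and overwrite from there, which is the manual dance the comment describes.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def append_line(path, record):
    # Line-based log: open in append mode, write one line, done.
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")

def append_xml(path, record):
    # XML log: the closing root tag is in the way, so this sketch parses
    # the whole file, adds a child element, and rewrites everything.
    tree = ET.parse(path)
    entry = ET.SubElement(tree.getroot(), "entry")
    entry.text = record
    tree.write(path, encoding="utf-8", xml_declaration=True)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        txt_path = os.path.join(tmp, "log.txt")
        xml_path = os.path.join(tmp, "log.xml")
        with open(xml_path, "w", encoding="utf-8") as f:
            f.write("<log><entry>first</entry></log>")
        append_line(txt_path, "first")
        append_line(txt_path, "second")
        append_xml(xml_path, "second")
        with open(txt_path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        entries = [e.text for e in ET.parse(xml_path).getroot().findall("entry")]
        return lines, entries

print(demo())
```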
Jeff, it seems to me you are yet another programmer doing a Canute. XML is not just, and not even primarily, for data. It's primarily a text format for text documents. The first time I saw an XML data file (it was a deal document from the London Stock Exchange) I was a little intimidated: the document seemed to have no content or structure. It was flat (only the root element contained other elements) and every other element was empty: all the "content" was expressed as attribute values. But after a few minutes I calmed down, and could see the logic of using XML in what I considered to be a very odd way.
XML is truly wonderful, and we shouldn't allow petulant schoolboys to make us forget just how wonderful. It does two revolutionary things, and it does them supremely well: it separates form from content, and it is self-describing. It means people like me can, with a little patience, a small amount of effort and a modicum of intelligence, actually see and understand what makes our documents behave the way they do online: I can find the taxonomy term that's been applied to search terms in my document, and can work out for myself (without going to a programmer) that the search engine that has failed to return that document against that search is broken.
A lot of computer programmers don't like the fact that we users now have this much power - I'm still not sure why, but the language they tend to use marks them out as not the sort of people I'd invite to dinner if I wanted a mature discussion around the table. Your quotation from old prune-face about democracy is more apposite than you realized.
As for your first example (the SOAP one), to say that it contains precious little information is just nonsense, and stupid to boot. As a non-programmer and a non-subject matter expert even I can tell at a glance what the document is about.
As for the comment that "[the] distinction between the data and the metadata... is usually arbitrary anyway" - well, not in publishing, but maybe publishing is "unusual". Or the contributor could be from a planet other than earth.
As John (above) said, "Show somebody XML, even a total bonehead, and they'll figure it out in a few minutes." I proudly count myself among the boneheads. In business, we're usually the people who bring in the customers.
John H on May 23, 2008 5:54 AM
One giant point missing from this article is that XML can be validated easily according to a set of predefined known rules provided the writer of the XML tells you which DTD/XSD or other schema to use for validation (and normally this information is embedded in the doctype or schemaLocation in the XML file).
Though many of the arguments in this article might sound reasonable at first glance, missing this major point renders most of them meaningless or at least far less powerful than they appear to be.
Derek Read on May 23, 2008 3:58 PM
Norm Walsh has an interesting rebuttal to this article here:
http://norman.walsh.name/2008/05/13/thetax
Derek Read on May 23, 2008 4:05 PM
Ok, yeah, XML can be pretty unwieldy when used by inexperienced hands (such as my own, lol), but it can work - it just requires its wielder to be strict with their data structuring. Isn't that the case with all data storage/retrieval systems anyway? Okay, so XML is more than just data storage, but the same rules apply to XML that apply to both programming languages and data manipulation. Planning and structure are all that stands between you and a spiderweb of tangled code.
Besides which, as you say, XML isn't intended to be human readable. If you have an alien XML file and you want to extract specific, regular data from it, you don't page through the XML until you see it, you knock out a quick bit of code that pulls the info you want and presents it to you, or you pump the XML into your browser with a schema and stylesheet attached that shows you the whole file in a way you can understand.
My final point: XML is THE markup language. It retains the flexibility of SGML without the complexity and difficulty inherent in that language. It could easily encompass and replace other markup languages like HTML.
It's just a little more effort to keep organised is all! :-)
that was a lot of text to say 'xml sucks' - which, we all already know, of course.
and yaml sucks, too. no validation. whitespace sucks. and they can't even keep the website up.
choosing soap is a bad example, which you knew, so why use it?
Peter on May 29, 2008 5:26 PM
A lot of XML haters here, I see.
Please everyone go and use EDI and then come back and tell me how much XML sucks (comparatively). You just don't know how good you got it now...
Kearns on May 30, 2008 6:51 AM
My favorite thing about XML is XSLT. Otherwise, I agree with you... XML isn't always very efficient. It conveys a lot of information, like in your SOAP example... one could look at it and assume that only the stock symbol itself is important info, but that's not quite true. A good understanding of XML reveals that SOAP snippet has a lot of useful information in it. On the other hand, it's also very useless information in that context. And like you say, XML is often misapplied. I'm totally open to other ideas...especially lately.
Devon on June 7, 2008 5:51 AM
Jeff, I was reminded of this post when exploring URL rewriting for our website at work. We are debating (I can't see why.) between using ISAPI rewrites and using DotNetNuke's built-in rewriting. Here's the bracket tax:
ISAPI_Rewrite's rule:
RewriteRule ^home.aspx /Default.aspx?base [R=301,NC]
DotNetNuke's Version of this would look like:
<RewriterConfig>
<Rules>
<RewriterRule>
<LookFor>~/home.aspx</LookFor>
<SendTo>~/default.aspx?base</SendTo>
</RewriterRule>
</Rules>
</RewriterConfig>
Granted, for more than one rewrite rule, the third tag is all that would repeat, but it is still a much more verbose version of what can be accomplished in the previous example in one line. :\
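For comparison, here is roughly what a consumer of the XML form has to do to get back to the same pair of values that the one-line rule states directly. This is a sketch using Python's standard `xml.etree.ElementTree`, not DotNetNuke's own loader; the `load_rules` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

CONFIG = """<RewriterConfig>
  <Rules>
    <RewriterRule>
      <LookFor>~/home.aspx</LookFor>
      <SendTo>~/default.aspx?base</SendTo>
    </RewriterRule>
  </Rules>
</RewriterConfig>"""

def load_rules(xml_text):
    """Return a list of (look_for, send_to) pairs, one per RewriterRule."""
    root = ET.fromstring(xml_text)
    return [
        (rule.findtext("LookFor"), rule.findtext("SendTo"))
        for rule in root.iter("RewriterRule")
    ]

print(load_rules(CONFIG))
```

Each rule boils down to two strings; everything else in the document is structure around them.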
Jamie Phelps on June 25, 2008 6:08 AM
I remember all the XML hype around 2000; all I wanted to do was grab a giant flag that had the XML logo on it and run outside waving my flag and chanting "XML is going to save the world!"...not, but it sure seemed like a lot of others did, especially developers coming from a non data centric world. I'm glad there is now a growing group of developers that are beginning to push back against the XML god. About time :)
Go CORBA!
The clue is the history of CORBA's direct integration with: Sun's JDK and GNOME.
Until people do CORBA in Java, they have no concept of how trivial it is to do (and then be able to integrate legacy systems.)
SOAP seems to be a couple steps back in terms of coupling the business logic with the messaging format.
That people are still having a conversation in this day and age about message formats is a testament to our poor education system.
CORBA solved the messaging issues LONG AGO yet people still think they can come up with something better. Too bad. People need to get over themselves and make some contributions to engineering, not inflating their own egos.
Wow -
Cheers, Sean
Feel free to bash me at seaneparker at yahoo...
What is XML intended for?
This conversation echoes what we've been discussing at Dr Dobbs. I was focusing on the more basic question of what it was SUPPOSED to be. Persistence format? Serialization format? Database format? Configuration file???? I have no idea.
-Bil
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.