Bill de hOra recently highlighted a little experiment Ian Hickson ran in August:
I did a short study recently checking only for syntax errors in HTML documents, and the results were that of the 667416 files tested, 626575 had syntax errors. Over 93%. That's only syntax errors in the HTML, not checking the CSS, the content types, the semantic errors (e.g. duplicate IDs -- 86461 of those files had duplicated IDs), or any other errors.
If you included those kinds of errors, you'd probably find that almost all pages had errors that would trigger this warning. Thus any sort of visible UI would be basically always saying "this page is broken". That would not be good UI for the majority of users, who don't care.
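The duplicate-ID count mentioned in the quote is the kind of check that is easy to automate. Here is a minimal sketch using Python's standard-library `html.parser`; the `IdCollector` class and `duplicate_ids` helper are hypothetical names made up for illustration, and this is not how the study itself was run:

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts every id attribute value seen on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return the sorted list of id values that appear more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(value for value, count in collector.ids.items() if count > 1)


sample = '<div id="nav"></div><p id="intro"></p><span id="nav"></span>'
print(duplicate_ids(sample))
```

Note that the tolerant parser happily reads the page either way; the duplicate only shows up because the sketch goes looking for it.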
Even Tim Berners-Lee, the godfather of the Web, acknowledges that the move to enforce well-formedness on the web with XHTML has failed:
Some things are clearer with hindsight of several years. It is necessary to evolve HTML incrementally. The attempt to get the world to switch to XML, including quotes around attribute values and slashes in empty tags and namespaces all at once didn't work. The large HTML-generating public did not move, largely because the browsers didn't complain. Some large communities did shift and are enjoying the fruits of well-formed systems, but not all. It is important to maintain HTML incrementally, as well as continuing a transition to well-formed world, and developing more power in that world.
Perhaps this is why there are 63 HTML validation errors on Google's homepage right now. Like it or not, we live in a world of malformed HTML. Browsers aren't compilers. They don't fail spectacularly when they encounter invalid markup. Nor should they. HTML is, and always has been, tolerant by design. We'll always be awash in a sea of tag soup.
Your browser doesn't care if your HTML is well-formed. Your users don't care if your HTML is well-formed. So why should you?
For the same reason all of my C programs will "return EXIT_SUCCESS;" at least somewhere, when "return 0;" or using "void main" (instead of "int main") will work just as well.
Using XHTML Strict is equivalent to the --pedantic-errors gcc flag, and forces people to typecast things to their typedef, such as time_t, where on most systems just using a long will work.
If the standard is specific enough, and if my code conforms to it, and the compiler conforms to it, then any future compilers advertising compliance should give me a binary that functions the same.
Philihp Busby on November 14, 2006 7:49 AM
The reason XHTML is so error-prone is that the tools have never been standards-based and do not do things with the standard in mind. Adobe Premiere uses font tags. FrontPage (all I need to say). Dreamweaver works OK but leaves a lot to be desired when working with scripting languages like ASP.NET and PHP.
Microsoft Expressions is the first I have ever seen that cooperates with all DOCTYPE declarations. The CSS tools are omg good.
Josh on November 14, 2006 8:15 AM
But browsers do care about compliant markup, yes? Don't they switch to a faster standards-compliant mode if a valid doctype and valid code are detected?
Personally, I care because I want forward compliance and cleaner markup, and because if you make your code W3C valid, then you're already a long way towards having the most accessible code you possibly can.
I suppose accessibility can be achieved without valid markup, but I'll bet it's a lot harder that way.
listless on November 14, 2006 8:22 AM
Your browser doesn't care if your HTML is well-formed. Your users don't care if your HTML is well-formed. So why should you?
Well-formed markup looks better with syntax coloring in the IDE, so my IDE cares! I would go ahead and say the semantic content should be well formed, but the presentation content is always going to be hacked to the idiosyncrasies of the containers.
matt m on November 14, 2006 8:45 AM
And as for Google, it looks like a lot of their validation errors are calculated to save on bandwidth.
matt m on November 14, 2006 8:47 AM
This is largely the result of bad tools, as Josh said. As the tools improve, the standards compliance will follow.
Tim's article talked about backfitting XForms into HTML rather than keeping it in XHTML land. That may be a necessary tradeoff, but I'd think that one incentive to offer tool vendors and content producers is enhanced functionality: new features only valid in XHTML.
Jon Galloway on November 14, 2006 8:53 AM
"Your browser doesn't care if your HTML is well-formed. Your users don't care if your HTML is well-formed. So why should you?"
Because it makes it easier to debug problems with your page. If you ask for help with a CSS bug on the css-discuss mailing list, the first thing people will ask is "have you validated your HTML?" This isn't pedantry: if you have invalid HTML, you're relying on the browser's (undocumented) error handling as part of the way it renders your code. If you have valid code and it's rendering incorrectly then it's down to a browser bug - and these tend to be pretty well documented by the community and hence much easier to work around.
All of that said, I prefer HTML 4.01 Strict as opposed to XHTML. It's still a standard and you can still validate against it, but it comes without any "fail-on-error" baggage.
Simon Willison on November 14, 2006 8:56 AM
That's one of the biggest challenges for browsers, and that's the same reason Internet Explorer got so popular: it allowed a lot of malformed HTML to show up nicely.
Eber Irigoyen on November 14, 2006 8:56 AM
Maintenance, maintenance, maintenance.
Jeff McKenzie on November 14, 2006 9:06 AM
If all pages were well-formed, you'd need to write a lot less code to make a web browser.
BTW yes my domain does end with dot info and no, I'm not a spammer.
Rik Hemsley on November 14, 2006 9:08 AM
Just checked my old site (about 9 years old).
About 7 years ago it passed validation with flying colors, no warnings, no errors (was proud about it, because I worked for it). Was also looking good in IE, Netscape and others, on Windows, Mac, Linux (still does).
Now the validator complains about doctype, frames, and what not.
One should really redesign a site every 3 years just because someone did some improvements somewhere, and had a feeling that frames are illegal, and doctype is mandatory?
A doctype should be mandatory. Just put in HTML4 with frameset, that'll allow your crufty old frame code to pass with flying colors.
Name on November 14, 2006 9:46 AM
You'll be proud to know that Coding Horror falls into the 93% block of websites.
3 errors for the homepage:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.codinghorror.com%2F&charset=%28detect+automatically%29&doctype=Inline
16 (as of now) for this page:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.codinghorror.com%2Fblog%2Farchives%2F000723.html&charset=%28detect+automatically%29&doctype=Inline
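For anyone who wants to check their own pages the same way, the links above follow a simple pattern: the page address is percent-encoded into the `uri` query parameter. A small sketch of building such a link with Python's `urllib.parse.urlencode`, where the `validator_url` helper is a hypothetical name and the parameter values are simply copied from the links above:

```python
from urllib.parse import urlencode

# Base address taken from the validator links quoted in this comment
VALIDATOR = "http://validator.w3.org/check"


def validator_url(page_url):
    """Build a validator link for page_url using the same query parameters as above."""
    query = urlencode({
        "uri": page_url,
        "charset": "(detect automatically)",
        "doctype": "Inline",
    })
    return f"{VALIDATOR}?{query}"


print(validator_url("http://www.codinghorror.com/"))
```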
David
David on November 14, 2006 9:47 AM
> BTW yes my domain does end with dot info and no, I'm not a spammer.
You really ought to move out of that bad neighborhood. Gentrification could take decades.
http://chris.pirillo.com/2006/08/17/info-domains-are-dead/
> As the tools improve, the standards compliance will follow.
I disagree. The tools are irrelevant (except to developers); it's the renderers that matter. But if well-formedness makes your life easier as a developer, then that's a valid reason to go that route. It reminds me of the static vs. dynamic typing argument, really. Which one is correct? Both, depending on what you're doing.
Jeff Atwood on November 14, 2006 9:47 AM
Hmm, yeah, I'd like to be compliant, and things like the new VS2005 help, where at least it tells you what you shouldn't do (even if it can't always tell you what you should do instead).
However, part of my problem is that, for one reason or another, IE6 renders incorrectly when I correct old mistakes and make it XHTML compliant, sigh.
Yeah, IE6 is horrible, but we don't all have the luxury of picking and choosing what our users use, and I don't get paid for rewriting an entire website that works 'just fine' already.
"Your browser doesn't care if your HTML is well-formed. Your users don't care if your HTML is well-formed. So why should you?"
I don't
Mako on November 14, 2006 10:17 AM
I'm sorry... this blog thing chokes on greater-than signs... One last try
When I daydream, I often travel back in time. To buy Netscape stock before it set records? No. I go back to a time before CSS and before browsers were more than just a CSCI project. I go back to correct the horrible, horrible mistakes made by those who created the XHTML/CSS standards, which are to blame for the situation that we now toil under.
I want to flesh this out into a proper blog entry, but here are some points that I think are relevant to this discussion:
1: Making a language forgiving does not make it easier. In fact, it almost always has the opposite result. Example: in old VB, what is the result of 1 + "3.5"? It's not immediately obvious (is it "13.5" or 4.5?). Is it really harder to write 1 + toInt("3.5")? Forgiveness necessitates ambiguity. Errors happen in front of your audience, which is using a browser that you don't have and that came to different conclusions about your ambiguous code.
2: Strict enforcement of the language does not necessarily mean hard. The trouble with what is out there is that we lack a sufficiently complete CSS standard. Powerful tools could have been provided and without confusing ambiguity. Cases in point:
CSS Conditionals:
It would be great to be able to ask if a feature is supported or which browser you are working with within CSS and then implement code accordingly. Why not this kind of statement:
min-width? (min-width:40%;) | (width: [Microsoft garbage here];)
The '?' would basically have asked, "Is it supported?" How much simpler is this compared to what we have to do now? Just this one feature alone would suck out much of the horrible complexity that we deal with today. And asking “which browser/version” would cover bad implementations as well. For cases like this:
browser("Explorer", < 7)?(background("Crapy.gif");) | (background("Pretty.png");)
Or how about the ability to do this:
margin-top: =(SomeIDName(width) - 2px);
...or...
width: (80% - 2px);
Features like the above examples would fundamentally change the way the XHTML is designed. It would remove most of the empty DIV tags that we’re forced to pollute our beautiful code with.
Most, except for when we have to make a rounded box. Enough has been written on that subject. I don’t have to hammer out fantasy pseudo code, which would just serve to depress.
The main point is that XHTML has ALMOST enough in it to be completely fine (add br/ back in and I’m happy). CSS, which is designed for graphic designers, does not have enough positioning and conditional tools to solve most of our problems without depending on browser bugs and redundant and ugly code. Demanding quotes, closed end tags and unique IDs makes it much easier for design tools and browsers to interpret the code. To be lazy with these details solves nothing.
Another reason to try to get as close to well-formed or even valid as possible (no matter what your DTD) is predictability.
These days all the browser developers follow the W3C specs to a good degree. There is predictable behavior here for a given input.
But each browser developer necessarily handles malformed or invalid code differently. The W3C specs don't tell developers what to do with malformed stuff, so each browser pretty much does its own thing. There is some consistency here after all these years, but it's mostly coincidence.
Therefore, it's in people's interests, mostly for their own sanity's sake, to follow the W3C specs to the extent that they are able.
This is all more critical for complex designs that use a lot of CSS and JavaScript than it is for simpler sites, which can usually get by with less rigorous formedness. So, for people aspiring to do complex or intricate designs, they need to care; people who have more modest aims don't need to. This to me is a pretty good state of affairs, even though the whole story is a little ugly and overcomplicated.
John on November 14, 2006 10:23 AM
I'm neither a web developer nor a designer. That probably explains why I feel concerned about valid HTML: it's a very "young" notion in the web world. Web designers with experience have passed through many iterations of HTML. What they learned years ago still works.
The real reason why there are so many invalid pages (in the eyes of the new specifications) is because HTML started plain badly (in the eyes of those who make up specifications) and changed a lot. Moreover, new specifications often make constructs in the previous ones obsolete. Imagine if the evolution of ANSI C/C++ had come with the same ratio of "breaking" changes. Thank God, it mostly added stuff.
But in the end, since browsers and users don't care if your HTML is malformed, it means this debate is about maintainability. Valid HTML is a coding standard, nothing more.
Martin Plante on November 14, 2006 10:39 AM
> Your users don't care if your HTML is well-formed. So why should you?
For myself.
* For my pride, because I see myself as a craftsman, and I consider that producing valid, well-formed markup is part of my craft, part of not just doing my job, but doing my job well.
* For my sanity, because malformed markup's rendering will depend on the way the HTML parser "patches" the malformed HTML, well-formed HTML allows me to just _know_ how my HTML will be parsed.
* For my productivity, because of the previous point well-formed markup makes scripting much easier, and cleaner. And it's even cleaner if the HTML is semantic as well.
The only things I "allow" my pages to report are non-standard attributes, because I sometimes use them to enhance my documents for scripting purposes (custom/non-standard attributes are much more flexible than just having classes available, and much more powerful).
And to end this post, browsers may care about the validity of your markup: just try to feed true XHTML to Firefox (including the application/xhtml+xml MIME type) and see what it does when the markup is invalid.
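The contrast between a strict XML parser and a forgiving HTML parser can be reproduced outside the browser. The following is only a rough sketch using Python's standard library (`xml.etree.ElementTree` as the strict parser, `html.parser` as the tolerant one); it does not reproduce Firefox's actual behavior, and the helper names and sample strings are invented for illustration:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample documents: the second leaves <p> and <br> unclosed
WELL_FORMED = "<html><body><p>closed paragraph<br/></p></body></html>"
MALFORMED = "<html><body><p>unclosed paragraph<br></body></html>"


def parses_as_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class StartTagCounter(HTMLParser):
    """Tolerant parser that simply counts the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1


def count_start_tags(markup):
    """Feed markup to the tolerant parser and report how many start tags it saw."""
    counter = StartTagCounter()
    counter.feed(markup)
    counter.close()
    return counter.start_tags


for name, markup in (("well-formed", WELL_FORMED), ("malformed", MALFORMED)):
    print(name, parses_as_xml(markup), count_start_tags(markup))
```

The strict parser rejects the second document outright, while the tolerant one reads both without complaint, which is roughly the difference between serving a page as XML and serving it as tag soup.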
> These days all the browser developers follow the W3C specs to a good degree.
Uh... no.
Masklinn on November 14, 2006 10:41 AM
HTML and forgiving browsers are part of the age of no flowcharts, no specs, no thought for memory management, and little concern for scalability (other than to verify it runs using the Northwind database in Access).
I have to agree that the biggest reason to go for validation is to be able to debug.
Nothing bugs me more than writing a Greasemonkey script when the site I'm modifying already has tons of JavaScript errors. Same thing with XHTML. If one of your readers tells you that things are going wacky in IE6 because of your latest post, it's going to be harder to track down if you don't already validate.
engtech on November 14, 2006 11:39 AM
In terms of handling basic well-formed HTML according to the specs, the browser developers do indeed follow the W3C specs to a *good* degree, although they're by no means perfect yet. IE 6 and earlier have some annoying CSS bugs, but many are fixed in IE 7; nonetheless IE has done HTML well for years, and was the first browser with decent CSS support. The Gecko, KHTML and Opera rendering engines all do tremendously well against W3C specs.
Browsers that support less-than-perfect code lowered the barrier to entry for people of all skill levels to participate in the Web since it started. Why shouldn't browsers continue to be forgiving for those people? Why make the Web more brittle than it needs to be? Invalid and ill-formed pages have allowed people to Get Things Done and get them on the Web, easily, cheaply and quickly. Doing it *right* is something of another matter, but that's possible too. Best of both worlds.
John on November 14, 2006 11:49 AM
Is Jeff being serious here? Isn’t this a little like saying “Your compiler doesn’t care if your code is well designed. Your users don’t care if your code is well designed. So why should you?” The answer is simple: readability, writability, and maintainability.
When writing code, to take the easy road is to just write code, web pages, etc. that “just work.” The compiler only cares if the syntax and typing (if the language is typed) are correct. But, in the long run this only makes the reading, writing, and maintenance of code much more difficult. Indeed, one of the major arguments in favor of (static) typing in programming languages is that it enforces certain standards in code. Why is it that we make fun of “bad” code posted to the Daily WTF (much of the code posted there does work after all)? Why is it that most serious discussions on programming focus on good design, not merely getting software to work? Well designed code is easier to write, read, and maintain. Well designed code is less likely to break tomorrow when something is changed today. Think about all of the major advances in programming over the past 30 years. Most of those advances were advances because they made the reading, writing, and maintenance of code much easier. As far as the user and computer are concerned, there isn’t anything that we can do now that couldn’t be done 30 years ago (programming languages were just as Turing complete then as they are now). But, as far as we, the programmers, are concerned, there is much more we can do now that we couldn’t do 30 years ago.
My point in making the analogy (if you want to call it such) between HTML and programs is that a lot of the reasons for being concerned with proper HTML and CSS (indeed the reason for creating the standards of HTML and CSS in the first place) is to make it possible for the web developer to create writable, readable, and maintainable web pages. Indeed, your users don’t care, but that doesn’t mean that you shouldn’t.
Ben on November 14, 2006 11:50 AM
> between HTML and programs
Is markup exactly the same as code? I don't think so. Along those same lines, consider the role of the compiler in dynamically typed languages. It's far less useful, because it can't tell what you're trying to do.
> But, in the long run this only makes the reading, writing, and maintenance of code much more difficult.
And yet plenty of XHTML validation rules cause more pain:
http://codinginparadise.org/weblog/2005/08/xhtml-considered-harmful.html
Jeff Atwood on November 14, 2006 12:24 PM
With few exceptions, I'm not convinced there's much value in using or migrating to xhtml at this point. In fact, if you're not serving your content up as application/xhtml+xml or text/xml, there's no reason at all to bother with xhtml and plenty of reasons why it's a bad idea (http://hixie.ch/advocacy/xhtml). The fact that IE doesn't even support xhtml is not the least of these problems. This isn't just about closing your tags and quoting attributes: core javascript methods like document.write and properties like innerHTML can't be used either. This alone makes switching to xhtml impossible for 99% of ad-driven sites.
The programmer in me loves the idea of cranking out perfectly valid xhtml and writing javascript that only uses the DOM Core methods. For the same reason, I try to write html that's semantic and avoid <div> soup and use CSS for presentation. But most people authoring web content are not programmers and the low barrier for entry is partly why the web is what it is today. The fact is, html *needs* to be highly fault tolerant. I agree with those who have advocated valid markup for readability/maintainability purposes, but let's be honest: we're the exception, not the norm. Table-based layouts are only just starting to become obsolete (if there's a bigger readability killer than that, I'd like to see it). And look at google's web authoring stats: http://code.google.com/webstats/2005-12/classes.html. The 13th most popular class name is "mso:normal". When that many people are using MS Word to author documents for the web, it becomes abundantly clear how far we have to go to migrate to xhtml.
James Stevenson on November 14, 2006 12:50 PM
Ooops, looks like someone beat me to all this:
>And yet plenty of XHTML validation rules cause more pain:
>http://codinginparadise.org/weblog/2005/08/xhtml-considered-harmful.html
I think most people don't care about using valid XHTML because it does nothing to guarantee browser compatibility. Just because my site validates doesn't mean it will display the same across browsers. Until this is the case, I don't see many developers caring about being valid. They just want their site to look right in the major browsers.
Brandon Wood on November 14, 2006 1:21 PM
It is laborious to do things. It is even more laborious to do difficult things like standards. So, if some parties have achieved some standards that are adopted into worldwide use, the parties have obviously done something really hard and laborious.
Well, nothing comes easy, except things that are not perfected for ever. So the results are of course what they are and that is less than perfect. We can forgive our heroic developers, because also time is money.
I think that when standards are made, some open source helping hands could be taking some of the development burden. Like Firefox is a good browser, so there could be some good open source standards.
And if the world is a mess, then if you provide some clean and neat service, you get the customers.
If something is allowed, then that is a feature. If you didn't mean to allow it to happen but allowed it anyway, it's too late - you have a feature. Let's consider a game like World of Warcraft. There is a glitch that allows snoopy players to go behind the scenes. Is this ok? Well, it does not matter. Some players go behind the scenes anyway. You could fix this so that the users are not allowed to go behind the scenes, but would that be ok? It does not matter. The users just could not abuse the glitch anymore.
Would it be easier for the users to be in the right side or go through glitches? It depends, but it would really be helpful to know what is the right side. And it would really be helpful if the scenes were intact without lots of holes and traps where to fall through if you didn't want to.
Don on November 14, 2006 1:59 PM
By the way, Google's homepage is up to 79 errors now. Don't ask me. I did notice that the URL changed to www.google.co.jp. Hmm. That seems odd to me, but it must be a DNS/load balance thing. The validator said 49, but the list counted out to 79. Why didn't the validator see the other 30 errors? Is this an XHTML/XML/CSS conspiracy?
Oh, well.
I firmly believe in well-formedness, and get FRUSTRATED when VS 2003 reformats my markup to its ingrained Frontpage/Visual Interdev derived crap. At least you can turn off reformatting (for the most part) in VS 2005. I've still run into some autocomplete issues in VS 2005. One thing I cannot do without is Intellisense and autocomplete.
Again - Oh, well.
At least it doesn't mess with my code-behind too much.
John Baughman on November 14, 2006 2:44 PM
Ok, I didn't account for info and warning messages. My mistake. It's 49.
John Baughman on November 14, 2006 2:52 PM
Keep in mind that the HTML validator at w3 is itself broken. I find it reporting perfectly valid HTML as broken because it can't tell the difference between tags in script from tags in HTML, and so on. Not sure I believe this statistic until the validator itself is valid.
Tim B-L knows the validator is broken, too. :)
Brook Monroe on November 14, 2006 4:30 PM
"The reason XHTML is so error-prone is that the tools have never been standards-based and do not do things with the standard in mind."
This is true, but it's not the only reason. The entire publication chain has to be XHTML valid-aware. Also to get valid XHTML means pushing back hard on authors at all points in their editing flow, or using highly constrained interfaces - my experience is that authors will put up with neither. And we haven't even talked about accessible markup, or encoding. This is not a simple problem.
Bill de hOra on November 14, 2006 5:09 PM
To try to keep this brief, I'll just rephrase the question slightly: "The compiler doesn't care if you use spaghetti logic, the customer doesn't care if you use spaghetti logic, so why should you?"
Or: "The compiler doesn't care if it emits 5000 warnings during build, the customer doesn't care about warnings emitted during build, so why should you?"
I believe these are equivalent because warnings and spaghetti logic do not preclude the app from working properly.
Writing standards-compliant HTML makes that code more predictable in layout and organization. For example, deleting a div tag will not delete an opening tag and leave the ending tag, because tags must be cleanly nested in standards-compliant HTML. The browser may not care about the orphaned tag, but when something unrelated breaks on that page, that tag is going to be a red herring that wastes my time.
In addition, properly nesting the tags allows me to use a tool such as tidy to quickly bring the layout (i.e. indenting) of the file into agreement with its logical organization. Which, as the saying goes, is a very good thing.
The major browsers all have bifurcated rendering pipelines: 'quirks mode' and 'standards mode'. Invalid markup gets rendered in quirks mode, and all of the major browsers handle that slightly differently -- which can lead to discrepancies in the most unexpected places. Standards mode is much more consistent across browsers. So, assuming your goal is a web app that behaves and looks consistent across the major browsers, standards compliant HTML will move the starting line closer to the finish line.
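The orphaned-tag scenario described above can be caught mechanically with a tag stack. This is a deliberately strict sketch built on Python's standard-library `html.parser`, not what tidy or the W3C validator actually does; the `NestingChecker` class, the `nesting_problems` helper and the `VOID` set are names assumed for illustration, and the check will also flag legal HTML that relies on optional end tags (an unclosed `<li>`, for example):

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are never pushed on the stack
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Tracks open elements on a stack and records end tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens nothing
        pass

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def nesting_problems(html):
    """Return a list of human-readable nesting complaints for the given markup."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]


print(nesting_problems("<div><p>fine</p></div>"))
print(nesting_problems("<p>opening div was deleted</p></div>"))
```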
Standards compliant HTML is a lot like a good keyboard -- you can't tell the developer used it by looking at the finished product, but it made the developer's life a lot easier nonetheless.
scotfl on November 14, 2006 5:12 PM
What Simon Willison said.
People on teh interweb will cite you lots of reasons for writing standards compliant markup. Almost all of those reasons are bullshit. Aside from satisfying your personal sense of aesthetics, the one *practical* reason to validate is that creating any web page means wandering through a VAST space of cross-browser rendering errors. Validating your markup shrinks that space dramatically. That's all.
Evan Goer on November 14, 2006 10:23 PM
My organization DOES care because our customers demand their sites to be wholly accessible. We have to comply with 22 guidelines, the first of them being complete XHTML strict conformance. ;-)
That said, if you understand French you should really read this article, which is a "J'accuse" about W3C...
http://www.uzine.net/article1979.html
MaxL on November 15, 2006 12:13 AM
Yeah yours falls into the 94% category. It's a shame people can't write clean code, it's not like it's hard to write clean XHTML code even when it's dynamically generated.
SomeGuySomewhere on November 15, 2006 3:39 AM
Tagsoup is a great evil. It makes writing tools to extract information from "HTML" pages a real pain in the ass. It confuses content-creating end users. It bloats browser software with incomprehensible work-arounds to problems that shouldn't exist in the first place. It makes writing browsers for small or embedded systems unpleasant, and limits their usefulness in the wild.
Anonymous Cowherd on November 15, 2006 4:10 AM
You have 48 errors on your page...
maurizi0 on November 15, 2006 5:03 AM
Do your users care if your site is indexed by Google? Do you care?
Malformed HTML works more or less consistently in major browsers, but only because the browser vendors have spent considerable effort to reverse-engineer each other's error handling. They have to, because otherwise users complain.
Don't expect that the various parsers used by spiders and search engines implement exactly the same (extremely complicated and never specified) error handling logic as your favorite browser.
If you forget quotes around the url, a search engine might not traverse the link. If you forget to close a quote, the contents of the rest of the document might not be indexed. You can't really be sure. Why take the risk?
If your HTML is well-formed and valid, at least you have one less thing to worry about.
Olav on November 15, 2006 7:12 AM
The problem with well-formedness and valid markup is that some errors will actually trigger quirks mode or even a rendering error, but many of the validation errors seem to me to be nit-picks. At some point you end up putting in extra hours just to please the validator where the changes have no practical effect on the rendering of your code.
Sal on November 15, 2006 9:10 AM
The same reason static typing is great. Fail early.
David House on November 15, 2006 9:40 AM
I agree with Olav. And I disagree with Jeff's statement about the tools not needing to be better.
To me the heart of the matter is this: how accessible is my content? I do not think it matters much if you use XHTML 1.0 versus HTML 4: they're the same freaking semantics. What matters to me is that we craft projects that follow the (albeit misquoted here) axiom "be liberal in what you accept and conservative in what you produce."
There should be zero reason that any new undertaking should produce invalid markup. The reasons of parseability and maintainability should be satisfactory to pretty much anyone who creates something new. In a business sense it doesn't matter if your markup is valid or not because that doesn't translate directly into gain or loss. But it should (and secretly does) due to the idea of unforeseen circumstances.
Take for example microformats. Using microformats allows for some enterprising people (like Technorati) to come up with new services that scour the web and understand microformatted materials. Valid markup is analogous to this.
Valid markup makes your content that much easier to parse and therefore gain from other services and tools other than the ones you thought of when you first created your product/tool/whatever your content is.
That's potentially huge. Web browsers are the most forgiving renderers on the planet and therefore a poor choice of target. I believe that the world benefits from having any public content as publicly accessible as possible, and validating our markup against a standard is the first step in accomplishing this.
Tools are also to blame for not being more friendly to the world. Microsoft Word and Frontpage are especially sad examples of this behavior (though the last version of Frontpage got much better and Word's sometimes hard to find export to compact HTML filter was good too). Any content created in Word or Frontpage is likely to be only readable in a web browser. If you write a parser that comes across Word- or Frontpage-created content, you're in for a heckuva lot of work.
I hope one would care if Word made a document more difficult to understand by introducing unneeded sentences (or even worse: sentence fragments) so why shouldn't a developer care if a tool she or he uses/creates outputs difficult to understand markup? Laziness only goes so far.
Josh Peters on November 15, 2006 11:25 PM
I agree that well-formedness can be quite cumbersome and the immediate benefits are often negligible, at best.
But XHTML Transitional wasn't meant to be fully XML since it's a "Transitional" state. An XHTML 1.0 Strict document will not show up if you neglect a slash in a short tag or use the wrong entity. But there is no good reason why you would want to build a site in XHTML 1.0 Strict right now since Microsoft Internet Explorer does NOT treat any XHTML document as XML and won't until IE8.
XHTML Transitional was meant only to ease the transition from HTML to XML.
Maxime Haineault on November 16, 2006 5:35 AM
Maxime: Actually the "Transitional" doctype was meant for the transition from presentational markup to semantic markup + CSS, not from HTML to XML.
XHTML Transitional is just as much XML as XHTML Strict, and they are equally unforgiving with regard to syntactic errors. The difference is that the Transitional doctype allows a number of presentational elements and attributes which are illegal in Strict, like font, align, bgcolor and so on.
The low-level syntax (XML or HTML) is independent from the doctype (Strict, Transitional or Frameset).
Olav on November 16, 2006 6:38 AM
Honestly, I find that with PHP on my back-end doing a lot of the code generation for me, it's easier to make my templates conform, since I can throw something on the order of three or four PHP commands into a single document and have it all pull from a common source.
I'm not saying my code is airtight, but using a combination of NVU, which seems to pride itself on XHTML standardization and simple PHP coding, I seem to get by. It's when I use pre-formed scripts that I run into problems where my code formatting is malformed.
Rune on November 16, 2006 7:38 AM
I don't care if a page I am viewing validates or not, but if I have to work with a page, I much prefer if it validates.
I've done bits and pieces of web design in my time, and 90% of the display errors I've experienced have been resolved simply by validating.
I was lucky, I only learned HTML 3 years ago, so I missed the 90s and the frames, and the like. But when I sat down to learn HTML I decided to learn it correctly. Why wouldn't you? It is in my honest opinion as difficult to work with bad HTML as it is to write good HTML.
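Des Traynor on November 16, 2006 9:38 AM
"Dans les cabarets de Berlin, un couple d’hacktivistes hermaphrodites milite pour un retour au HTML prolétarien." ("In the cabarets of Berlin, a couple of hermaphrodite hacktivists campaign for a return to proletarian HTML.")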
brilliant..
Xhtml is a good aim, but... when you are maintaining a trillion lines of HTML/ASP/C#/Javascript code (one day I will do the count..) it becomes a painful/expensive/useless aim.
If you need to transfer data between A & B, you can either use a web service or generate pure XML on a regular dynamic page.
What a lot of people keep forgetting is that XML was intended as a data container for transporting data between systems. XHTML is another of those "one solution for all your needs", brilliant if you get full-on support/implementation in your company but useless if you try to go halfway.
Not for me, at least for now.
argatxa on November 16, 2006 12:01 PM
XHTML's problem is the same as HTML's problem: Shoehorning wildly different content into a small set of tags, and disallowing any extensibility (even if it's fully-defined in CSS). Sometimes you need line breaks, sometimes you need footnotes, sometimes you need musical notes. Docbook and the other specialized sub-languages are better for the semantics crowd. Meanwhile W3C wants you to use their handful of random, semi-useful tags, and an enormous sea of divs and spans for everything else.
Only webheads can put up with that for long. Everyone else just wants it to look a certain way and usually doesn't care about how it ends up doing so.
Foxyshadis on November 21, 2006 12:33 AM
It's coding horror :)
numero on November 24, 2006 10:08 AM
Exactly, it's coding horror :)
I don't mind malformed HTML, but tag soup is unmaintainable. We should have middle ground somewhere between 'clean code' and 'gold plated code'.
mm on July 21, 2008 8:36 AM
As some people have suggested, you should use the W3C validator like a lint tool or like compiler warnings. It can help you find typos - for instance I've just used it 5 minutes ago to find some mistyped id values in <label> for attributes in a colleague's work that I'm updating.
If you understand the validator's warnings & you're happy to ignore them, that might be fine. But I find it easier to check that there are no warnings than to skim through half-a-dozen to see whether any of them matter.
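The mistyped `<label>` example above is a good fit for a small lint-style check of your own. A minimal sketch with Python's standard-library `html.parser`, where the `LabelChecker` class, the `dangling_labels` helper and the sample form are all hypothetical and this is not how the W3C validator works internally:

```python
from html.parser import HTMLParser


class LabelChecker(HTMLParser):
    """Collects every id in the document and every <label for="..."> target."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.label_targets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        if tag == "label" and attributes.get("for"):
            self.label_targets.append(attributes["for"])


def dangling_labels(html):
    """Return the label targets that do not match any id in the document."""
    checker = LabelChecker()
    checker.feed(html)
    checker.close()
    return [target for target in checker.label_targets if target not in checker.ids]


form = (
    '<label for="emial">Email</label><input id="email">'
    '<label for="name">Name</label><input id="name">'
)
print(dangling_labels(form))
```

The ids are compared only after the whole document has been read, so a label that appears before its input is not reported by mistake.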
MarkJ on November 4, 2008 3:18 AM
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.