I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood


5 posts from December 2009

December 29, 2009

Responsible Open Source Code Parenting

I'm a big fan of John Gruber's Markdown. When it comes to humane markup languages for the web, I don't think anyone's quite nailed it like Mr. Gruber. His philosophy was clear from the outset:

Markdown is intended to be as easy-to-read and easy-to-write as is feasible.

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters — including Setext, atx, Textile, reStructuredText, Grutatext, and EtText — the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

If you're an ASCII-head of any kind, you will feel immediately at home in Markdown. It was so obviously designed by someone who has done a lot of writing online, as it apes common plaintext conventions that we've collectively been using for decades now. It's certainly far more intuitive than the alternatives I've researched.

With a year and a half of real world Markdown experience under our belts on Stack Overflow, we've been quite happy. I'd say that Markdown is the worst form of markup except for all the other forms of markup that I've tried. Of course, tastes vary, and there are plenty of viable alternatives, but I'd promote Markdown without hesitation as one of the best -- if not the best -- humane markup options out there.

Not that Markdown is perfect, mind you. After exposing it to a large audience, both Stack Overflow and GitHub independently discovered that Markdown had three particular characteristics that confused a lot of users:

  1. URLs are never hyperlinked without placing them in some kind of explicit markup.
  2. The underscore [_] can be used to delimit bold and italic, but also works for intra-word emphasis. While a typical use like "_italic_" is clear, there are disturbing and unexpected side effects in phrases such as "some_file_name" and "file_one and file_two".
  3. It is paragraph and not line oriented. Returns are not automatically converted to linebreaks. Instead, paragraphs are detected as one or more consecutive lines of text, separated by one or more blank lines.

Items #1 and #2 are so fundamental and universal that I think they deserve to be changed in the Markdown specification itself. There was so much confusion around unexpected intra-word emphasis and the failure to auto-hyperlink URLs that we changed these Markdown rules before we even came out of private beta. Item #3, the conversion of returns to linebreaks, is somewhat more debatable. I'm on the fence on that one, but I do believe it's significant enough to warrant an explicit choice either way. It should be a standard configurable option in every Markdown implementation that you can switch on or off depending on the intended audience.

Markdown was originally introduced in 2004, and since then it has gained quite a bit of traction on the web. I mean, it's no MediaWiki (thank God), but it's in active use on a bunch of websites, some of them quite popular. And for such a popular form of markup, it's a bit odd that the last official update to the specification and reference implementation was in late 2004.

Which leads me to the biggest problem with Markdown: John Gruber.

I don't mean this as a personal criticism. John's a fantastic writer and Markdown has a (mostly) solid specification, with a strong vision statement. But the fact that there has been no improvement whatsoever to the specification or reference implementation for five years is … kind of a problem.

There are some fairly severe bugs in that now-ancient 2004 Markdown 1.0.1 Perl implementation. Bugs that John has already fixed in eight 1.0.2 betas that have somehow never seen the light of day. Sure, if you know the right Google incantations you can dig up the unreleased 1.0.2b8 archive, surreptitiously posted May 2007, and start prying out the bugfixes by hand. That's what I've had to do to fix bugs in our open sourced C# Markdown implementation, which was naturally based on that fateful (and technically only) 1.0.1 release.

I'd also expect a reference implementation to come with some basic test suites or sample input/output files so I can tell if I've implemented it correctly. No such luck; the official archives from Gruber's site include the naked Perl file along with a readme and license. The word "test" does not appear in either. I had to do a ton more searching to finally dig up MDTest 1.1. I can't quite tell where the tests came from, but they seem to be maintained by Michel Fortin, the author of the primary PHP Markdown implementation.

But John Gruber created Markdown. He came up with the concept and the initial implementation. He is, in every sense of the word, the parent of Markdown. It's his baby.

Henry aka 'Rock Hard Awesome' taking a bath

As Markdown's "parent", John has a few key responsibilities in shepherding his baby to maturity. Namely, to lead. To set direction. Beyond that initial 2004 push, he's done precious little of either. John is running this particular open source project the way Steve Jobs runs Apple -- by sheer force of individual ego. And that sucks.

Since then, all I can find is sporadic activity on obscure mailing lists and a bit of passive-aggressive interaction with the community.

On 15 Mar 2008, at 02:55, John Gruber wrote:
I despise what you've done with Text::Markdown, which is to more or less make it an alias for MultiMarkdown, almost every part of which I disagree with in terms of syntax additions.
Wow, that's pretty strong language. I'm glad I'm provoking strong opinions, and it's nice to see you actively contributing to Markdown's direction ;)

Personally, I don't actually like (or use) the MultiMarkdown extensions. As noted several times on list, I do not consider what I've done to in any way be a good solution technically / internally in it's current form, and as such Markdown.pl is still a better 'reference' implementation.

However I find it somewhat ironic that you can criticise an active effort to actually move Markdown forwards (who's current flaws have been publicly acknowledged), when it passes more of your test suite than your effort does, and when you haven't even been bothered to update your own website about the project since 2004, despite having updated the code which can be found on your site (if you dig) much more recently than this.

I despise copy-pasted code, and forks for no (real) reason - seeing another two dead copies of the same code on CPAN made me sad, and so I've done something to take the situation forwards. Maybe if you'd put the effort into maintaining a community and taking Markdown.pl forwards at any time within the last 4 years, you wouldn't be in a situation where people have taken 'your baby' and perverted it to a point that you despise. If starting with Markdown.pl and going forwards with that had been an option, then that would have been my preferred route - but I didn't see any value in producing what would have been a fifth perl Markdown implementation.

It's almost at the point where John Gruber, the very person who brought us Markdown, is the biggest obstacle preventing Markdown from moving forward and maturing. It saddens me greatly to see such negligent open source code parenting. Why work against the community when you can work with it? It doesn't have to be this way. And it shouldn't be.

I think the most fundamental problem with Markdown, in retrospect, is that the official home of the project is a static set of web pages on John's site. Gruber could have hosted the Markdown project in a way that's more amenable to open source collaboration than a bunch of static HTML. I'm pretty sure SourceForge was around in late 2004, and there are lots of options for proper open source project hosting today -- GitHub, Google Code, CodePlex, and so forth. What's stopping him from setting up shop on any of those sites with Markdown, right now, today? Markdown is Gruber's baby, without a doubt, but it's also bigger than any one person. It's open source. It belongs to the community, too.

Right now we have the worst of both worlds. Lack of leadership from the top, and a bunch of fragmented, poorly coordinated community efforts to advance Markdown, none of which are officially canon. This isn't merely incovenient for anyone trying to find accurate information about Markdown; it's actually harming the project's future. Maybe it's true that you can't kill an open source project, but bad parenting is surely enough to cause it to grow up stunted and maybe even a little maladjusted.

I mean no disrespect. I wouldn't bring this up if I didn't care, if I didn't think the project and John Gruber were both eminently worthy. Markdown is a small but important part of the open source fabric of the web, and the project deserves better stewardship. While the community can do a lot with the (many) open source orphan code babies out there, they have a much, much brighter future when their parents take responsibility for them.

Posted by Jeff Atwood    9 Comments

December 17, 2009

Building a PC, Part VI: Rebuilding

I can't believe it's been almost two and a half years since I built my last PC. I originally documented that process in a series of posts:

Now, lest you think I am some kind of freakish, cave-dwelling luddite, what with my ancient two and a half year old PC, I have upgraded the CPU, upgraded the hard drive, and upgraded the video card since then. I also went from 4 GB of RAM to 8 GB of RAM, but I didn't happen to blog about that. Normal computers age in dog years -- every year they get seven years older -- but mine isn't so bad with all my upgrades! I swear!

Judge for yourself; here's a picture of it.

digital VT-100 terminal

But seriously.

A big part of the value proposition of building your own PC is upgrading it in pieces and parts over time. When you're unafraid to pop the cover off and get your hands dirty with a little upgrading, you can spend a lot less to stay near the top of the performance heap over time. It's like the argument for buying a car versus renting it; the smart buyers keep the car for as long as possible to maximize the value of their investment. That's what we're doing here with our upgrades, and a rebuild is the ultimate upgrade.

In defense of my creaky old computer, the Core 2 series from Intel has been unusually strong over time, one of their best overall platforms in recent memory. It was almost good enough to banish the excerable Pentium 4 series from my mind. Man those were horrible! But the Core 2 series was a solid design with some serious legs; it and scaled brilliantly, from single to dual to quad core, and in frequency from 1 GHz to 3.5 GHz.

I was initially unimpressed with the new Core i7 architecture that Intel launched to replace the Core 2. While the new Nehalem architecture is a huge win on servers, it's kind of "meh" on the desktop. I have endless battles with overzealous developers who swear up and down that they use their desktops like servers. Sure you do! And you're building the space shuttle with it, right? Of course you are. Yeah.

Meanwhile, back on Planet Desktop, there were some other reasons that I started thinking seriously about upgrading from my overclocked Core 2 Duo to a Core i7 upgrade:

  • The Core i7 platform uses triple channel DDR3 memory. While the benefits of the additional bandwidth are somewhat debatable on the desktop (as usual), one interesting side-effect is that motherboards have 6 memory slots. While 16 GB is theoretically possible on Core 2 systems, it required extremely expensive 8 GB DIMMs. But with 6 memory slots, we can achieve 12 GB without breaking the bank -- by using six 2 GB DIMMs.
  • The Core i7 is Intel's first "real" quad-core architecture. Intel's previous quad core CPUs were basically two dual core CPUs duct taped together on the same die. No such shortcuts were taken with the i7. While the difference is sort of academic, there are some smallish real world performance implications.
  • Mainstream software is finally ready for quad core CPUs. It's not uncommon today to find applications and games that can actually use two CPU cores reasonably effectively, and those that can use four or more cores are not the extreme rarity they used to be. Don't get me wrong, scaling well to four or more CPU cores is still rare, but it's no longer spit-take rare.
  • Intel introduced the mainstream second generation Core i5 series, so the platform is fairly mature. All the new architecture bugs are worked out. It's also less prohibitively expensive than it was when it was when it was introduced.

At this point, I had the seven year upgrade itch really bad. My 3.8 GHz Core 2 Duo with 8 GB of RAM was not exactly chopped liver, but I started fantasizing a lot about the prospect of having a next generation quad-core CPU (of similar clock speed) with hyperthreading and 12 GB of RAM.

If you're wondering why I need this, or why in fact anyone would need such an embarrassment of desktop power, then I'll refer you to my friend Loyd Case.

Don’t ask me why I need six cores and 24GB. To paraphrase a Zen master, if you have to ask, you do not know.

Loyd has indirectly brought up another reason to choose the i7 platform; it's pin-compatible with Intel's upcoming "Gulftown" high end 6-core CPU. So, your upgrade path is clear. (It's also rumored that the next iteration of the Mac Pro will have two of these brand new 6-core CPUs, before any other vendor gets access to them, which is totally plausible.)

As far as I'm concerned, until everything on my computer happens instantaneously, my computer is not nearly fast enough. Besides, relative to how much my time costs, these little $200-$500 upgrades to get amazing performance are freakin' chump change. If I save a measly 15 minutes a day, it's worth it. As I like to remind pointy-haired managers all over the world, Hardware is Cheap, and Programmers are Expensive. OK, maybe I'm biased, but the conclusion was overwhemingly clear: it's UPGRAYEDD time!

the character Upgrayedd, from the movie Idiocracy

This is a more than an upgrade, though, it's a rebuild -- a platform upgrade. That means I'll be assembling the following …

  • new Motherboard
  • new RAM
  • new CPU
  • new heatsink

… and dropping that into my existing system, which is highly optimized for silence. The case, power supply, hard drives, DVD-R, etc won't change. On the outside, it'll look the same, but on the inside, it's a whole new PC. This is analogous to replacing the engine in a sports car, I suppose. On the outside, it will appear to be the same car, but there's a lot more horses under the hood.

As I said in the first part of my building your own PC series, if you can assemble a LEGO kit, you can build a PC.

Take your time, be careful, and go in the right order. So, first things first. Let's assemble the CPU, heatsink, and memory on the motherboard -- in that specific sequence, because modern heatsinks can be a pain to attach.

motherboard, CPU, heatsink, and memory assembled for the rebuild

Man, check out at all that hot, sweet, PC hardware! I get a little residual thrill just cropping the picture. Love this stuff! Anyway, that gives us a mountable motherboard with all the important bits pre-installed:

  • ASRock X58 Extreme motherboard ($169)
    Inexpensive, has all the essential features I care about, and is recommended by Tom's Hardware. I'm not into fancy, spendy motherboards; I think they're a ridiculous waste of money.
  • XIGMATEK HDT-S1283 cooler ($35).
    Direct contact between the CPU cooler heatpipes and the CPU surface is the new hotness, or rather, coolness. It really works, since all the top performing CPU coolers use it now. This one is fairly inexpensive at $35 and gets great reviews. Also, I highly recommend the optional screw mount kit ($8). Modern CPU coolers are large, and the mounting mechanism needs to be more solid than plastic pushpins.
  • Kingston HyperX 4GB (2 x 2GB) DDR3 2000 ($135) × 3
    I've had good luck with Kingston in the past. I went with their semi-premium brand this time, as I plan to do a bit of overclocking and the price difference is fairly small. Remember, this is a 12 GB build, so we'll need three of these kits to populate all 6 memory slots on the motherboard.
  • Intel Core i7-960 3.2 GHz CPU ($590)
    While you could make a very solid argument that the Core i7-920 CPU ($289) is a better choice because it's identical and overclocks to the same level, I was willing to spend a bit more here as "insurance" that I get to the magical 3.8 Ghz level that my old Core 2 Duo was overclocked to.

update: since a few people asked, here are my case and power supply recommendations.

  • Antec P183 Black Computer Case ($140)
    I used the older P180/P182 Antec case in my original series; it's still one of my favorites. This version brings some much needed improvements to airflow to accommodate higher power CPUs and video cards, as documented in a recent Silent PC review article.
  • CORSAIR CMPSU-650HX 650W Power Supply ($120)
    You don't want to skimp on the power supply, but there's no need to spend exorbitant amounts, either. Forget the wattage rating and look at the quality. Corsair is known for very high quality power supplies. The HX series is a bit more, but has modular cables, which makes for a cleaner build.

It adds up to about $1000 all told. A rebuild is definitely more expensive than one-off upgrades of CPU, memory, and hard drive. But, remember, this is a rebuild of my PC -- and a fire-breathing, top of the line performance rebuild at that. That takes spending a moderate (but not exorbitant) amount of money.

Now that we've got all that stuff assembled, the next thing to do is open my existing PC, disconnect all the cables going to the motherboard, temporarily remove any expansion cards, unscrew the motherboard and lift it out.

old PC with motherboard assembly in place

Once the old motherboard assembly was pulled out, I plopped in the new motherboard, screwed it down, and reattached the cables and expansion cards. Don't close up the PC at this point, though. Before powering it on, double check and make sure all the cables are reattached correctly:

  • Power cables from the PSU to the motherboard. There are usually at least two, on modern PCs.
  • Hard drive cables from the HDDs to the motherboard.
  • Power switch, Reset switch, Activity light cables. Without the power switch connected, good luck powering up. This motherboard happens to have built-in power and reset switches for testing, but most don't.
  • Fan connectors from the Heatsink and case fans to the motherboard.
  • Power cables from the PSU to the video card, if you have a fancy video card.

If anything is wrong, we'll just have to re-open the case again. On top of that, we need to monitor temperatures and airflow, and that's much easier with the case open.

Fortunately, my rebuild booted up on the first try. If you're not so lucky, don't fret! Disconnect the power cord, then go back and re-check everything. I get it wrong, sometimes, too; I actually forgot to reconnect the video card power connectors, and was wondering why only the secondary video card was booting up. Once I re-checked, I immediately saw my mistake, fixed it, and rebooted.

Once you have a successful boot, don't even think about booting into the operating system yet. Enter the BIOS (this is typically done by pressing F12 or Delete during bootup) and check the BIOS screens to make sure it's detecting your hard drives, memory, and any optical drives successfully. Browse around and do some basic reality checks. Then do not pass GO, do not collect $200, go straight to your motherboard manufacturer's website and download the latest BIOS. On another computer, obviously. Most modern motherboards allow updating the BIOS from a USB key, so just copy the BIOS files on the USB key, reboot, and use the BIOS menus to update. After you've updated the BIOS, set BIOS options to taste, and we're finally ready to boot into an operating system.

While this may sound like a lot of work, it really isn't. All told it was maybe an hour, tops. I'm fairly experienced at this stuff, but it's fundamentally not that complicated; it's still just a very fancy adult LEGO kit.

Courtesy of this $1000 rebuild, my ancient 2.5 year old PC is reborn as a completely new state-of-the-art PC, at least internally. That was always part of the plan! Next up -- once we've proven that it's stable in typical use -- overclocking, naturally. I'll have more on that in a future blog post, but I can tell you right now that Core i7 overclocking is … interesting.

Posted by Jeff Atwood    14 Comments

December 14, 2009

International Backup Awareness Day

You may notice that commenting is currently disabled, and many old Coding Horror posts are missing images. That's because, sometime early on Friday, the server this blog is hosted on suffered catastrophic data loss.

Here's what happened:

  1. The server experienced routine hard drive failure.
  2. Because of the hard drive failure, the virtual machine image hosting this blog was corrupted.
  3. Because the blog was hosted in a virtual machine, the standard daily backup procedures at the host were unable to ever back it up.
  4. Because I am an idiot, I didn't have my own (recent) backups of Coding Horror. Man, I wish I had read some good blog entries on backup strategies!
  5. Because there were no good backups, there was catastrophic data loss. Fin, draw curtain, exeunt stage left.

At first, I was upset with our provider, CrystalTech.

Our Disaster Recovery Plan Goes Something Like THis

I am still confused how the most common, routine, predictable, and mundane of server hardware failures -- losing a mechanical hard drive -- could cause such extreme data loss carnage. What about, oh, I don't know, a RAID array? Aren't they designed to prevent this kind of single point of failure drive loss catastrophe? Isn't a multi drive RAID array sort of standard on datacenter servers? I know we have multi-drive RAID arrays on all of our Stack Overflow servers.

I also wish their routine backup procedures had greater awareness of virtual machine images. While I'll grant you that backing up a live virtual machine is somewhat complex, and typically requires special operating system support and API hooks, it is not exactly an unknown science at this point in time. Heck, at the very least, just let us know when the backup has been regularly failing each day, every day, for years.

Then I belatedly realized that this was, after all, my data. And it is irresponsible of me to leave the fate of my data entirely in someone else's hands, regardless of how reliable they may or may not be. Responsibility for my data begins with me. If I haven't taken appropriate measures, who am I to cast aspersions on others for not doing the same? Glass houses and all that.

So, I absolve CrystalTech of all responsibility in this matter. They've given us a great deal on our dedicated server, and performance and reliability (with one recent, uh... exception) have been excellent to date. It is completely my fault that I neglected to have proper backups in place for Coding Horror. Well, technically, I did have a backup but it was on the virtual machine itself. Does that count? No? Halfsies?

Apparently, I was gambling that nothing bad would ever happen at the datacenter. Because that's what you're doing when you run without your own backups. Gambling.

you gotta know when to hold 'em

I'll add gambling to the long, long list of things I suck at. I don't know when to hold 'em or when to fold 'em.

Now that I've apologized, it's time to let the healing begin. And by healing, I mean the excruciatingly painful process of reconstructing Coding Horror from internet caches and the few meager offsite backups I do have. My first order of business was to ask on SuperUser what strategies people recommend for recovering a lost website with no backup. Strategies other than berating me for my obvious mistake. Also, comments are currently disabled while the site is being reconstructed from static HTML. Oh, darn!

I'll let my son Rock Hard Awesome stand in for the zinger of a comment that I know some of you were just dying to leave.

I am liveblogging your fail

I'm not saying I don't deserve it. Consider me totally zingatized.

I mentioned my woes on Twitter and I was humbled by the outpouring of community support. Thanks to everyone who reached out with support of any kind. It is greatly appreciated.

I was able to get a static HTML version of Coding Horror up almost immediately thanks to Rich Skrenta of blekko.com. He kindly provided a tarball of every spidered page on the site. Some people have goals, and some people have big hairy audacious goals. Rich's is especially awe-inspiring: taking on Google on their home turf of search. That's why he just happened to have a complete text archive of Coding Horror at hand. Rich, have I ever told you that you're my hero? Anyway, you're viewing the static HTML version of Coding Horror right now thanks to Rich. Surprisingly, there's not a tremendous amount of difference between a static HTML version of this site and the live site. One of the benefits of being a minimalist, I suppose.

That pretty much solved all my text post recovery problems in one fell swoop. Through this process, I've learned that anything even remotely popular you put on the web will be archived as text, forever, by a dozen different web spiders. I don't think you can actually lose text you post on the web. Not in any meaningful sense; I'm not sure it's possible. As long as you're willing to spend the time digging through web spider archives in some form (and yes, I did cheat mightily), you can always get textual content back, all of it.

The blog images, however, are another matter entirely. I have learned the hard way that there are almost no organizations spidering and storing images on the web. Yes, there is archive.org, and God bless 'em for that. But they have an impossible job they're trying to do with limited resources. Beyond that, there's ... well, frankly, a whole lot of nothing. A desperate, depressing void of nothing. In fact, if you can only back up one thing on your public website, it should be the images. Because that's the thing you'll have the most difficulty recovering when catastrophe happens. I'm planning to donate $100 to archive.org as I have a whole new appreciation for how rare an internet-wide full archive service -- one that includes images -- really is.

That said, There are some limited, painful avenues to explore for recovering lost website images. I started with an ancient complete backup from mid 2006 with full images. And then Maciej Ceglowski of the nifty full-archive bookmarking service pinboard.in generously contributed about 200 blog posts that he had images for.

I also went through a period when I was going on a bandwidth diet and experimenting with hosting Coding Horror images elsewhere on the web. I'm slowly going through and recovering images locally from there. Beyond that, several avid Coding Horror readers contributed some archived images -- so thanks to Yasushi Aoki, Marcin Gołębiowski, Peter Mortensen, and anybody else I've forgotten.

Also, I should point out that a few enterprising programmers have proposed clever schemes for automatic recovery of images, such as Niyaz with his blog post Get cached images from your visitors, and John Siracusa with his highly voted 304 idea. I haven't had time to follow up on these yet but they seem plausible to me.

I've restored all the images I have so far, but it's still woefully incomplete. The most important part of Coding Horror is definitely the text of the posts, but I do have some regrets that I've lost key images from many blog posts, including those about my son. It feels like irresponsible parenting, in the broadest possible sense of the words.

The process of image recovery is still ongoing. If you'd like to contribute lost Coding Horror images, please do. I'd be more than happy to mail stickers on my dime to anyone who contributes an image that is currently a 404 on the site. Update: That was fast. Carmine Paolino, a computer science student at the University of Bologna, somehow had a nearly complete mirror of the site backed up on his Mac! Thanks to his mirror, we've now recovered nearly 100% of the missing images and content. I've offered to donate $100 to the charity or open source project of Carmine's choice.

What can we all learn from this sad turn of events?

  1. I suck.
  2. No, really, I suck.
  3. Don't rely on your host or anyone else to back up your important data. Do it yourself. If you aren't personally responsible for your own backups, they are effectively not happening.
  4. If something really bad happens to your data, how would you recover? What's the process? What are the hard parts of recovery? I think in the back of my mind I had false confidence about Coding Horror recovery scenarios because I kept thinking of it as mostly text. Of course, the text turned out to be the easiest part. The images, which I had thought of as a "nice to have", were more essential than I realized and far more difficult to recover. Some argue that we shouldn't be talking about "backups", but recovery.
  5. It's worth revisiting your recovery process periodically to make sure it's still alive, kicking, and fully functional.
  6. I'm awesome! No, just kidding. I suck.

So when, exactly, is International Backup Awareness Day? Today. Yesterday. This week. This month. This year. It's a trick question. Every day is International Backup Awareness Day. And the sooner I figure that out, the better off I'll be.

Posted by Jeff Atwood    11 Comments

December 10, 2009

Microformats: Boon or Bane?

I recently added microformat support to the free public CVs at careers.stackoverflow.com by popular demand.

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.

The official microformat "elevator pitch" tells us nothing useful. That's not a good sign. It doesn't get much better on the learn more link, either.

I'm left scratching my head, wondering why I should care. What problem, exactly, do microformats solve for me as a user? As a software developer? There's lots of hand-wavy talk about data, but precious little in the way of concrete stories or real world examples.

But I have a real world example: a CV. To some human resource departments the standard web interchange format for a CV or Resume is already established -- it's called "Microsoft Word". I have no beef with Word, but certainly we'd like to pick a more simple, open data format for our personal data than Microsoft Word -- and the hResume microformat seems to fit the bill. And if your CV is published on the web in a standard(ish) format, it's easier to take it with you wherever you need to go.

I had already implemented the tag and identity microformats on Stack Overflow many months ago. I wasn't convinced of the benefits, but the implementation was so easy that it seemed like more work to argue the point than to actually get it done. Judge for yourself:

<a href="http://www.codinghorror.com/" rel="me">codinghorror.com</a>
<a href="/questions/tagged/captcha" rel="tag">captcha</a>

Fairly clean and simple, right? That was the extent of my experience with microformats. Limited, but positive. Then I read through the hResume microformat spec. You should read it too. Go ahead. I'll wait here.

My first impression was not positive, to put it mildly. So you want me to take the ambiguous, crappy "HTML" markup we already have and layer some ambiguous, crappy "microformat" markup on top of it? And that's … a solution? If that's what microformats are going to be about, I think I might want off the microbus.

Let's take a look at a representative slice of hResume markup:

<div class="vcard">
  <a class="fn org url" href="http://example.com/">Example</a>
  <div class="adr">
    <span class="type">Work</span>:
    <div class="street-address">169 Maple Ave</div>
    <span class="locality">Anytown</span>,  
    <abbr class="region" title="Iowa">IA</abbr>  
    <span class="postal-code">50981</span>
    <div class="country-name">USA</div>
  </div>
</div>

As you can see, the crux of microformats is overloading CSS classes. When you give something the "adr" class within the "vcard" class, that means it's the address data field within the hCard, within the hResume.

While I can see the attraction, this approach makes me uneasy:

  1. We're overloading the class attribute with two meanings. Is "adr" the name of a class that contains styling information, or the name of a data field? Or both? It's impossible to tell. The minute you introduce a microformat into your HTML, the semantics of the class attribute have been permanently altered.
  2. The microformat css class names may overlap with existing css classes. Woe betide the poor developer who has to retrofit a microformat on an established site where "locality" or "region" have already been defined in the CSS and are associated with elements all over the site. And let me tell you, many of the microformat css field names are, uh, conveniently named what you've probably already used in your HTML somewhere. In the wrong way, inevitably.
  3. There's no visual indication whatsoever that any given css class is a microformat. If you hire a new developer, how can they possibly be expected to know that "postal-code" isn't just an arbitrarily chosen CSS class name, it's a gosh darned officially blessed microformat? What if they decide they don't like dashes in CSS class names and rename the style "postalcode"? Wave bye bye to your valid microformat. If it seems fragile and obtuse, that's because it is.
  4. The spec is incredibly ambiguous. I read through the hResume, hCard, and hCalendar spec multiple times, checked all the samples, viewed source on existing sites, used all the validators I could find, and I still got huge swaths of the format wrong! For a "simple" and "easy" format, it's … anything but, in my experience. The specification is full of ambiguities and requires a lot of interpretation to even get close. I'm not the world's best developer, but I'm theoretically competent, and if I can't implement hResume without wanting to cut myself and/or writing snarky blog posts like this, how can we expect everyone else to?
  5. It doesn't handle unstructured data well. On Stack Overflow, we have a single "location" field. No city, state, zip, lat, long, and all that jazz: just an unstructured, freeform, enter-whatever-pleases-you "location" field. This was awkward to map in hCard, which practically demands that addresses be chopped up into meticulous little sub-fields. This is a bit ironic for a format supposedly designed to work with the loose, unstructured world wide web. Oh, and this goes double for dates. If you don't have an ISO datetime value, good luck.

Maybe I have a particular aversion to getting my chocolate data structure mixed up with my peanut butter layout structure, but it totally skeeves me out that the microformat folks actually want us to design our CSS and HTML around these specific, ambiguous and non-namespaced microformat CSS class names. It feels like a hacky overload. While you could argue this is no different than the web and HTML in general -- a giant wobbly teetering tower of nasty, patched-together hacks -- something about it fundamentally bothers me.

Now, all that said, I still think microformats are useful and worth implementing, if for no other reason than it's too easy not to. If you have semi-structured data, and it maps well to an existing microformat, why not? Yes, it is kind of a hack, but it might even be a useful hack if Google starts indexing your microformats and presenting them in search results. While I'm unclear on the general benefits of microformats for end users or developers, seeing stuff like this in search results …

google-microformat-results-forum.png

google-microformat-results-review.png

… is enough to convince me that microformats are a step in the right direction. Warts and all. While we're waiting for HTML5 and its mythical data attributes to ship sometime this century, it's better than nothing.

Posted by Jeff Atwood    70 Comments

December 3, 2009

Version 1 Sucks, But Ship It Anyway

I've been unhappy with every single piece of software I've ever released. Partly because, like many software developers, I'm a perfectionist. And then, there are inevitably … problems:

  • The schedule was too aggressive and too short. We need more time!
  • We ran into unforeseen technical problems that forced us to make compromises we are uncomfortable with.
  • We had the wrong design, and needed to change it in the middle of development.
  • Our team experienced internal friction between team members that we didn't anticipate.
  • The customers weren't who we thought they were.
  • Communication between the designers, developers, and project team wasn't as efficient as we thought it would be.
  • We overestimated how quickly we could learn a new technology.

The list goes on and on. Reasons for failure on a software project are legion.

At the end of the development cycle, you end up with software that is a pale shadow of the shining, glorious monument to software engineering that you envisioned when you started.

It's tempting, at this point, to throw in the towel -- to add more time to the schedule so you can get it right before shipping your software. Because, after all, real developers ship.

I'm here to tell you that this is a mistake.

Yes, you did a ton of things wrong on this project. But you also did a ton of things wrong that you don't know about yet. And there's no other way to find out what those things are until you ship this version and get it in front of users and customers. I think Donald Rumsfeld put it best:

As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know.

In the face of the inevitable end-of-project blues -- rife with compromises and totally unsatisfying quick fixes and partial soutions -- you could hunker down and lick your wounds. You could regroup and spend a few extra months fixing up this version before releasing it. You might even feel good about yourself for making the hard call to get the engineering right before unleashing yet another buggy, incomplete chunk of software on the world.

Unfortunately, this is an even bigger mistake than shipping a flawed version.

Instead of spending three months fixing up this version in a sterile, isolated lab, you could be spending that same three month period listening to feedback from real live, honest-to-god, annoyingdedicated users of your software. Not the software as you imagined it, and the users as you imagined them, but as they exist in the real world. You can turn around and use that directed, real world feedback to not only fix all the sucky parts of version 1, but spend your whole development budget more efficiently, predicated on hard usage data from your users.

Now, I'm not saying you should release crap. Believe me, we're all perfectionists here. But the real world can be a cruel, unforgiving place for us perfectionists. It's saner to let go and realize that when your software crashes on the rocky shore of the real world, disappointment is inevitable … but fixable! What's important isn't so much the initial state of the software -- in fact, some say if you aren't embarrassed by v1.0 you didn't release it early enough -- but what you do after releasing the software.

The velocity and responsiveness of your team to user feedback will set the tone for your software, far more than any single release ever could. That's what you need to get good at. Not the platonic ideal of shipping mythical, perfect software, but being responsive to your users, to your customers, and demonstrating that through the act of continually improving and refining your software based on their feedback. So to the extent that you're optimizing for near-perfect software releases, you're optimizing for the wrong thing.

There's no question that, for whatever time budget you have, you will end up with better software by releasing as early as practically possible, and then spending the rest of your time iterating rapidly based on real world feedback.

So trust me on this one: even if version 1 sucks, ship it anyway.

Posted by Jeff Atwood    145 Comments
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.