International Backup Awareness Day

December 14, 2009

You may notice that commenting is currently disabled, and many old Coding Horror posts are missing images. That's because, sometime early on Friday, the server this blog is hosted on suffered catastrophic data loss.

Here's what happened:

  1. The server experienced routine hard drive failure.
  2. Because of the hard drive failure, the virtual machine image hosting this blog was corrupted.
  3. Because the blog was hosted in a virtual machine, the standard daily backup procedures at the host were unable to ever back it up.
  4. Because I am an idiot, I didn't have my own (recent) backups of Coding Horror. Man, I wish I had read some good blog entries on backup strategies!
  5. Because there were no good backups, there was catastrophic data loss. Fin, draw curtain, exeunt stage left.

At first, I was upset with our provider, CrystalTech.

Our Disaster Recovery Plan Goes Something Like This

I am still confused about how the most common, routine, predictable, and mundane of server hardware failures -- losing a mechanical hard drive -- could cause such extreme data loss carnage. What about, oh, I don't know, a RAID array? Aren't they designed to prevent this kind of single-point-of-failure drive loss catastrophe? Isn't a multi-drive RAID array sort of standard on datacenter servers? I know we have multi-drive RAID arrays on all of our Stack Overflow servers.

I also wish their routine backup procedures had greater awareness of virtual machine images. While I'll grant you that backing up a live virtual machine is somewhat complex, and typically requires special operating system support and API hooks, it is not exactly an unknown science at this point in time. Heck, at the very least, just let us know when the backup has been regularly failing each day, every day, for years.

Then I belatedly realized that this was, after all, my data. And it is irresponsible of me to leave the fate of my data entirely in someone else's hands, regardless of how reliable they may or may not be. Responsibility for my data begins with me. If I haven't taken appropriate measures, who am I to cast aspersions on others for not doing the same? Glass houses and all that.

So, I absolve CrystalTech of all responsibility in this matter. They've given us a great deal on our dedicated server, and performance and reliability (with one recent, uh... exception) have been excellent to date. It is completely my fault that I neglected to have proper backups in place for Coding Horror. Well, technically, I did have a backup but it was on the virtual machine itself. Does that count? No? Halfsies?

Apparently, I was gambling that nothing bad would ever happen at the datacenter. Because that's what you're doing when you run without your own backups. Gambling.

you gotta know when to hold 'em

I'll add gambling to the long, long list of things I suck at. I don't know when to hold 'em or when to fold 'em.

Now that I've apologized, it's time to let the healing begin. And by healing, I mean the excruciatingly painful process of reconstructing Coding Horror from internet caches and the few meager offsite backups I do have. My first order of business was to ask on SuperUser what strategies people recommend for recovering a lost website with no backup. Strategies other than berating me for my obvious mistake. Also, comments are currently disabled while the site is being reconstructed from static HTML. Oh, darn!

I'll let my son Rock Hard Awesome stand in for the zinger of a comment that I know some of you were just dying to leave.

I am liveblogging your fail

I'm not saying I don't deserve it. Consider me totally zingatized.

I mentioned my woes on Twitter and I was humbled by the outpouring of community support. Thanks to everyone who reached out with support of any kind. It is greatly appreciated.

I was able to get a static HTML version of Coding Horror up almost immediately thanks to Rich Skrenta of blekko.com. He kindly provided a tarball of every spidered page on the site. Some people have goals, and some people have big hairy audacious goals. Rich's is especially awe-inspiring: taking on Google on their home turf of search. That's why he just happened to have a complete text archive of Coding Horror at hand. Rich, have I ever told you that you're my hero? Anyway, you're viewing the static HTML version of Coding Horror right now thanks to Rich. Surprisingly, there's not a tremendous amount of difference between a static HTML version of this site and the live site. One of the benefits of being a minimalist, I suppose.

That pretty much solved all my text post recovery problems in one fell swoop. Through this process, I've learned that anything even remotely popular you put on the web will be archived as text, forever, by a dozen different web spiders. I don't think you can actually lose text you post on the web. Not in any meaningful sense; I'm not sure it's possible. As long as you're willing to spend the time digging through web spider archives in some form (and yes, I did cheat mightily), you can always get textual content back, all of it.

The blog images, however, are another matter entirely. I have learned the hard way that there are almost no organizations spidering and storing images on the web. Yes, there is archive.org, and God bless 'em for that. But they have an impossible job they're trying to do with limited resources. Beyond that, there's ... well, frankly, a whole lot of nothing. A desperate, depressing void of nothing. In fact, if you can only back up one thing on your public website, it should be the images. Because that's the thing you'll have the most difficulty recovering when catastrophe happens. I'm planning to donate $100 to archive.org as I have a whole new appreciation for how rare an internet-wide full archive service -- one that includes images -- really is.

That said, there are some limited, painful avenues to explore for recovering lost website images. I started with an ancient complete backup from mid-2006 with full images. And then Maciej Ceglowski of the nifty full-archive bookmarking service pinboard.in generously contributed about 200 blog posts that he had images for.

I also went through a period when I was going on a bandwidth diet and experimenting with hosting Coding Horror images elsewhere on the web. I'm slowly going through and recovering images locally from there. Beyond that, several avid Coding Horror readers contributed some archived images -- so thanks to Yasushi Aoki, Marcin Gołębiowski, Peter Mortensen, and anybody else I've forgotten.

Also, I should point out that a few enterprising programmers have proposed clever schemes for automatic recovery of images, such as Niyaz with his blog post Get cached images from your visitors, and John Siracusa with his highly voted 304 idea. I haven't had time to follow up on these yet but they seem plausible to me.

I've restored all the images I have so far, but it's still woefully incomplete. The most important part of Coding Horror is definitely the text of the posts, but I do have some regrets that I've lost key images from many blog posts, including those about my son. It feels like irresponsible parenting, in the broadest possible sense of the words.

The process of image recovery is still ongoing. If you'd like to contribute lost Coding Horror images, please do. I'd be more than happy to mail stickers on my dime to anyone who contributes an image that is currently a 404 on the site.

Update: That was fast. Carmine Paolino, a computer science student at the University of Bologna, somehow had a nearly complete mirror of the site backed up on his Mac! Thanks to his mirror, we've now recovered nearly 100% of the missing images and content. I've offered to donate $100 to the charity or open source project of Carmine's choice.
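
For anyone facing the same kind of holes in a site rebuilt from static HTML, here is a minimal sketch of how one might list the images that pages reference but that no longer exist on disk. It is not the tool used for this recovery; the names `ImgCollector` and `missing_images` are hypothetical, and it assumes the pages sit in a local directory and that images are referenced with plain `<img src>` tags.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(site_root):
    """Return sorted (page, src) pairs for locally hosted images absent on disk."""
    site_root = Path(site_root)
    missing = []
    for page in site_root.rglob("*.html"):
        parser = ImgCollector()
        parser.feed(page.read_text(encoding="utf-8", errors="replace"))
        for src in parser.sources:
            url = urlparse(src)
            if url.scheme or url.netloc:
                continue  # hosted elsewhere (or a data: URI); not checked here
            path = unquote(url.path)
            if path.startswith("/"):
                target = site_root / path.lstrip("/")  # root-relative reference
            else:
                target = page.parent / path  # page-relative reference
            if not target.is_file():
                missing.append((page.relative_to(site_root).as_posix(), src))
    return sorted(missing)
```

Calling `missing_images("path/to/static/site")` returns the list of pages and image references that would currently be a 404, which is a reasonable shopping list to take to archives and helpful readers.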

What can we all learn from this sad turn of events?

  1. I suck.
  2. No, really, I suck.
  3. Don't rely on your host or anyone else to back up your important data. Do it yourself. If you aren't personally responsible for your own backups, they are effectively not happening.
  4. If something really bad happens to your data, how would you recover? What's the process? What are the hard parts of recovery? I think in the back of my mind I had false confidence about Coding Horror recovery scenarios because I kept thinking of it as mostly text. Of course, the text turned out to be the easiest part. The images, which I had thought of as a "nice to have", were more essential than I realized and far more difficult to recover. Some argue that we shouldn't be talking about "backups", but recovery.
  5. It's worth revisiting your recovery process periodically to make sure it's still alive, kicking, and fully functional.
  6. I'm awesome! No, just kidding. I suck.
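
To make lessons 3 through 5 a little more concrete, here is a minimal sketch of a backup that is only trusted after a test restore. It is an illustration under assumptions, not the setup used for this blog: the function names (`file_hashes`, `make_backup`, `verify_restore`) are hypothetical, the archive is assumed to live outside the directory being backed up, and a real setup would also copy the archive somewhere offsite.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def file_hashes(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def make_backup(source, archive):
    """Write a gzip-compressed tarball of everything under source."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=".")


def verify_restore(source, archive):
    """Restore the archive into a scratch directory and compare it to source."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        # The backup only counts if the restored files match the originals.
        return file_hashes(scratch) == file_hashes(source)
```

The point of `verify_restore` is the one made in lesson 4: the thing being tested is the recovery, not the backup. Run on a schedule, a check like this would also catch a backup that has been silently failing.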

So when, exactly, is International Backup Awareness Day? Today. Yesterday. This week. This month. This year. It's a trick question. Every day is International Backup Awareness Day. And the sooner I figure that out, the better off I'll be.

Posted by Jeff Atwood
11 Comments

Aside from the bandwidth issue, consider something like http://www.ietf.org/rfc/rfc2397 for the future :p

Rick Olson on February 21, 2010 6:21 AM

The link to your Visual Studio Settings is also still broken. That one should be easy to recover? :P

Poti on March 13, 2010 7:07 AM

In my job, there has been 100% turnover in our IT dept within the last 6 months. We are all new and there is absolutely no documentation from the previous regime. There were many hands in the pot and by pot I mean server room. We have systems in various racks that we don't even know the purpose of. Anyhow, since no one really knows exactly how everything is set up, when we had a server crash recently it was catastrophic. Apparently the automated backup was disabled and we lost quite a bit of data and had to go back to a month-old tape backup. So yeah, I feel your pain on this one. Since you mentioned gambling, go to http://www.wildcatgamblers.com if you want to suck at gambling less. That's a free tip ;-)

Clevelandroxx on July 24, 2010 2:18 PM

I feel your pain!

Thanks for the post.

ScottFoleyMultiMedia on July 24, 2010 2:35 PM

Haha.. I'm just hanging out at work (www.gsid.net) and the accountant shows me this link... I personally think the Dilbert cartoon is hilarious.. and I decided to print it out for my own cubicle (I can relate to him, lol)

Kelso Kennedy on July 28, 2010 1:52 AM

I for one am a huge backup freak. I have a RAID 1 array for my desktop PC and still keep all of my data backed up on both external hard drives and internal hard drives put into racks. I would go nuts if I lost my data.

Regards,

George of http://webhostingforacent.com/?v=g

Jugaru George on November 2, 2010 12:27 PM

@George Wait, you're a backup freak and rely on multiple copies in a single location? No remote backups? What if there was a fire?

Cultiv on March 7, 2011 2:55 AM

Hey,

A bunch of users on Reddit decided that there should be more awareness around backups and especially the upkeep of backup restores. This has culminated in designating March 31st as World Backup Day. (http://www.worldbackupday.net)

Yeah, the holiday is a play off of April Fools and to "preempt" any pranks or hard drive failures. Currently the site is just a single page but will hopefully expand soon to be more comprehensive.

Check out our Twitter feed @WorldBackupDay (http://twitter.com/WorldBackupDay) and our Facebook group, or blog about the importance of backing up and checking your data restores.

WorldBackupDay on March 26, 2011 1:03 PM

I have 3 VPSs (little servers). Two in different US data-centres, and one in Australia (where I live). Total cost ~$50 / month. (I use prgmr.com and Crucial Paradigm.)

I replicate everything that I care about (e.g. code, web-sites, mail) across my home computers, my work computers and these VPSs, using git.
As a bonus, it keeps a complete history of my work, and compresses it; and it's lightning fast. I use it for mail, because it's faster and better than IMAP.

No need to back up movies that you downloaded off bittorrent, you can download them again. Just back up your code and data. Go, make yourself some off-site backups scattered around the world, and do it now!

I also publish nearly all of this data (my code) on the web as a side-effect of committing it to my servers whenever I make any changes.

Swatkins on May 19, 2011 5:25 AM

I don't really want to be nasty, as you've repented already I can see... but if you're operating a major web-site and don't have multiple off-site backups on several different servers scattered around the world, then it's hard for me to take anything else you might say seriously. This guy didn't back up his community web site - so what credibility do you have with anything technical? Backup is like the fundamental most important thing you need to do with any project or service, since it's inevitable that your systems will die sooner or later. My projects won't vanish unless both Australia and the United States are nuked into oblivion at the same time. I better put some other backups in Antarctica or Africa or somewhere for extra security! Also, it costs almost nothing to make full incremental off-site backups.

Swatkins on May 19, 2011 5:29 AM

The comments to this entry are closed.