Coding Horror
programming and human factors
by Jeff Atwood

May 19, 2008

Twitter: How Not To Crash Responsibly

In yesterday's post on Crashing Responsibly, I outlined a few ways to improve your application's crash behavior. In the event that your application crashes -- and oh, it will -- why not turn that crash into something that:

  • Records lots of diagnostic information developers can use to improve the application over time.
  • Reassures users and provides them with helpful information.

With that in mind, let's take a look at the Twitter crash page. How does it serve developers and users?

Twitter: Something is Technically Wrong

I don't mean to pick on Twitter; their bouts of downtime are near legendary at this point. Frankly, it's been discussed to death.

It's unfortunate, because I love Twitter. Like Michael Lopp, I'm dangerously close to being a Twitter fanboy.

The answer comes down to value. In the time that I've been using Twitter, it's transformed from a curiosity to an essential service. What were seemingly random status updates have now become organized into organic conversational threads that bring a steady flow of relevant content across my desktop.

An "essential service" is exactly the kind of thing you don't want to see error pages on. So, then, how does the Twitter error page fare?

Not so badly at first glance. It's an attractive error page, styled to match Twitter, with some basic links and navigational elements. Let's be generous and assume that the notification and logging of errors behind the scenes is taken care of. The Twitter developers must have access to a voluminous set of error logs by now.

But Twitter's error page is conspicuously lacking any real information. As an enthusiastic Twitter user presented with this error page, I am anything but reassured. Instead, I have some nagging questions:

  • Is this an ephemeral, temporary error or some kind of scheduled downtime? How do I tell the difference?
  • If this is scheduled downtime, when will it be over? Can I view the maintenance schedule, or the current status of the maintenance work?
  • Is Twitter down for everyone, or just me? Is there a place I can go to check Twitter's current system health?
  • Twitter has a reputation for unreliability. Where can I find out about Twitter's ongoing efforts to improve their reliability?

There's absolutely no mention of any of these things on the error page, the exact place where I would care the most. Clicking through to the blog provides no relief: no mention of any availability work or maintenance schedules.

Furthermore, it's difficult to take the glib claim that "we're going to fix it up and have things back to normal soon" seriously. I've seen so much of the Twitter error page in the last year that I've lost confidence that these errors mean anything to anyone -- or that they're even recorded. This is the static error page that cried wolf. Where's the improvement over time from the collection and analysis of these errors?

I understand that Twitter has scaling problems I can only dream of. I don't envy the amount of work they'll have to undertake to fix this pernicious, systemic problem of massive scale.

But I sure wish they could be a lot more transparent about it.

Isn't that what crashing responsibly is all about -- establishing an honest, open dialog between users and developers, even at the worst possible moment of that relationship?

Posted by Jeff Atwood

Comments

The only thing I can see wrong with Twitter's error page is that it doesn't really admit there is an error and doesn't apologise for it.

Their 'technically wrong' stance sounds more like a denial of the fact that they can't run their site properly. And the picture doesn't help either.

As the application is running on their server, they can log what the problem is and whatever triggered it. Why should they burden me with those dirty details when they are the only ones who can fix them anyway?

I don't think that an error message or code would help me in any way. Perhaps it'd give the hardcore geeks some peace of mind, but for the remaining 98% of the population it'd just be gibberish.

ssp on May 20, 2008 04:06 AM

But at least there's a pretty birdy on it. It could be worse.

[d3m0n] on May 20, 2008 04:07 AM

Leave them alone.

Niyaz PK on May 20, 2008 04:07 AM

I have to ask, what is with that image on the crash page?

And do you think they might be so overwhelmed with so many reports that they just "leave that for later"?

Matthew on May 20, 2008 04:23 AM

The thing is, this downtime was scheduled, wasn't it?! I did get a flash of a warning on the web saying that it would be down for an hour or so for maintenance.

Put simply, if they had a page up saying "down for maintenance", wouldn't that be a whole lot more positive than "technically wrong"?

Even better, <rant>FINALLY FIX THE DAMN PROBLEM, IT'S BEEN DOWN ENOUGH!</rant>

Rob on May 20, 2008 04:42 AM

The only time I notice is the little dots at the bottom right of Twitterific. I'm only following a couple of people and there are only a couple of people following me, so it's not a big deal, and I don't pay for it either. But seriously, how do the Twitter people make money? The only way I can see is via SMS text messages.

Joel did a good post about this a while back when their FogBugz On Demand server went down. http://www.joelonsoftware.com/items/2008/01/22.html

John Ferguson on May 20, 2008 04:47 AM

Jeff,

One downtime was scheduled today (the one that occurred May 20 2008 at 1700H @ GMT+8) -- Twitter had a small inconspicuous text box speaking of a scheduled 2 hour downtime on their page, but once they went down they didn't carry over the downtime notification message to that generic "technical difficulties" page.

I feel that, if only they had carried over that message appropriately, their responsibility would be fulfilled.

Jon Limjap on May 20, 2008 04:50 AM
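
A minimal sketch of the "carry over the notice" idea, assuming the site keeps its scheduled downtime in some record the error page can read. The `MaintenanceWindow` class, the `outage_message` function, and the message wording are all hypothetical names made up for illustration; nothing here reflects how Twitter actually stores or serves this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MaintenanceWindow:
    """Hypothetical record of a scheduled downtime (times assumed to be UTC)."""
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration

    def covers(self, moment: datetime) -> bool:
        # True while the scheduled downtime is in progress.
        return self.start <= moment < self.end


def outage_message(now: datetime, window: Optional[MaintenanceWindow]) -> str:
    """Pick the error page text, reusing the maintenance notice when one applies."""
    if window is not None and window.covers(now):
        return (
            "We are down for scheduled maintenance. "
            f"We expect to be back by {window.end:%H:%M} UTC."
        )
    # No scheduled window applies, so this is an unexpected error.
    return "Something went wrong on our end. Please try again in a few minutes."
```

With a two-hour window starting at 09:00 UTC (17:00 at GMT+8), a visitor at 10:00 would see the maintenance text with an 11:00 estimate, while a visitor at 12:00 would fall through to the generic error text.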

As you said above:

What is the error (even vaguely)?
Is it just my error (or everyone's)?
Is it just temporary (should I just reload/try again)?
How long will it last (even vaguely)?

This error message answers none of these ... so it is bad

Jaster on May 20, 2008 04:52 AM

Looks like they're looking for Operations Engineers... http://twitter.com/help/jobs

Operations Engineer

Twitter is seeking a seasoned Operations Engineer to join our Operations team.

Key areas of responsibility
- Continually improve the performance and scalability of the service

Bonus
- Experience with Ruby on Rails stack performance tuning


As nice as it is to write "funky community" sites in "bleeding edge" languages and frameworks, some poor chap has to keep the blinking lights flashing 24/7. The ops guys here must be racking up the overtime.

As for informative "site down" messages, when you have a sprawling architecture it can be difficult to determine what has gone wrong and how long to fix it. A well engineered system should have no single points of failure (OK, OK, within economic reason) and a short (and tested) Return to Operations plan for any failures.

Guy on May 20, 2008 04:58 AM

A Black Swan is a wonderful analogy, used for centuries as an example of a never found creature "All swans are white", then Australia was discovered along with Cygnus atratus the black swan ...

Reminds me of "Can't happen" traps in code ... that occasionally show up in error messages and logs....

The impossible happens more often than you think?

Jaster on May 20, 2008 04:59 AM

One thing I'd like to see error pages have is a reload. So if you leave the page open it'll reload every few minutes with an updated error message and eventually the page you originally wanted.

Especially good if it's a page users leave open 24x7 (like Gmail or Bloglines) which is always being updated, because then if you are lucky a proportion of your users won't be at their computers during the outage so they won't ever see the error.

I really don't care if a site had a 5 minute outage at 3am when I was asleep. But if when I wake up your site is still showing the error message until I hit reload then you just greatly magnified the impact.

Simon on May 20, 2008 05:10 AM
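
One way the self-reloading error page could look, as a rough sketch. It assumes the outage page is served at the URL the user originally requested (not via a redirect), so that a plain reload eventually brings back the real page. The `outage_response` function name and the page wording are invented for illustration; the `Retry-After` header and the `meta http-equiv="refresh"` tag are standard HTTP and HTML features.

```python
from typing import Dict, Tuple


def outage_response(retry_seconds: int = 300) -> Tuple[str, Dict[str, str], str]:
    """Return (status line, headers, body) for an outage page that reloads itself."""
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Hints to clients and crawlers when it is worth trying again.
        "Retry-After": str(retry_seconds),
        # Avoid a cached copy of the error page outliving the outage.
        "Cache-Control": "no-store",
    }
    body = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        # With no URL given, the browser reloads the current page after the delay.
        f'<meta http-equiv="refresh" content="{retry_seconds}">\n'
        "<title>Temporarily unavailable</title>\n"
        "</head><body>\n"
        "<p>We're having trouble right now. "
        f"This page will reload itself in {retry_seconds} seconds.</p>\n"
        "</body></html>\n"
    )
    return "503 Service Unavailable", headers, body
```

A user who left the tab open overnight would then land back on the working page without ever pressing reload, which is the effect described above.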

The days of meaningful error messages in UIs are probably dead. Run any security scanning software against an application that gives the user more than what Twitter is giving and you will get alerts. Why does the user care what the problem is?

Roland on May 20, 2008 05:13 AM

1. If the error page gives technical details about what's going wrong, there's the potential of giving people information they need to hack into the site. This would be a bad thing.

2. If it's scheduled maintenance they should say so. Otherwise, by definition, it's an unexpected error and they don't know when it will be fixed.

3. A separate site giving information about the general status of the main site would be useful, but would have to be both physically somewhere else and also on a separate domain.

rfunk on May 20, 2008 05:20 AM

[quote]The thing is, this downtime was scheduled, wasn't it?! I did get a flash of a warning on the web saying that it would be down for an hour or so for maintenance.

Put simply, if they had a page up saying "down for maintenance", wouldn't that be a whole lot more positive than "technically wrong"?

[/quote]
There are two sides to this coin... some public servers I use tell you (proudly) that they're down for maintenance whenever the server stops responding - including long query results!

Brandon on May 20, 2008 05:38 AM

"Is this an ephemeral, temporary error or some kind of scheduled downtime? How do I tell the difference?"

My guess would be that, if it were scheduled downtime, it would not be called "something is technically wrong". So, it's an error.

"Is Twitter down for everyone, or just me?"

Is there a reliable way of telling that automatically? It has to be very reliable, because a single error here will jeopardize all the trust that could be built by telling the user whether it's just his account.
So, better to stay quiet than to lie: log the error and fix it ASAP, so that the user hopefully never has to see this page again.

"Is there a place I can go to check Twitter's current system health?"

I'd guess not; they would tell us if they had one. Do you want them to explicitly state nonexistent features?

I personally don't care why there's an error, I just want it not to happen. I don't care for explanations; I won't even remember them 5 minutes later.
So, for me: don't explain what went wrong, just admit it, solve it, and I am a happy customer.

keppla on May 20, 2008 05:42 AM

I don't think some of you were around when LiveJournal first came out, started growing, and experiencing crashes/overloads daily. They ended up establishing a "status.livejournal.com" domain, on a separate network and server, to communicate exactly what technical issue was going on, and what the ETA was to resolve it. Granted, you'd have to go to this domain manually, but it was better than not knowing what was going on at all. They would even tell you really specific stuff, like "Hardware RAID Controller failed, swapping out", etc...

If you ever want to read a cool history from Brad Fitzpatrick, who created LiveJournal, on how he dealt with scaling issues, here goes:

http://www.danga.com/words/2005_oscon/oscon-2005.pdf

Nicholas on May 20, 2008 05:43 AM

YouTube is also notorious for bad error management. They always manage to break something with every site update which results in numerous complaints (in video format). Now they've broken their comment system with a simple JavaScript error. You can't report this problem or expect it to get fixed any time soon.

Robert S. Robbins on May 20, 2008 05:49 AM

* Is this an ephemeral, temporary error or some kind of scheduled downtime? How do I tell the difference?
~ Is it really so important to know what kind of error it is? The average user knows that it broke; this is generally enough. I don't think users will be contacting Twitter engineers to provide stack information.

* If this is scheduled downtime, when will it be over? Can I view the maintenance schedule, or the current status of the maintenance work?
~ With the growing pains they are experiencing, would you want to commit yourself to a time? What average user wants details about the maintenance schedule?

* Is Twitter down for everyone, or just me? Is there a place I can go to check Twitter's current system health?
~ See above

* Twitter has a reputation for unreliability. Where can I find out about Twitter's ongoing efforts to improve their reliability?
~ To me this is the only sensible point out of the four, Jeff. As a Twitter user, you want to know when you can expect stability and reliability going forward. As consumers and users, an answer is deserved.

But the other points for users are completely useless. Users want working applications. When they don't work, they get frustrated, annoyed, irritated. I don't think they care much about the maintenance schedule. Sure, it's nice to know WHY the service is down, or even down repeatedly, but after a while, despite all the warm-and-fuzzies and pretty error images, the average user will just give up.

Ordinary Geek on May 20, 2008 05:57 AM

It's obvious the website crashed because the robo cat lost his hand.

Joe Beam on May 20, 2008 06:09 AM

[quote]
"I don't mean to pick on Twitter"
[/quote]

For not meaning to pick on them, you're doing a pretty good job of picking on them...

~Sticky

StickyWidget on May 20, 2008 06:19 AM

Remember Friendster? I think they must have run into scaling issues, as I remember their site getting incredibly slow, often getting errors loading pages (just the default "page can't be displayed" errors though). That's probably the big reason MySpace and Facebook are huge while Friendster is an afterthought.

Kris on May 20, 2008 06:26 AM

You missed a few other important things users should be told in a service error message:

* Did what I just tried work? (Did my e-mail send? Purchase go through? Changes commit? Pictures post? Blog update?)

* Do I have to do it again?

* Should I try again right now, or later? How much later?


Clinton Pierce on May 20, 2008 06:37 AM

Twitter - is that thing still around?

No offense, but anyone who considers that to be an "essential service" has some serious organizational problems. That's worse than the people gasping for air and flailing about wildly when they can't get to MSN messenger or Facebook.

I know that's not really on topic, but I think it would have been a more effective delivery if you'd left out that bit.

Aaron G on May 20, 2008 06:50 AM

* Is this an ephemeral, temporary error or some kind of scheduled downtime? How do I tell the difference?
~ Is it really so important to know what kind of error it is? The average user knows that it broke; this is generally enough. I don't think users will be contacting Twitter engineers to provide stack information.
** Is it broken and will be fixed later, or just broken for a moment and I can just reload?

* If this is scheduled downtime, when will it be over? Can I view the maintenance schedule, or the current status of the maintenance work?
~ With the growing pains they are experiencing, would you want to commit yourself to a time? What average user wants details about the maintenance schedule?
** Is this an error or just the service offline? Should I keep retrying? See above.

* Is Twitter down for everyone, or just me? Is there a place I can go to check Twitter's current system health?
~ See above
** See above ...

Do I care why it is not working? No.
Do I want to know whether it is a temporary fault, or just for me (so I can just restart/reload), or a system-wide fault, or offline for maintenance (so I should give up and come back later)? Yes!


Jaster on May 20, 2008 07:19 AM

The Twitter error page is very reasonable. As a user I don't need to be overwhelmed with details. I could give a rip what happened, but it's nice to know that the problem is on their end and not mine.

PaulG. on May 20, 2008 07:20 AM

I think it's an AWESOME error page. If only I could get away with it because my apps had so many users.

Clouds, birdie, dismembered robot? Pure GENIUS!

The whole point is, you DO NOT NEED to know anything more, because you WILL BE BACK, you are a Twitter junkie!

steve on May 20, 2008 07:34 AM

I'm happy with seeing the error page. I am happy when I see ANY page served from twitter.com, because Lord knows how unstable it is! I've also gotten a very plain and ugly 500-Internal Server Error from Twitter before. They really need to get their act together.

Josh Stodola on May 20, 2008 07:39 AM

I would agree with your assessment that their error page does need a small bit more information. I get that same screen once in a while and usually after refreshing the page, the error is gone and I can view the page.

But why should Twitter, or any other site, provide me with the technical information on what happened on the back end when an error occurs? They don't owe me anything. I don't pay for the service. If I did, I'd probably be complaining also.

As far as logging the error and someone at Twitter fixing the problem, they have a page for jobs (http://twitter.com/help/jobs). You should apply.

russell.turley@gmail.com on May 20, 2008 07:57 AM

Twitter - An "essential service"?

Please... an essential service is a service without which one could, to a degree, not live.

If the police or firefighters or EMTs were to strike or, in the context of this discussion, *crash* or have *downtime*, it would not matter whether they gave warning or not; the chaos that results would be the same. Any removal of their services at any time in any way would be disastrous.

That is what makes a service essential. Twitter -- is definitely not.

Aaron on May 20, 2008 08:20 AM

> Is Twitter down for everyone, or just me? Is there a place I can go to check Twitter's current system health?

Funny thing: I was just about to post a comment about istwitterdown.com when I noticed that it was down now *sigh*

Aaron Bassett on May 20, 2008 08:37 AM

I'm afraid that I have to agree with Aaron in some respects. Whilst Twitter is obviously not essential for anyone I still cannot see why people in tech-circles still go on and on about this website. I have really tried to see some sort of use for Twitter, and I mean really tried!

That being cast aside, it is usually a good idea to have some sort of system health checker so you know whether "all" of a website is down before you keep trying.

Mike on May 20, 2008 08:41 AM

Ah, the old "CloudsRobotSeveredRobotHandBird" error. I used to get that all the time when I was a kid. If I recall correctly I believe it was error #3.1415.2.7182818. Haven't seen it for years.

/ smirk /

JeffH on May 20, 2008 09:06 AM

I think Twitter will be bought for an overbloated price and then disappear because people will realize how utterly useless this service is.

Donny on May 20, 2008 09:14 AM

Does the user need to see anything technical?

When my company's web site has an error, the web page actually sends an email to support with all the technical information necessary.

So while we capture the information behind the scenes, we just show the user a message saying we're performing system maintenance and should be back momentarily.

ProudGeekDad on May 20, 2008 09:49 AM
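
The "email support, show the user something generic" pattern might be sketched like this. The addresses, the `GENERIC_MESSAGE` wording, and the `build_error_report` and `handle_request` helpers are all hypothetical; only the standard-library `traceback` and `email.message.EmailMessage` calls are real. The sketch builds the report but leaves delivery to the caller, for example via `smtplib.SMTP(...).send_message(report)`.

```python
import traceback
from email.message import EmailMessage

# Wording is an assumption; the point is that no technical detail reaches the user.
GENERIC_MESSAGE = "We're sorry, something went wrong. Our team has been notified."


def build_error_report(exc: BaseException, url: str) -> EmailMessage:
    """Package the technical details of a failure as an email for support."""
    msg = EmailMessage()
    msg["Subject"] = f"[site error] {type(exc).__name__} at {url}"
    msg["From"] = "webapp@example.com"    # placeholder address
    msg["To"] = "support@example.com"     # placeholder address
    details = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    msg.set_content(f"URL: {url}\n\n{details}")
    return msg


def handle_request(url: str, action):
    """Run action(); on failure return the generic text plus a report to send."""
    try:
        return action(), None
    except Exception as exc:
        return GENERIC_MESSAGE, build_error_report(exc, url)
```

The user-facing text and the diagnostic report are produced in the same place but never mixed, so the stack trace goes to support while the visitor sees only the generic message.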

As a side note, I recently started using Twitter. I definitely put myself in the "what's the point" category before I used it.

Now that I've used it, I think it's a great tool to log what you're doing when you're doing it. You can go back later on and analyze what you've done, where you've spent too much time, etc.

I wish I could get to it via email, as the web site is blocked by WebSense. :(

ProudGeekDad on May 20, 2008 09:54 AM

this can only last for so long, twitter is the worst social network regarding uptime, and being honest about it; the only thing it has is popular people "backing" it up like this blog... but that can only last for so long... we'll see, there are already a few other services that actually work that do the same thing that twitter tries to do, only... they actually do it

Eber Irigoyen on May 20, 2008 10:00 AM

What's with the robot giving you the finger?

Marcus Stade on May 20, 2008 10:26 AM


I think it a bit presumptuous to be asking for things like access to maintenance schedules and reports, etc. You are using a free, non-critical service, not say, running your business with a managed hosting contract that has SLAs and the like.

As to the service itself, I find it amusing that people feel the need to prattle about their every insipid thought and activity. More so, that others are eager to lap up such drivel. Granted, you can argue that it is useful for communicating with colleagues and friends, but with email and instant messaging, Skype, etc. is this really necessary? Or as mentioned, that people want a log of their minutiae throughout the day?

I see twitter as a narcissistic feedback loop amongst the self-important, and/or another obsession with "information" that reminds me of the dawn of the Web, where people's first exposure was one of awe and they would go from site to site for hours on end. In this case, however, there is far less substance to "tweets".

Eh. Perhaps I am too judgemental, but whatever... People are what they are.

Foo Kung on May 20, 2008 10:29 AM

First people touching your screen and now this? Come on Jeff....

Broham on May 20, 2008 12:44 PM

Jeff,

Thinking about crashing responsibly is thinking about the problem from the wrong end. When an application dies, it remains dead until fixed/rebooted.

The real question here is why applications die in the first place:

1. Overzealous memory consumption. Even in the age of "automatic" memory management, a badly written application will crash as soon as enough memory is leaked

2. Scalability as an afterthought. This one is closely related to problem #1

3. Ridiculous volume of I/O, stemming from bad practices of n-tier application design (chattiness, crappiness...)


Recoverable errors are a completely different matter of course. Their causes and resolutions can be described and communicated to both users and technical staff.

Not like IE, which restarts itself hoping that the user won't even notice the crash.

Error conditions are like unit tests. If you cannot imagine a recovery from an error (or at least a decent response), you are probably dealing with a potential "crash scenario."

BugFree on May 20, 2008 12:54 PM


I think because you're a developer, you're asking for and expecting too much information. Even if they know what the problem is, why do they need to tell you the exact reason? If a hard drive crashed, NIC smoked, network issues, do you expect them to mention these reasons? It doesn't resonate well with customers.

The most important piece of info I need is an ETA. This way I don't bother retrying before that time and I have some idea of how severe the problem is. This will also keep people from contacting the company with the nagging question, "When do you expect to be back up?". It's very irritating and a waste of time when a receptionist or a highly paid engineer replies with the same answer.

Also, in some cases, one or more engineers are too busy frantically figuring out what the problem is and fixing it to update a status or error page.

That error page you're showing doesn't imply maintenance. Maintenance is usually scheduled and known and not "something is wrong".

Abdu on May 20, 2008 12:54 PM

Could you please not mention twitter so often? I think it's the most annoying thing since crying children =)

greg on May 20, 2008 01:06 PM

You do realize downforeveryoneorjustme is ... well ... down, right?

Andrew Badera on May 20, 2008 01:09 PM

If Twitter is becoming so critical for users, I'm sure they could charge for it. In fact, given their current technical woes they could be in the dangerous position of pricing themselves out of business, instead of pricing themselves out of the market. I don't know what business model they would use, perhaps unlimited tweets from the web, but only 10 tweets per day from the API? Then again I could be wrong. One day Google seemed to only get money in by selling a few search appliances, the next it was the world's biggest advertising platform and a license to print money.

And Broham, touching screens (except the iPhone) is a legitimate subject to blog about. I'm not cold-hearted enough to want to make it a crime punishable by jail, or being poked randomly with sticks covered in lard, but screen-touchers need to know they are not normal.

John Ferguson on May 20, 2008 01:12 PM

It seems to me that time is ripe for a diaper change at Twitter.

What, no Frankenstein on the error page?

Users should think happy thoughts until twitter-team reboots or gets back on track with rails (sic!)

BugFree on May 20, 2008 01:13 PM

just had a thought. must read the twitter terms of service and privacy policy. wouldn't want them to be making money from selling my tweets...

John Ferguson on May 20, 2008 01:15 PM

What are your thoughts on felonious silence from Google during the 7+ weeks that Google groups was down?

dshorter on May 20, 2008 01:20 PM

One website I visited had a very simple wiki page for its error page. This wiki was hosted on a different server with a different domain name. When the primary site was down people could leave notes about what was working, what wasn't, if there were any known work-arounds and of course developers could add quick updates. I think the most important thing was that it gave the community a chance to hang out and chat rather than be completely severed by the programmers' mistakes.

mccoyn on May 20, 2008 01:33 PM

Perhaps the error page reflects the attitude of the company running the web service. You can literally imagine how that company works: In an event of a server disaster, the first thing is probably the server guys reporting to the head, while attempting to fix it "right away". And as it usually turns out, the fix takes longer than the technicians think.

Unless the company is one cohesive team, the technician is usually afraid to give time estimates, as it will either be a wrong prediction or a conservative prediction, which leads to loss of confidence in him either way.

On the other hand, the attitude of "not claiming responsibility yet slightly blaming the user" is very Apple-ish. The whole team in the company might really think that the users are at fault. And in this case, even if the web manager is eager to update the status of the crisis, he might not be permitted to do so.

Nevertheless, I agree that they should be more transparent about the errors, especially when users are starting to lose confidence. During good times perhaps they can just be smug and brush the error aside, but now it's a bit ironic that the Twitter guys don't use Twitter themselves to report on their status.

By the way, Twitter loses a lot of money with SMS. They have yet to find a way to earn money. In the times of a recession, services that are yet to be monetized will probably go first.

Pak-Kei on May 20, 2008 01:38 PM

> I think the most important thing was that it gave the community a chance to hang out and chat rather than be completely severed by the programmers mistakes.

That's awesome -- exactly the kind of thing I'm talking about.

I don't frequent Digg, but I happened to visit there recently and they were down for maintenance. Rather than present a generic "we're down for maintenance, kthxbye" page, they provided a giant list of "recommended links" from each Digg employee.

Jeff Atwood on May 20, 2008 02:46 PM

I can tell you that the only thing that drives me more nuts than when a server goes down or an important site goes down, is when that site goes down with no explanation of the fallout.

It happened to me yesterday with my server provider...I actually had to call them up before I found out they were scheduled for maintenance (at 10 am no less!). Drives me nuts!

Nate Nead on May 20, 2008 02:48 PM

Psychologically speaking, it can be better to make a user think it's not just their problem.

When people feel they're just one of many being affected, it lowers their individual expectations and they just wait it out. Most people will automatically think they're not so special, and if it's affecting everyone, then surely the company is on the case.

If I know it's just me that's having issues, I want resolution and I become much more vocal and chatty, because nobody's going to fix it if I just sit there.

However, if you're doing it all the time, your users will start to cotton on. BT used to do this, citing 'your entire exchange is affected and an engineer has been assigned to the issue as we speak'. It was all bollocks though. If you have problems often, and they use this excuse, it starts to backfire and you think their infrastructure is crap. In my case it turns out it was a faulty BT telephone in my flat, but because of all their BS excuses like 'your entire neighborhood is affected' and 'a problem at the exchange' I never thought to check. Aw shucks.

PsychologicallySpeaking on May 20, 2008 02:52 PM

A downtime page isn't an error page. It'd be nice if this was informative, but typically the site's down. How do you make it informative? Well you write code, but if the site is down, then what? Hit the database for the info, but that's down. In change-controlled environments, you can't exactly just go modifying files because change control managers start to turn purple. There's a reason 404 and 500 pages are static html with inline styles, and they're all surprisingly generic.

The alternate status website is a good idea.

A generic error page has the same issues as a downtime page. Sometimes you don't know why you got there, and the worst thing an error page can do is error out. Again, they're typically static html with a 'very sorry' message because they assume if a user's reading it, something very unexpected has happened.

No user needs to see voluminous crappy stack traces. Security-wise you want to hide the information, as a l33t haxx0r can use that information to subvert the site. You can redirect to the generic error page in a number of circumstances, even when there's no real error (eg detecting a user is manually altering query strings).

500ServerError on May 20, 2008 02:58 PM

Is this the Twitter Junkies Hell?

When Twitter is down we all gather here to Bash the Shit out of it! LOL

Igor The Troll on May 20, 2008 03:35 PM

I've always thought they could be a little more specific in their error messages. I appreciate Digg.com's error/down messages because they generally list a reason and I find their links to other things to do endearing.

All the talk about Twitter's downtime is too much. I'm sure they'll get scaling down fine and in the meantime maybe everyone can do something else. It's like creating a new subdivision that gets super popular super fast. The plumbing and electric weren't up to meet the demand and it's a pain for everyone as it's built out. People should stop complaining...especially considering it's FREE!

john on May 20, 2008 03:52 PM

It is a shame that many people seem to have missed Jeff's point. Jeff isn't saying that the user should be shown what went wrong (he assumes this is done behind the scences as it should be). He is arguing that the user should be given more information that is useful. Like how long to wait before coming back.

Many responses here come from an inside out development viewpoint, with people thinking about what the developer needs, rather than what the user needs. Yes it might be hard to provide the reassurance and information a user needs, but it is vital to do so.

Also consider that Jeff was using a recent Twitter crash to illustrate his point.

And yes, Twitter is free, and yes, you shouldn't expect too much, but Twitter could never charge for a service when they have an availability issue. Just because you are offering something for free doesn't mean that you don't strive to be better. If Firefox or Apache didn't "strive" to be better they wouldn't be as widely used today.

Alasdair

Alasdair on May 20, 2008 04:27 PM

On webpages it makes some sense not to print out the exact error because:

1. Since the application is running server-side, the site developers can get the exact error from their log files. No user error reporting is necessary.
2. Printing out error details is a security risk because hackers will be able to recognize that something that they did was able to crash the application. This could be their starting point for further investigation.

As far as I am concerned, I would say this error page is ok. As a user there is nothing I can do at that point except reload the page or come back later.

Pradeep on May 20, 2008 05:01 PM

Hey, look, Twitter changed their error page! Now it says

Twitter is Coming Back Online

For more information on what is happening and to follow the discussion visit Twitter on Get Satisfaction.

http://getsatisfaction.com/twitter/topics/may_20_twitter_downtime

The Twitter Team

Jeff Atwood on May 20, 2008 05:12 PM

500ServerError:
I can't see why any error page other than 503 Service Unavailable *has* to be static html.

500 could very well log which page threw the 500 error and what GET/POST things were sent to it.

Powerlord on May 20, 2008 05:45 PM
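
As a rough illustration of the point above (record which page threw the 500 and what was sent to it, then show the user something harmless), here is a minimal WSGI sketch using only the Python standard library. The class name `LogAndServeStatic`, the helper `describe_request`, and the page text are assumptions made for this example; nothing here reflects how Twitter actually handles errors. POST bodies are deliberately not logged in this sketch, since they may contain passwords.

```python
import logging
import sys

log = logging.getLogger("errors")

# Hypothetical static page; it depends on nothing that could itself fail.
STATIC_500 = b"<html><body><h1>Sorry</h1><p>Something went wrong.</p></body></html>"


def describe_request(environ):
    """Pull the request details worth logging out of a WSGI environ dict."""
    return {
        "method": environ.get("REQUEST_METHOD", ""),
        "path": environ.get("PATH_INFO", ""),
        "query": environ.get("QUERY_STRING", ""),
    }


class LogAndServeStatic:
    """WSGI middleware: log unhandled exceptions, then serve a static 500."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            # log.exception records the stack trace for developers;
            # the user never sees it.
            log.exception("unhandled error: %s", describe_request(environ))
            headers = [
                ("Content-Type", "text/html"),
                ("Content-Length", str(len(STATIC_500))),
            ]
            start_response("500 Internal Server Error", headers, sys.exc_info())
            return [STATIC_500]
```

Note the limitation: this only catches exceptions raised while the wrapped application is being called, not ones raised later while its response body is being iterated.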

If you don't like that error page then I bet you wouldn't have liked the old Twitter error pages:
http://www.uie.com/brainsparks/2007/06/04/twitters-fairy-doors/

Angus on May 20, 2008 06:20 PM

@Powerlord I guess I neglected to mention an assumption that all relevant stack traces, querystrings, viewstate, whatever are logged using a global exception handler at the point of the error (before the user is redirected to the static error page). The static error page is static because it's not doing anything, and there's nothing left for it to do.

My point was building on that assumption: the static error pages shouldn't be doing anything else since they won't necessarily know why they were served (aside from the obvious) and from their point of view, all bets are off.

...

The only time timeframes are needed is scheduled maintenance. It's a planned outage, so it should be straightforward enough. A separate server is a nice idea since you conceivably don't connect to the same data store. But if you're on the same server, you're manually coding up outage files. A good infrastructure would help here (IP load balancer, etc).

Coming up with a magical formula for unplanned outages is silly and a waste of time because you'd be better off fixing the darn problem in the first place.

And finally, managers and coders don't invest a lot of time in error pages (other than making them nice to look at) because I really don't expect my users to ever see them. I've seen people try to get cute, and at the end of the day, the change is rolled back because the error page failed in an unforeseen way, and caused an error, and redirected back to itself. Drat.

500ServerError on May 20, 2008 08:06 PM
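
For the scheduled-maintenance case described above, where the timeframe is known in advance, HTTP already has a place to put it: a 503 Service Unavailable response can carry a `Retry-After` header. The following is a small sketch under stated assumptions. The function name `maintenance_response`, the message wording, and the 60-second floor are all made up for illustration.

```python
from datetime import datetime, timezone


def maintenance_response(now, planned_end):
    """Build (status, headers, body) for a planned outage.

    `now` and `planned_end` are timezone-aware datetimes. Retry-After is
    given in seconds; if the planned end has already passed, fall back to
    a short 60-second retry instead of a zero or negative value.
    """
    remaining = int((planned_end - now).total_seconds())
    retry_after = max(remaining, 60)
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Retry-After", str(retry_after)),
    ]
    end_utc = planned_end.astimezone(timezone.utc)
    body = (
        "Down for scheduled maintenance. "
        f"Expected back at {end_utc.strftime('%Y-%m-%d %H:%M')} UTC."
    )
    return "503 Service Unavailable", headers, body


# Hypothetical maintenance window, purely for illustration.
status, headers, body = maintenance_response(
    now=datetime(2008, 5, 20, 17, 0, tzinfo=timezone.utc),
    planned_end=datetime(2008, 5, 20, 17, 30, tzinfo=timezone.utc),
)
```

This covers planned outages only. As the comment says, no formula is being claimed here for unplanned ones.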

uh-oh... someone divided by zero.

Jonny on May 21, 2008 01:18 AM

If they're logging and reporting errors behind the scenes then the error page is not static. Therefore they should be able to show a customised error page.

Josh on May 21, 2008 02:37 AM

The Web 2.0 world has to take some lessons from the core hardware world. A software service today is no different from a hardware chipset. It is used by as many millions as a chipset would be. Just because we can cook up a great idea with half a dozen simple scripts, it doesn't mean we can escape the responsibility of writing scalable, crash-free code.

prabindh on May 21, 2008 03:36 AM

@Josh - A somewhat trite response that indicates you probably don't know what you're talking about (or at least how these things work) for web applications. I guess you're missing the point that static error pages happen /after/ everything else.

- you've already checked input
- you've already accounted for /known/ issues (DB concurrency exceptions, no connection to DB, web services down, data parsed appropriately, checking references in methods, etc).
- you write good error handling code that only catches specific exceptions

But the unimaginable happens. Timeouts, db connections that die, drives that crash, overloaded web servers, no memory, full disk space, file permissions don't work, etc.

But you've got a global error handler. It's already intercepted the message and done its best to notify someone in operations. SNMP traps, emails, log files, etc. Yay. Mr 24x7 in operations is on the job and degaussing servers like crazy. Job well done. Have a kit-kat man!

:-P

So what exactly do you want the error page to say? Some technical jargon? A stack trace? That node SN14 decided it's no longer in sync with cluster NN233? That it's been fed some bad data? That drive C is full?

Lordy, I guess I can sleep better now. I never did like NN233 (always smelled bad in school).

Tell us, and please be honest, did your mother appreciate the arcane message?
Does she feel better?
Does it help her?

Hmm, that begs the question: How do you customise an error page when you don't expect an error to happen? I guess you really need a time machine to do that properly. Then you can make a fancy error page that really knows how to display just the right message when, on Jan 14 2020 at 4:45pm PST node SN14 desyncs with cluster NN233. Feel better? I do.

You know what's too bad? We ran out of time with all this gee whiz time travel and fancy error page stuff and never did fix the code in node SN14 that stops the error from happening in the first place.

Then again, us mere mortals just chuck up a static page that says 'Sorry' because we're busy fixing bugs. Some of us are a bit more graphically inclined (or have some sense of humor) and like to put up cute pages so the customer experience isn't diminished too much.

pepper on May 21, 2008 05:04 AM

Every time I get this error page I misread the beginning as "Thanks for nothing."

Also, what slays me about almost all of the downtime for the last year is that Twitter almost always attributes the downtime to being the result of installing enhancements to increase future stability. If this is the case, they need to do much more due diligence in testing and planning for the installation of these enhancements. It also means that the downtime is not, per se, the result of scaling problems or immediate failure of the existing software infrastructure.

Michael on May 21, 2008 12:24 PM

Uh, giving too much info on crashes can give hackers too much information!

They should be, and probably are, logging these errors in detail, privately, exactly as I would.

Alan Hogan on May 21, 2008 03:43 PM

Ruby on Rails doesn't scale well... Develop it with another language. Dare I say it on this blog... Develop it with PHP, it scales.

Tom on May 22, 2008 06:57 AM

That error page is about as useful as Twitter is.

But seriously, if you spend a great deal of time obsessing over what your friends are doing and keeping them updated on the status of your day-to-day life, you don't need a better error page. Your message of "Going to the ATM" or "Updating my Facebook picture" just won't get out to the rest of the world.

And is that such a bad thing?

Kevin Raffay on May 23, 2008 12:18 PM

No, you all got it totally wrong. If it can send you a cute picture, it hasn't REALLY crashed, get it? A real no-nonsense hardcore crash means BLAM! your browser vanishes!

Clever guy on June 6, 2008 12:55 PM











Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.