I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood

January 29, 2006

Not All Bugs Are Worth Fixing

One thing that continually frustrates me when working with dedicated test teams is that, well, they find too many bugs.

Don't get me wrong. I want to be the first person to know about any bug that results in inconvenience for a user. But how do you distinguish between bugs that users are likely to encounter, and bugs that users will probably never see?

The first thing you do is take that list of bugs from the testers and have yourself a triage meeting:

The term "triage" was borrowed from medical triage where a doctor or nurse has to prioritize care for a large group of injured people. The main job of a software bug triage team is to decide which bugs need to be fixed (or conversely, which bugs we're willing to ship with).

Eric Sink lists four questions that need to be answered during triage to decide whether a bug should be fixed or not:

  1. Severity: When this bug happens, how bad is the impact?
  2. Frequency: How often does this bug happen?
  3. Cost: How much effort would be required to fix this bug?
  4. Risk: What is the risk of fixing this bug?
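Eric's four questions don't come with a formula, and triage is ultimately a judgment call. Still, it can help to see how the answers might combine. The sketch below is a hypothetical scoring rule, not Eric's method. It assumes each question is answered on a made-up 1-to-5 scale, treats severity times frequency as user impact and cost times risk as the price of the fix, and recommends fixing when impact meets or exceeds price. The example bugs are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    severity: int   # hypothetical scale: 1 (cosmetic) .. 5 (crash or data loss)
    frequency: int  # hypothetical scale: 1 (almost never) .. 5 (every user, every day)
    cost: int       # hypothetical scale: 1 (one-line fix) .. 5 (weeks of work)
    risk: int       # hypothetical scale: 1 (isolated change) .. 5 (touches core code paths)


def should_fix(bug: Bug, threshold: float = 1.0) -> bool:
    """Hypothetical rule: fix when user impact outweighs the price of fixing.

    impact = severity * frequency  (how bad, times how often)
    price  = cost * risk           (how much work, times how likely to break something)
    """
    impact = bug.severity * bug.frequency
    price = bug.cost * bug.risk
    return impact >= threshold * price


# Invented example bugs, one per line.
bugs = [
    Bug("Crash on save with non-ASCII filename", severity=5, frequency=3, cost=2, risk=2),
    Bug("Tooltip text clipped at 800x600", severity=1, frequency=2, cost=1, risk=1),
    Bug("Rounding error in legacy report export", severity=3, frequency=1, cost=4, risk=4),
]

for bug in bugs:
    verdict = "fix" if should_fix(bug) else "ship with it"
    print(f"{bug.title}: {verdict}")
```

Multiplying rather than adding means a rare, cosmetic bug sinks quickly, while an expensive, risky fix has to be justified by a correspondingly severe and frequent problem. The `threshold` parameter (also hypothetical) could be raised as a release approaches, so that only higher-impact bugs still qualify for a fix.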

Triage isn't exactly my idea of a good time. But you have to do it, because you'll always have far more bugs than you have development time. Nobody has the luxury of fixing all the bugs in their software.

Testers produce two kinds of bugs:

  1. A small subset of very serious bugs that everyone can immediately agree on. These are great. They're the kind of catches that make me thank my lucky stars that we have dedicated testers. You go, girl-slash-boy!
  2. Everything else. A vast, gray wasteland of pseudo-bugs that nobody can really agree on. Is it an inconvenience for the user? Would users really do things this way? Would a user ever run into this? Do we even care?

It's a clear win for the bugs everyone agrees on. That's usually about ten to twenty percent of the bug list in my experience. But for everything else, there's a serious problem: testers aren't real users. I'd give a bug from a customer ten times the weight of a bug reported by a tester.
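One rough way to apply that "ten times the weight" rule of thumb is to count reports per bug, weighting each report by who filed it. This is a sketch under assumptions: the bug IDs, the two source categories, and the exact 10:1 ratio are illustrative, not a prescribed scheme.

```python
from collections import defaultdict

# Hypothetical weights: a customer report counts ten times as much as a tester report,
# following the rough ratio suggested above.
SOURCE_WEIGHT = {"customer": 10, "tester": 1}


def weighted_report_counts(reports):
    """reports: iterable of (bug_id, source) pairs; returns {bug_id: weighted count}."""
    totals = defaultdict(int)
    for bug_id, source in reports:
        totals[bug_id] += SOURCE_WEIGHT[source]
    return dict(totals)


# Invented sample data.
reports = [
    ("BUG-1", "tester"), ("BUG-1", "tester"), ("BUG-1", "tester"),
    ("BUG-2", "customer"),
    ("BUG-3", "tester"), ("BUG-3", "customer"),
]

scores = weighted_report_counts(reports)
for bug_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(bug_id, score)
```

Here a single customer report (BUG-2) outranks three tester reports of a different bug (BUG-1), which is the point of weighting by source.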

The source of the bug is just one factor to consider. Bug triage isn't a science. It's highly subjective and totally dependent on the specifics of your application. In Bugs Are a Business Decision, Jan Goyvaerts describes how different triage can be for applications at opposite ends of the spectrum, from safety-critical to merely nice to have:

Last July I flew to Denver to attend the Shareware Industry Conference. I flew the leg from Taipei to Los Angeles on a Boeing 747 operated by China Airlines. This aircraft has two major software systems on board: the avionics software (flight computer), and the in-flight entertainment system. These two systems are completely independent of each other, developed by different companies, to different standards.

The avionics software is the software that flies the plane. No, the pilots don’t fly the plane, the flight computer does. How many bugs would you tolerate in the avionics software? How many do you think Boeing left unfixed? How many people have ever been killed by software bugs in modern airliners? Zero. A flawed flight computer would immediately ground all 747s worldwide. Boeing would not recover.

The in-flight entertainment system is a completely different story. It’s not essential to the plane. It only serves to make the passengers forget how uncomfortable those economy seats really are. If the entertainment system barfs all over itself, the cost is minimal. Passengers are already out of their money, and most will choose their next flight based on price and schedule rather than which movies are on those tiny screens, if any. I was actually quite pleased with China Airlines’ system, which offered economy passengers individual screens and a choice of a dozen or so on-demand movies (i.e. each passenger can start viewing any movie at any time, and even pause and rewind). That is, until the system started acting up. It locked up a few times causing everybody’s movie to pause for several minutes. Once, the crew had to reboot the whole thing. That silly Linux penguin mocked me for several minutes while the boot messages crept by. X11 showed off its X-shaped cursor right in the middle of the screen even longer. Judging from the crew’s attitude about it, the reboot seemed like something that’s part of their training.

Bugs also cost money to fix. In My Life as a Code Economist, Eric Sink outlines all the decisions that go into whether or not a bug gets fixed at his company:

Don't we all start out with the belief that software only gets better as we work on it? The fact that we need regression testing is somehow like evidence that there is something wrong with the world. After all, it's not like anybody on our team is intentionally creating new bugs. We're just trying to make sure our product gets better every day, and yet, somewhere between 3.1.2 and 3.1.3, we made it worse.

But that's just the way it is. Every code change is a risk. A development cycle that doesn't recognize this will churn indefinitely and never create a shippable product. At some point, if the product is ever going to converge toward a release, you have to start deciding which bugs aren't going to get fixed.

To put it another way, think about what you want to say to yourself when you look in the mirror just after your product is released. The people in group 2 want to look in the mirror and say this:

"Our bug database has ZERO open items. We didn't defer a single bug. We fixed them all. After every bug fix, we regression tested the entire product, with 100% code coverage. Our product is perfect, absolutely flawless and above any criticism whatsoever."

The group 1 person wants to look in the mirror and say this:

"Our bug database has lots of open items. We have carefully reviewed every one of them and consider each one to be acceptable. In other words, most of them should probably not even be called bugs. We are not ashamed of this list of open items. On the contrary, we draw confidence from this list because we are shipping a product with a quality level that is well known. There will be no surprises and no mulligans. We admit that our product would be even better if all of these items were "fixed", but fixing them would risk introducing new bugs. We would essentially be exchanging these bugs which we find acceptable for the possibility of newly introduced bugs which might be showstoppers."

I'm not talking about shipping crappy products. I'm not suggesting that anybody ship products of low quality. I'm suggesting that decisions about software quality can be tough and subtle, and we need to be really smart about how to make those decisions. Sometimes a "bug" should not be fixed.

To me, triage is about one thing: making life better for your users. And the best way to do that is to base your triage decisions on data from actual usage -- via exception reporting, user feedback, and beta testing. Otherwise, triage is just a bunch of developers and testers in a room, trying to guess what users might do.
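As a sketch of what "data from actual usage" might look like, here is one way to rank automatically submitted exception reports by how many distinct users hit each problem. The `CrashReport` record, the exception and frame names, and the signature used for grouping (exception type plus innermost stack frame) are hypothetical simplifications; a production system would likely need smarter grouping.

```python
from collections import defaultdict
from typing import NamedTuple


class CrashReport(NamedTuple):
    user_id: str
    exception_type: str
    top_frame: str  # hypothetical "Class.Method:line" of the innermost stack frame


def rank_by_users_affected(reports):
    """Group reports by a crude signature and count distinct users per signature."""
    users = defaultdict(set)
    for r in reports:
        signature = (r.exception_type, r.top_frame)
        users[signature].add(r.user_id)
    # Most widespread problems first: these are the bugs real users actually hit.
    return sorted(((sig, len(ids)) for sig, ids in users.items()),
                  key=lambda item: item[1], reverse=True)


# Invented sample reports; u1 crashes twice in the same place.
reports = [
    CrashReport("u1", "NullReferenceException", "Editor.Save:120"),
    CrashReport("u1", "NullReferenceException", "Editor.Save:120"),
    CrashReport("u2", "NullReferenceException", "Editor.Save:120"),
    CrashReport("u3", "IOException", "Export.WriteCsv:48"),
]

for (exc, frame), count in rank_by_users_affected(reports):
    print(f"{count} user(s): {exc} at {frame}")
```

Counting distinct users rather than raw reports keeps one user who crashes repeatedly from outweighing a problem that many users each hit once.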

Posted by Jeff Atwood

Comments

I didn't get into it in the post, but the "how do you define *bug*?" question always comes up at some point.

I found this "When is a bug a bug?" discussion at the JOS forums very interesting:

http://discuss.fogcreek.com/joelonsoftware/default.asp?cmd=show&ixPost=72875

Jeff Atwood on January 30, 2006 03:39 AM

> How many people have ever been killed by software bugs in modern airliners? Zero.

I'm not so sure. Airbuses in particular seem to have quite a few glitches.

For example: http://seattletimes.nwsource.com/html/opinion/2002078977_gaillard02.html

Phil on January 30, 2006 08:00 AM

Another nice entry, Jeff. Man, when do you find the time, between gaming and coding, to do such a great job with regular updates?

Triage is, in fact, what my last boss used to call it when we got new user requests and bug reports. It's a gruesome necessity in a medical ward overwhelmed by patients and seems to me to apply naturally to "common sense" project management in any other field.

Bernard Dy on January 30, 2006 03:32 PM

Those not-to-be-fixed-now bugs come in handy:
- for the next release; you can decide to fix some in a less-critical phase
- for a developer who is new to the code; it gives him/her a good opportunity to get to know the product source

Jan

Jan Doggen on January 31, 2006 04:54 AM

Jeff,
The problem you have with testers is the same problem testers have with Product Management and Customer/Support: bad requirements and poor expectations. Your triages should not be as painful as you make them sound. True, QA people are not actual users; we're smarter. If you and your QA people receive proper requirements docs, then the test cases should be pretty straightforward, and so should the defects. However, when we receive the docs there are no prioritization lists, so we test the main components and then work our way down to the minutiae.

You have a right to be annoyed with the triage process and decisions made by management, but I don't think it's fair that you give QA a hard time for doing its job. Rarely do people come up to the QA team and say, "Good catch, nice work finding that bug." It's more like, "Why didn't you find this bug or that bug, and what were you testing?" 90% of the time QA doesn't necessarily get to make those judgment calls about what MUST get fixed, because the business doesn't care if we're not happy putting our names on a defective product.

All of my QA teams perform internal team reviews before we escalate to dev/PM triage, and that limits the time wasted in those meetings. We don't even want to be there; we have enough meetings about Priority/Severity/etc. Dev and PM need to hash out what needs to be fixed and by when. We just hope we're given enough time to test fixes again before the product goes out.

Just one tester's story.

Damien on January 31, 2006 11:43 AM

Damien,

I don't mean to sound anti-tester; rather, I am pro-user. Testing isn't a single activity-- it's a gauntlet of events your app has to make it through. A traditional test/qa is definitely an important section of that gauntlet.

But not the *only* section, and all decisions made during triage should be informed by real user data whenever possible.

Jeff Atwood on January 31, 2006 11:52 AM

Great article, Jeff. However, I think more needs to be said about the "attitude" both lone and company developers take when it comes to handling bugs in their software. Lately it seems as if developers have more and more tolerance for bugs, relying on the old adage that "It's impossible to create bugless software!" And while this may be true to some extent, by relying on it as an excuse, more and more bugs are left unchallenged, or at least deferred until some future date or version! I see this a lot in third-party component developers for Delphi and even more in popular shareware applications. Programmers need to understand that their prime directives must include making every attempt to prevent and solve bugs through systematic and diligent testing methods, regression testing after each fix, and ensuring that releases can be patched or updated easily as the need arises, so that the consumer, whether another developer or an end user, doesn't have to wait forever for something to be done! The techniques for finding and fixing bugs are obvious to all developers by now, so attitude is what needs changing!

Robert Meek on February 3, 2006 11:22 AM

Severity and frequency should be attributes of every bug the testers write.
WRT bugs submitted by users, many of them are actually requests for unscoped new functionality. Don't forget to quantify the cost and risk-to-project-plan of 'fixing' unscoped new functionality.
Cheers
Mike Tierney

Mike Tierney on February 20, 2006 12:17 PM

If there is a bug, then we should ask why it is there. If there are hundreds of minor not-fix-now-maybe-later bugs, then the real bug is not in the software. It is in the engineering process. And that bug should be fixed.

If there is a danger of creating more bugs by fixing one bug, then we should ask why that danger exists. If programmers cannot write any code because they fear that more bugs will start popping up from some mysterious shadows, then there is surely something wrong with the engineering process. And that is really critical and should be fixed.

Don on September 7, 2006 10:48 AM

My take on enhancements that testers call "bugs"

It's like a person that wants a nose job walking into triage....

chamberland on December 4, 2007 12:52 PM

That attitude of Team 2 (accepting bugs) really annoys me, not only as a programmer, but especially from a user's perspective. I prefer having fewer features and fewer bugs to something that comes up with errors when I use it. And we are talking only about reported bugs, not all the bugs. Would you tolerate any (even non-critical) bugs in your washing machine or your refrigerator?

I think that this "users can live with this bug" attitude is the reason that today's software is of such poor quality. It is this attitude that causes the software industry to be called immature. Users should demand more, but also pay more for better quality.

Ivan Dolvich on December 5, 2007 12:26 AM

Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.