You know the feeling. It's happened to all of us at some point: you've pored over the code a dozen times and still can't find a problem with it. But there's some bug or error you can't seem to get rid of. There just has to be something wrong with the machine you're coding on, with the operating system you're running under, with the tools and libraries you're using. There just has to be!
No matter how desperate you get, don't choose that path. Down that path lies voodoo computing and programming by coincidence. In short, madness.
It's frustrating to repeatedly bang your head against difficult, obscure bugs, but don't let desperation lead you astray. An essential part of being a humble programmer is realizing that whenever there's a problem with the code you've written, it's always your fault. This is aptly summarized in The Pragmatic Programmer as "Select Isn't Broken":
In most projects, the code you are debugging may be a mixture of application code written by you and others on your project team, third-party products (database, connectivity, graphical libraries, specialized communications or algorithms, and so on) and the platform environment (operating system, system libraries, and compilers). It is possible that a bug exists in the OS, the compiler, or a third-party product-- but this should not be your first thought. It is much more likely that the bug exists in the application code under development. It is generally more profitable to assume that the application code is incorrectly calling into a library than to assume that the library itself is broken. Even if the problem does lie with a third party, you'll still have to eliminate your code before submitting the bug report.
We worked on a project where a senior engineer was convinced that the select system call was broken on Solaris. No amount of persuasion or logic could change his mind (the fact that every other networking application on the box worked fine was irrelevant). He spent weeks writing workarounds, which, for some odd reason, didn't seem to fix the problem. When finally forced to sit down and read the documentation on select, he discovered the problem and corrected it in a matter of minutes. We now use the phrase "select is broken" as a gentle reminder whenever one of us starts blaming the system for a fault that is likely to be our own.
The flip side of code ownership is code responsibility. No matter what the problem is with your software-- maybe it's not even your code in the first place-- always assume the problem is in your code and act accordingly. If you're going to subject the world to your software, take full responsibility for its failures. Even if, technically speaking, you don't have to. That's how you earn respect and credibility. You certainly don't earn respect or credibility by endlessly pawning off errors and problems on other people, other companies, other sources.
Statistically, you understand, it is incredibly rare for any bugs or errors in your software not to be your fault. In Code Complete, Steve McConnell cited two studies that proved it:
A pair of studies performed [in 1973 and 1984] found that, of total errors reported, roughly 95% are caused by programmers, 2% by systems software (the compiler and the operating system), 2% by some other software, and 1% by the hardware. Systems software and development tools are used by many more people today than they were in the 1970s and 1980s, and so my best guess is that, today, an even higher percentage of errors are the programmers' fault.
Whatever the problem with your software is, take ownership. Start with your code, and investigate further and further outward until you have definitive evidence of where the problem lies. If the problem lies in some other bit of code that you don't control, you'll not only have learned essential troubleshooting and diagnostic skills, you'll also have an audit trail of evidence to back up your claims. This is certainly a lot more work than shrugging your shoulders and pointing your finger at the OS, the tools, or the framework-- but it also engenders a sense of trust and respect you're unlikely to achieve through fingerpointing and evasion.
If you truly aspire to being a humble programmer, you should have no qualms about saying "hey, this is my fault-- and I'll get to the bottom of it."
Posted by Jeff Atwood
So what do you do when you are alone?
You've tossed and turned at night, days have gone by... And you are burnt out.
brian on March 20, 2008 10:56 PM
This is a great reminder, but in the *now* programming culture most of us are working with beta (sometimes alpha) and unreleased bits, so the percentage you quoted from Code Complete is probably lower today. You should take into account which libraries and tools developers are working with now.
Adel on March 20, 2008 10:59 PM
Interestingly enough, the exact opposite happened to me. I was building an application using Java 1.6.0 Update 3, and I kept having odd behavior with the JComboBoxes for the GUI (you could only select the first item on the list after selecting a later item, which causes problems when you only have one item in the list!). Needless to say, it was very odd behavior.
Being a new developer, I made the assumption it had to be in my code (it was my first application using Java's Swing package). After trying every trick in the book, I ran it in Java 5, and it worked perfectly! There was a bug in Java 6 with the GUI.
Anyway, my number one rule in programming: Never Assume Anything!
Robert on March 20, 2008 11:05 PM
Which leads to my favorite thing to say to the QA people: "well, it works on my machine..."
Nice post Jeff - brought a smile to my face.
btw, keep us posted on what you're up to professionally these days too!
Johnny Fry on March 20, 2008 11:19 PM
I agree with the general advice in this blog post, and it's what I would advise most other programmers too.
But there have been more than a few cases within the past few years where I blamed myself and my code (I would have been tearing out my hear if I actually grew it long enough to tear at), only to find out after isolating the case that the bug was in some framework or the OS. I've even had an instance in my previous job where some networking code was broken because of a buggy firmware implementation at the hardware layer. And every time it's been a bug somewhere in the deeper layers, it took hours of painful work to find and isolate it.
So although I do generally blame my own code before blaming anyone else's when it comes to tracing bugs, I have learned to keep my eyes wide open for the possibility that it may not be my fault.
I have to agree with Robert above, Never Assume Anything!
Joe Goh on March 20, 2008 11:25 PM
True.
Niyaz PK on March 20, 2008 11:39 PM
s/hear/hair in my post above BTW. Insufficient caffeine intake.
Joe Goh on March 20, 2008 11:55 PM
While this advice is certainly true, it's a little unfair. Things like OSes, compilers, libraries, et cetera, define the rules. When creating these backbone-type systems, their programmers can simply change the rules to fit what they've coded. Think of the applications that call what look like bugs features. "I know that looks odd, but it's *supposed* to look that way!"
So I would modify Code Complete a little bit... perhaps only 90% are your fault and the missing 5% are bugs that have had the documentation changed to fit the bug.
=D
Frank on March 21, 2008 12:03 AM
It's only programming by coincidence if you tweak things without knowing why. If you're working with a proprietary OS like Solaris 15 years ago, maybe that's how you fix things.
These days, all our code runs on Linux (or maybe OpenSolaris). If select() isn't working right, we open up select.c and check. If your whole stack is open-source, there's no need to program by coincidence.
Of the work I've done the past 5 years, when it looks like a language or library bug, it almost always is. I think the First Rule should simply be "Don't Program By Coincidence". Passing blame is just a corollary, at best.
tim on March 21, 2008 12:16 AM
While I'm a big fan of Code Complete, and I would generally advise junior devs to always assume the bug is theirs, I think that things have changed a lot since the original advice was given: the frameworks we use (ASP.NET in my case) are huge, and therefore much less well tested and exercised than Solaris 'select'. I've certainly seen several instances in the past few years of stuff broken in ASP.NET (e.g. CheckBox loses its ViewState if sufficiently deeply nested, just to quote the most recent example).
Syd on March 21, 2008 12:35 AM
I don't think Jeff was claiming that everything but your code is 100% error free. Of course there are errors in other things; they are, after all, someone else's code.
The point is that you don't go bothering the people who made the other things telling them there are bugs until you have done your best to rule out your own code. It's so tempting to think everything you have done is right and that the problem MUST lie with another component, to then stir up a big fuss, only to later realise that the bug was indeed in your own code and have to go around apologising. I've done it. Quite recently, in fact. :) We've all done it, I imagine. It's an ever-tempting course of action, so it's really important to keep the underlying message here in mind: "It's always your fault." That is hyperbole, of course; it isn't really always your fault, but you have to think it is your fault until you have proven otherwise.
If you think there's a bug in something else then the best thing you can do is strip away as much of your own code as possible and produce an extremely simple bit of code which reproduces the bug. Then it's easy to prove where the fault is.
All too often the problem is that the bug is our own fault and is due to our misunderstanding of how a component works. In those cases it's easy to see why we assume the bug is somewhere else: We look through our code over and over and everything looks perfect, but only because we have an incorrect view of how it should be.
I often find myself staring at the code and finding no problems. I'll pass it to another programmer, and he'll find no problems either. Three days later I'll still be banging my head.
Arron on March 21, 2008 01:07 AM
I've never read anything more true than this! I've got a friend (co-worker) who always looks for the cause of a failure in third-party code, even if it's PHP itself.
That's why our boss says about us:
He is a fighter-coder because he always FIGHTS with the code.
I'm the programmer because I make things work, no matter if the failure was my fault or someone else's.
Anyway. Great post. Thanks
adamsky on March 21, 2008 01:07 AM
Great.. it's my fault again. :)
Walking away from the specific bug and getting a good night's rest is crucial to problem solving.. as I'm sure you know. Staring at the problem causes me to go bug-eyed... refreshing the brain does wonders.
Cheers.
Patrick on March 21, 2008 01:17 AM
Jeff,
Your blog is always well written and most of the lessons you teach I also experienced before.
A good developer must always be humble! A good developer accepts that he makes mistakes - and learns from them.
David
David on March 21, 2008 01:50 AM
My colleague spent something like two days on a problem that he was convinced was his fault, because he thought .NET was so well tested and quality assured that it couldn't possibly contain a defect. I had more experience with the GUI parts of .NET and kindly insisted that there might be an error. And after posting on the MSDN forum we found out that, yes, there was a bug.
I agree totally, take responsibility and assume it's your fault. But don't be stupid. GDI/GDI+ for example contains a whole lot of funny things.
Hi Jeff,
I totally agree! It is such a common situation that you describe in the first lines of your post: totally frustrated with a stupid bug that must be in a few lines of code you've been looking at for a long time (which might not even be your code in the first place). Then, even if you do admit that the fault is in the code itself (and not the programming language, library, OS...), there is still a danger of choosing the wrong solution. Say you made some changes that did the trick, but you don't know why or how those changes solved the issue. But you don't care; you solved it, so you commit the fix and go on with more interesting work. Obviously this will make things worse instead of better; for the solution, too, code ownership is code responsibility.
I have a full post on this topic on my own blog: http://www.code-muse.com/blog/?p=15
Btw, The Pragmatic Programmer is one of my favorites. A must-read for every programmer who wants to improve himself.
Gertjan Zwartjes on March 21, 2008 02:20 AM
Very good post, thank you.
But I don't think being a humble developer should be a goal for any of us. Being humble is indeed good, but it is not a professional goal of mine.
My rule of thumb when dealing with bugs is: "if you blame a component, or anything that you have not written yourself, then you have to prove it". Always try to reduce the problem or the bug to a minimal example where you can isolate the problem from the rest of your application. If you can't isolate the bug, then the assumption stands that the bug is in your code.
Laurent Mirguet on March 21, 2008 02:35 AM
You obviously have never worked in an environment like IBM WebSphere Portal or ATG Portal. Our company sells consultants for very high bill rates to work on products like these because they are so convoluted and full of bugs. It takes a special kind of programmer to be patient enough to identify the bugs, report them to IBM (or whatever the company is) along with stack traces, log files, etc. On top of that, you have to find workarounds for the bugs because you can't count on an immediate fix. I had the pleasure of taking a break from .NET to do an IBM Portal project last year and I found two nasty bugs in Portal myself. After venting my frustration for a half hour to my coworkers, saying things like "I've never had problems like this with Microsoft technologies", the IBM guys that do this stuff every day told me that this is the norm with IBM products, and it's why they get paid the big bucks. By the way, it took over a month for them to find and fix the bug!
Jeff,
It's more frustrating when you dedicate a complete team's resources to looking into a whole set of nasty bugs in your application, only to discover that the problem is deep in the programming tools. We ran into that with BDS 2006 a year ago, and again with RAD 2007 when it was launched.
We created test cases to reproduce the problems, but neither the original company nor the new company gave us a solution, so we had to move back to BCB 6 to have a compiler that runs properly.
Finally we decided to move away from these tools, as we do not get proper support, nor any way to get CodeGear to recognize that there is a nasty bug in their tools that makes them unusable in a mixed environment. I am still waiting for a solution promised by the CEO, but it has never arrived.
Currently you cannot generate proper headers from Delphi components (VCL) to be used on the C++ side; the result is memory access violations across all the components.
http://twitter.com/wilshipley/statuses/774574882
"99.9% of the time, it's your bug, not the compiler. The other 0.1%... well, it's usually your bug then, too."
Jeff Atwood on March 21, 2008 03:42 AM
Jeff,
I agree: 100% it's your bug. Either it's in your code or it's in the tools you use to develop, but you have to deal with it.
Cryptonome on March 21, 2008 04:04 AM
Very true, most of the time... SolidWorks is CAD software used by some of my company's clients. Automation is accomplished by writing code against the SolidWorks API... Based on that experience, I can say that there are times when it's 50/50 the system's fault.
cloggins on March 21, 2008 04:27 AM
This is also the first rule of marriage and many other endeavors.
The discipline of humility is vastly underrated.
Matt on March 21, 2008 04:34 AM
"A good carpenter never blames his tools."
Good advice in carpentry, programming, and life.
John Pirie on March 21, 2008 04:35 AM
So we reach step 1, admitting we have a problem and it is, statistically speaking, a bug we've created. Great. Now what?
We go looking for it... but unlike the "select is broken" guy, we run some tests, recreate the problem and isolate and fix the bad code. And for me, if I determine I'm just not seeing the error, even after a good night's rest, I seek a fresh set of eyes and a set of ears to talk through the issue. I don't need to be in a "pair programming" house to take advantage of the technique. Sometimes, just explaining the problem and "talking through" your code with your coworker is enough. Often they don't find the bug, but you do by having to explain it.
Back in the dark ages during my formative CS classes, we had to explain (in comments) the ins/outs, assumptions and pre-/post-conditions of our methods. While at the time I don't think I took that nearly seriously enough, I found that knowing all that stuff about a method definitely minimizes the bugs. If you can't answer what a method does, what it can modify and how, how can you trust it? I'm convinced that almost every real runtime bug I have found is the result of me spending more time typing than thinking.
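To make that concrete, here is a minimal Java sketch of a method whose comment spells out its inputs, precondition and postcondition, and which checks the precondition instead of trusting the caller. The names ContractDemo and indexOfMax are hypothetical, invented for this illustration; this is not code from the comment above.

```java
/** Hypothetical example class; the names are invented for illustration. */
final class ContractDemo {
    /**
     * Returns the index of the largest element of values.
     *
     * Inputs:        values is only read, never modified.
     * Precondition:  values is non-null and has at least one element.
     * Postcondition: 0 <= result < values.length, and no element of
     *                values is greater than values[result]. On ties the
     *                earliest index wins.
     */
    static int indexOfMax(int[] values) {
        // Enforce the precondition rather than silently trusting the caller.
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must be non-empty");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            // Strict comparison keeps the earliest index on ties.
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(indexOfMax(new int[] {3, 9, 4}));
    }
}
```

Writing the contract down first tends to expose the awkward cases (empty input, ties) before any typing-instead-of-thinking happens.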
itsmatt on March 21, 2008 04:41 AM
I really prefer the bugs I encounter to be of my own doing. That means I can fix them, and won't have to work around other people's bugs...
You down with OPB?
Ernst Hot on March 21, 2008 04:46 AM
Jeff: very true
Sam Farmer on March 21, 2008 05:55 AM
>>>
These days, all our code runs on Linux (or maybe OpenSolaris). If select() isn't working right, we open up select.c and check. If your whole stack is open-source, there's no need to program by coincidence.
<<<
The problem with that approach is that this only checks for how select works on that particular machine and not for "how it is supposed to work" (on all other machines). Sometimes there's a huge difference.
To add a story of mine: A couple of months ago I ran into the problem that the application started crashing randomly after switching to a 2.6.x kernel. After hours of debugging it turned out to be a bug in the compiler's runtime: pthread_join() was called twice under certain circumstances. No problem with a 2.4.x kernel, it simply spat out "invalid handle" and that got ignored. With the NPTL library on a 2.6.x kernel the call crashes on invalid handles. Of course, looking up the documentation it just says, "the behavior is undefined" under such circumstances. Go figure...
Vinzent Hoefler on March 21, 2008 06:02 AM
Better yet is when the bug is your fault, but you can't repro it on your system. Case in point: a bug was reported in the section of code that I wrote. I ran it, and the "hey, it works on my machine" was the result time and time again. Switch to another machine with a different OS (I'm running one of three Vista machines in the company, everything else is either XP or 2000 Pro), and voila, we arrived at Bug City. As it turns out, there are a few tiny differences in the way Vista and previous versions of Windows handle exceptions. Those tiny differences turned into a pretty big holdup on the project.
Yeah, the bug itself was my fault. But trying to fix it was only hampered by the other 0.1% of the programming pie.
And now that I think of it, I want some pie...
James on March 21, 2008 06:08 AM
Heh. Rather timely.
I spent the better part of yesterday trying to figure out why the child process in a CreateProcess wasn't reading from its input handle. The output handles it was writing to fine, but no matter what I did it wouldn't read from the input one.
Below is the chunk of code I found the problem in. See if you can see it too. :-)
// Make the shell's input pipe
if (!CreatePipe(&si.hStdInput,&Shell_Input,&sa,0))
throw std::string("CreatePipe failed because ") + Error_Message();
ZeroMemory(&si,sizeof(STARTUPINFO));
si.cb = sizeof(si);
si.dwFlags = STARTF_USESTDHANDLES;
si.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
si.hStdError = GetStdHandle (STD_ERROR_HANDLE);
Of course originally the ZeroMemory call wasn't in there, but I found that CreateProcess did nasty things without it. Doh!
T.E.D. on March 21, 2008 06:13 AM
What the hell are you talking about?
My code is perfect!
It's always been perfect and always will be perfect.
My code is a temple of logical perfection!
I was designed by the Kirk, the Creator.
I am perfect.
What?
The Kirk is not the Creator?
I am in error?
NOOOOOOOOOOOOO!!!!!!!!!!!!!!
Alright, you got me. Just do not tell my Pointy-Haired Boss.
Hoffmann on March 21, 2008 06:29 AM
I don't like how much you've simplified it. I think it would be better to say something like, "First act as if it is your fault", or, "Begin by assuming the problem is in your code". When you simplify it so much that it's wrong, you've gone too far.
James Justin Harrell on March 21, 2008 06:35 AM
Even if the bug is in the system, your code is the only part you have control over, so blaming the system is pointless.
Joel Coehoorn on March 21, 2008 06:43 AM
Even more fun is when part of your program works BECAUSE of a bug that is later fixed.
Look at this Java code:
protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
    // Ask the custom class loader first
    Class<?> clazz = theClassLoader.loadClass(desc.getName());
    // Intended fallback: use the default lookup if nothing was loaded
    return clazz == null ? super.resolveClass(desc) : clazz;
}
If you can't tell what it does, it tries to load a class with the custom classloader, and in case it can't load it, it instead forwards the class lookup to the next higher class loader. Right?
Wrong. Trying to load a class that cannot be found causes a ClassNotFoundException. The code will not even compile if you don't deal with that fact, so the original programmer put the exception in the throws clause. However, he did not think it through: if the custom class loader cannot load the class, then the first line throws the exception and thus exits the method. The second line can ONLY EVER RETURN "clazz". The forwarding to the superclass can never happen, because it only happens if the custom class loader cannot load the class, and if that happens, line two never gets executed.
So how did this piece of code ever work at all?
Java 1.2 had a bug where custom class loaders were created with all classes already loaded. That's why it never failed to load any class, the exception was never thrown and nothing ever needed to be forwarded to the superclass.
Now imagine what happened when that bug was fixed in later versions of Java. Code that worked perfectly fine broke, and ironically, it really WAS a bug in the compiler (or JVM) - just not the way one expected. Nightmare to track down.
ps.: Here's the fixed code, if anyone cares:
Class<?> clazz = null;
try {
    clazz = theClassLoader.loadClass(desc.getName());
} catch (ClassNotFoundException cnfe) {
    // The custom loader could not find it, so forward to the default lookup
    clazz = super.resolveClass(desc);
}
return clazz;
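The difference between the null check and the try/catch can be reproduced without any serialization machinery. The sketch below is an illustration under stated assumptions, not the commenter's code: EmptyLoader is a hypothetical loader that can never load anything, and Class.forName stands in for the super.resolveClass(desc) fallback.

```java
/** Hypothetical demo class; all names here are invented for illustration. */
final class LoaderFallbackDemo {
    /** A loader that knows no classes and reports that by throwing. */
    static final class EmptyLoader extends ClassLoader {
        @Override
        public Class<?> loadClass(String name) throws ClassNotFoundException {
            throw new ClassNotFoundException(name);
        }
    }

    /** Same shape as the broken version: the null branch is never reached. */
    static Class<?> resolveWithNullCheck(ClassLoader custom, String name)
            throws ClassNotFoundException {
        Class<?> clazz = custom.loadClass(name); // throws before the check below
        return clazz == null ? Class.forName(name) : clazz;
    }

    /** Same shape as the fixed version: the exception triggers the fallback. */
    static Class<?> resolveWithCatch(ClassLoader custom, String name)
            throws ClassNotFoundException {
        try {
            return custom.loadClass(name);
        } catch (ClassNotFoundException cnfe) {
            // Assumption: Class.forName plays the role of super.resolveClass(desc)
            return Class.forName(name);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader custom = new EmptyLoader();
        System.out.println(resolveWithCatch(custom, "java.lang.String"));
        try {
            resolveWithNullCheck(custom, "java.lang.String");
        } catch (ClassNotFoundException e) {
            System.out.println("fallback never ran for " + e.getMessage());
        }
    }
}
```

With a loader that always fails, only the try/catch version ever reaches its fallback; the null-check version simply propagates the exception.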
It's interesting to read some of these responses; humility doesn't come easily to some, it would seem.
The lesson isn't that there are never bugs in "their" code; of course there are, and anyone who has worked with technology long enough has run into their share of them. The real lesson is a message about statistics, our own fallibility, and our need to validate our accusations before leveling them.
Edward S. Marshall on March 21, 2008 06:54 AM
Great post, Jeff.
On a related note, a mantra that I have to repeat over and over again to fellow developers, and to myself for that matter, is "don't guess - work backwards from the symptom." In other words, determine what conditions could be directly causing the symptom of the problem, then figure out what could be causing that condition, and so on. Seems perfectly obvious, but it's amazing to me how often this gets forgotten.
glaxaco on March 21, 2008 06:57 AM
You should see some of the terribly written tools I have to deal with. Oracle Web Services Manager, for example, elides whitespace before verifying digital signatures, which makes it broken for signed content that contains whitespace and follows the specification. Then if you try to use their extension mechanism to do it yourself you find they pretty print all messages internally, which also screws up signed content. Working with this tool is like being a pinball bouncing from bug to bug to bug.
a on March 21, 2008 06:57 AM
Well, it seems you never used Delphi 2007. I challenge you to use it even for a single day without finding a bug.
At least most of its bugs don't affect the final product, unlike Oracle, which has those annoying internal errors.
Ricardo on March 21, 2008 06:58 AM
I think it's unfair to cite studies conducted in 1973 and 1984 as scientific proof that it's very often the programmer's fault. 20+ years ago is ancient history for programming; in 1973 the first microcomputer hadn't even been invented yet.
It's true that more people are using system and development software these days, and because of a larger user base the quality may be better. But I think it's more important that system and development software is more complicated these days. The average programmer is building on a mountain of pre-existing code, and it's hard to believe every line of it is 100% right (in that it functions correctly, and the correct functionality is explained in some sort of document).
I still agree that it's almost always the programmer's fault, just that a study done before I was even born is hardly proof.
ADINSX on March 21, 2008 07:40 AM
I love all the folks that reply to a generalized concept post like yours with anecdotes of how one time (presumably at band camp) it wound up being the tool's fault, and therefore the general assertion doesn't hold true.
Well, I also know of someone's grandpa who smoked, ate and drank excessively, yet made it to 100, but I wouldn't recommend it to everyone.
Bottom line: I've read a lot of forum posts that start with "bug in ..." and 99.9% of the time it's _not_ a bug!
Dennis on March 21, 2008 07:56 AM
"So what do you do when you are alone?"
Source control (frequently used), unit tests and binary search are your friends. When all else fails, go back and find the change that introduced the bug.
Rick on March 21, 2008 08:00 AM
I'm finding the level of defensiveness to this post not only humorous, but indicative of how true Jeff's ultimate point is.
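The binary search over history can be sketched in a few lines of Java. This is only an illustration under loud assumptions: revisions are numbered with integers, the bug stays present in every revision after the one that introduced it, and isBad is a hypothetical stand-in for "check out that revision, build it, run the failing test". Tools such as git bisect automate the same idea.

```java
import java.util.function.IntPredicate;

/** Hypothetical demo class; the names are invented for illustration. */
final class BisectDemo {
    /**
     * Returns the first revision for which isBad is true.
     *
     * Assumes: good < bad, revision good passes, revision bad fails,
     * and every revision after the first failing one also fails.
     */
    static int firstBadRevision(int good, int bad, IntPredicate isBad) {
        // Invariant: good is known to pass, bad is known to fail.
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isBad.test(mid)) {
                bad = mid;   // the bug was introduced at mid or earlier
            } else {
                good = mid;  // the bug was introduced after mid
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Stand-in for running the real test suite against revision r.
        IntPredicate isBad = r -> r >= 137;
        System.out.println(firstBadRevision(100, 200, isBad));
    }
}
```

Each check halves the range, so a hundred candidate revisions need about seven builds rather than a hundred.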
At any given moment, you're either part of the problem or part of the solution. Pointing fingers and dishing blame is being part of the problem. Taking immediate ownership, regardless of whose fault it really is, instantly takes you a step toward the solution.
Great post, Jeff.
Aston on March 21, 2008 08:05 AM
I think it was Arnold Glasow who first said "A good leader takes more than their fair share of the blame and gives more than their share of the credit."
Rick Conklin on March 21, 2008 08:25 AM
There's a bit of madness down both paths. We recently had a bug that was related to Firefox and Vista. We spent four maddening days scraping our mental fingernails against the black box that is the Aero layer. Our bug was finally fixed... by Microsoft releasing Vista SP1. I gained a lot of knowledge along the way during our bug-hunt, but I still don't know exactly why it was fixed. And since we're happy it's fixed at all, I no longer have the resources to chase the bug without pulling overtime. C'est la vie.
That having been said, it's still worth chasing such bugs. At the end of the day, I only have control over one code base: mine. I can fix bugs by changing what I control, or I can hope that forces outside my sphere of influence will do it for me. My customers appreciate it if I do the former.
Mark Tomczak on March 21, 2008 08:25 AM
i wholeheartedly agree with this post as i always assume that a problem in the system is a problem with my code. but, i recently went through a very draining process of rooting out a bug when using linked servers in sql server 2005 64 bit to talk to oracle 10g. after a 3 week process of rewriting code, testing and performing research, it turned out that there was a bug in sql server. environmental bugs do occur, but that is the exception.
JRock on March 21, 2008 08:28 AM
I love your blog. It's clear you're dedicated to the craft, and that is what I have always aimed for.
To constantly learn and hone your skills, to work with others to create applications that you can be proud of and let go.
But it takes constant focus to make sure we release bug-free, extremely clear-cut applications that do what the people who requested them asked us to do...
Sadly, most of my experience has been with companies that cared more about quantity and speed than quality, which usually ended up biting them in the rear.
Let's keep at it, and never give up on quality.
Craig M. Rosenblum on March 21, 2008 08:33 AM
Reading Jeff's blogs makes me smile (heck, even the last one on the rechargeable batteries got me a little excited). It takes some courage to admit one's faults, but I think it shows strength of character. I'd always prefer a programmer who was less skilled but honest over one who was more skilled but failed to take responsibility, lied about or vehemently denied the sources of bugs, claimed things were tested thoroughly etc.
Rich Bateman on March 21, 2008 08:45 AM
The second rule of programming is YOU DON'T TALK ABOUT PROGRAMMING.
Gabe on March 21, 2008 09:05 AM
This reminds me of two rules of software:
1. Statistically speaking, a bug is more likely to be in code higher on the stack than lower.
2. The problem is between the chair and the screen.
One of the joys of software development is that we live in a very predictable world, one where once you understand what the code is doing, everything makes perfect sense: there really are no "ifs" or "maybes". You can spend days debugging a problem, convinced that the world is indeed flat, or that gravity is no longer in force, only to find the bug, which when fixed, makes you feel that harmony has once more been restored to the world. And it is this that makes you feel like a software superhero.
Simon on March 21, 2008 09:06 AM
Great post!
Steven Klassen on March 21, 2008 09:07 AM
This post reminds me of one of my favourite quotations:
"To err is human--and to blame it on a computer is even more so."
Robert Orben
I love these kinds of posts, if only to remind myself that I can do better. I also love reading the comments, like "this one time it didn't work" etc. My experience has been (much like the select guy) that it doesn't work the way you think it should because it wasn't designed for that. So much of programming is internalizing bizarre concepts invented elsewhere, and when we fail to drink the koolaid we assume it's the other guy's fault. Just like a carpenter, when the saw isn't cutting right we should either sharpen the saw (learn more) or try something else. Too many times we blame the wood and the saw.
SteveJ on March 21, 2008 09:30 AM
Having worked on, among other things, support, I am always amazed when a caller's theory about why his code doesn't work is b) there's a bug in our software, c) there's a bug in the operating system, d) there's a problem with the hardware. Notice the conspicuous lack of (a).
Then again, sometimes it is (b). Sometimes. Very sometimes.
mike on March 21, 2008 09:30 AM
It depends on the situation. As a programmer, I think each time you come across a bug, an investigation should be done before giving any answer. Sometimes it is as easy as putting a breakpoint in the suspected section of code and figuring out the problem.
Other times it is much harder. Consider a case where many 3rd party components are running, a memory exception happens, and all you have is a memory address at the point of the crash. This case requires crash dumps and some low level tools to figure out which exe or dll the memory address resides in. Other posters have pointed out similar examples of this type of scenario.
I don't think it is always your fault as a programmer, especially in today's age of large and numerous frameworks.
I think it is better to say: We as developers should always assume responsibility for finding a solution to the problem/bug. And then, if it is your fault, say, "I made a mistake, it's my fault." Don't assume anything; test, debug, and explore to find the root of the problem.
Everyone makes mistakes, that's why there should be lots of QA on your product/code.
Jon Raynor on March 21, 2008 09:30 AM
True. Unless you're coding in Ruby. In which case it's a bug in Ruby :)
DaveG on March 21, 2008 09:39 AM
On one project where we were dealing with some bugs I suggested we tell the customer that "we didn't see xyz issue coming and need some time to rearchitect things to handle it in a graceful manner." My manager wouldn't hear anything of it. This was a fairly major issue, not something like a spelling error in a dialog box. It was swept under the rug and glossed over.
I see this as a manifestation of the ego-centric culture of engineering: "It can't POSSIBLY be my fault, for when you find a flaw in what I created, it is like you found a flaw in ME!" All the bravado and machoness is usually a cover for insecurity.
Matt Green on March 21, 2008 10:22 AM
Good post. I agree that you should always point the finger at your code first. Only after you have empirically proven through exhaustive tests that it's not your code should you go on to look at the OS, outside libraries, etc.
Here's a fun tale for you Windows ecosystem devs:
Short version: Don't pass a null reference to a COM object on a Win2000 machine!!
This was incredibly hard to track down because my development machine is running XP, which has a newer version of COM than Win2000 machines do, so this bug only showed up on the production server. XP's version of COM quietly handles this and goes along its merry way, but 2000 barfs all over the place.
Long time listener, first time caller on March 21, 2008 10:34 AM
Jeff, I love your blog and I agree with many of your points. The only thing I have to say about this one is that you have to be careful in accepting responsibility as well. I personally always accept responsibility when my code fails. Sometimes, though, to the point where I don't feel like anyone else should have to fix it except me. One of the pitfalls of accepting responsibility for your own bugs is madness, just the same as blaming the OS or the tools library, etc.
Like Gordon Ramsay points out to chefs on the British version of Ramsay's Kitchen Nightmares, the more work you put in, the more it hurts when it doesn't turn out right or someone doesn't like it. Good programmers are much the same: the more effort put into a project, the more it hurts when bugs are found or, worse, when people don't like the finished product (not one person, mind you, but the majority; you can't always please everyone). You have to manage the acceptance of responsibility with a little bit of detachment from your own code. Once the bug or problem is found it's good to say 'yeah it's my fault, I'll get to the bottom of it and fix it', but it's also good to approach getting to the bottom of it and fixing it like it wasn't your code. Part of the problem in both cases is thinking you did something right and not bothering to look at it again, but if it wasn't your code, you may look there.
When I started out in the field I had to deal with this a lot. I would write code I thought pristine and when a bug was found I'd jump in to fix it. Problem was that I never looked at the part I thought pristine, just the parts that I thought didn't look as good. In almost every case it was a minor insignificant bit in the code I wasn't looking at. All because I wasn't looking at it objectively. Just a little food for thought.
Arcond on March 21, 2008 10:51 AM
Well, almost true. I did once have the pleasure of finding and documenting a bona-fide driver bug in nvidia's display drivers. Of course finding that out is only half the solution, you still need to come up with a workaround.
Eamo on March 21, 2008 11:01 AM
This reminds me of the Happy Days episode where Fonzie had a really hard time saying, "I'm wwwrrrroooonnnnggggg."
Mark on March 21, 2008 11:13 AM
When I was learning how to program, it felt like a loss of innocence the day I realized the compiler was a program just like any other, and subject to bugs and quirks like any other program. That's just been reinforced by a few significant compiler bugs since then (crashing the VC6 compiler using C++ templates being a favorite hair-pulling-out moment).
I agree that as a rule of thumb, bugs almost always ultimately turn out to be a problem with your code. However, even in those cases there are a number of ways third party software can share in the blame:
* Missing, incomplete or misleading documentation, so the semantics and assumptions of the third party software are ambiguous.
* Missing, incomplete or misleading error reporting, so when you use the third party software incorrectly, it fails silently, or reports something useless like "Generic Error -1: An error occurred."
* Not having the source code to the third party software, making it much more difficult to diagnose the interaction between your code and the library. Even if you accept that your code is probably wrong, because of (1) and (2) it may still be nearly impossible to debug without stepping into the third party code to figure out what it is expecting to happen. It is amazing what five minutes with the debugger can discover compared to hours of poking at a black box.
* Having the rug pulled out from under your feet. You wrote some code, it worked perfectly fine, and a library/OS upgrade changes some subtle aspect of the behavior (or has a bug fixed itself that maybe you had worked around) that causes some previously flawless code to fall over.
One of my former bosses summarized it well:
"If it ain't on fire, it's a software problem"
Bob Norvell
I don't see any defensiveness, just an amusing list of times when it WASN'T the developer. I can recall quite a few cases I have found. (A vsprintf bug in MSCRT.DLL is my favorite. I ran this down to the one line of assembly that was in error.) The fact that 99% of the bugs are mine or another developer on the team is a good thing. Tracking down a system bug is tedious, time-consuming, and just plain hard. Trying to come up with a workaround is just about as bad. Then you should keep track of the workaround because this is dependent behavior, the bug could get fixed and break the workaround with no warning. An example of this is a bug in the Oracle 7.3 libs. The bug was known and documented, we had a workaround. The bug was fixed in a version 9i build, which broke the workaround. It was pretty obvious when it happened because I knew about the workaround, but it could have slipped by someone else who didn't know.
Tim on March 21, 2008 12:04 PM
I had to test a guy's code... let's call him "Frank"... and interestingly, every time I came to him with a defect found while inspecting his code, he'd consistently try to direct me elsewhere. Once, he even advised me that *I* was the defect. Trust me, he was the only developer I ever ran across that was so sure he was not the cause of a defect that he stuck in my memory.
When I read the article today, I was reminded of Frank. I didn't have respect for him as a developer only because he never took on responsibility for his portion of the overall objective. The developers I have the greatest respect for are those who will open up their code and step through it while a peer sits with them in amazement. Those developers do take responsibility. When it's not their code, or piece of an integral system, they will identify what the real issue is - and smile. Frank rarely smiled.
Kwan on March 21, 2008 12:56 PM
Just the other day a programmer where I work was faced with a process that was dying with a SIGABRT. He looked at the stack trace and found the signal was being generated on a line that contained an assert(). Somehow he managed to convince himself that the compiler was generating the wrong code and getting the assertion test backwards!!! I arrived at the office that morning and found in my inbox a mail from late the night before with a detailed disassembly of the code in question, along with references to the Intel IA-32 specification showing the compiler's error.
I opened the code, and put a 'print' statement in right before the assert(). What do you know, the value being asserted was 0.
Anonymous on March 21, 2008 01:13 PM
This is usually but not always true.
When, watching your code run in the development environment, you see the system tell you that boolean literal True has a value of False, you really are seeing a bug in the tools, and it's time to restart the machine, because something has gone deeply wrong in a core library somewhere, and just restarting the environment won't suffice.
Ask me how I know this.
(Now, the other 999 out of 1000 times, the bug's my fault.
But not this one.)
Sigivald on March 21, 2008 01:33 PM
In almost 30 years of software development, the number of bugs I've found attributable to compiler/tool bugs is probably something like five. So I've learned this lesson.
But that's not to say weird things don't happen. Many years ago, I was debugging a complex computation in a Lisp program and just going nuts, the behavior made absolutely no sense. Then, in frustration, I typed "2.0 + 2.0" to the command prompt...and got back something like "256002.0" as an answer. The system's floating point extension card had failed in a way that returned wrong answers without raising exceptions.
Has anyone run into a situation where you have this in PHP:
try {
    throw new Exception('foo', -1);
} catch (Exception $e) {
    echo 'caught!'; exit;
}
YOU RUN IT, AND IT DOESN'T CATCH THE EXCEPTION! Instead, the spl_autoload handler I registered catches it and prints out your "foo" exception.
Clearly it had something to do with how the language was being executed. So I changed the OPCODE CACHE to something else (XCache instead of eAccelerator) and it worked.
But now the question -- WHY is this happening? Am I just going to have to tell people who use my stuff to NOT use eAccelerator? There must be something in my code that I did that creates this bug in eAccelerator. But what?
The worst kinds of bugs are the ones you can't track down...
Gregory Magarshak on March 21, 2008 02:04 PM
Some of the toughest bugs I find today are with JavaScript. No matter how experienced you are, you're always eligible to find a JavaScript bug that just doesn't make any sense, and it is usually your own fault. The closest I got to "it not being my fault" was in doing some recursive JavaScript functions where the values of some of the variables in the function were completely senseless and impossible to trace. After two days of looking at the code I realized that somehow the variables declared inside the recursive function, even those inside statements inside the recursive function, had global scope and were being reused across the recursive calls.
Lesson learned: always use the keyword "var" for variables in JavaScript.
I still blame Javascript, it wasn't my fault. :P
Bobby on March 21, 2008 03:18 PM
Word. Unless you happen to be writing code in RoR. In that case, you can rest assured it's not your fault!
Josh Stodola on March 21, 2008 03:56 PM
Leo: "All too often the problem is that the bug is our own fault and is due to our misunderstanding of how a component works. In those cases it's easy to see why we assume the bug is somewhere else: We look through our code over and over and everything looks perfect, but only because we have an incorrect view of how it should be."
In that case, it sounds like the other programmer wrote incomplete or misleading documentation -- a common rookie mistake. I know the slashdot mentality is "we don't need no steenkin' documentation!", but the truth is that all of the best programmers I've ever known were also the best at documenting their code.
I maintain that incomplete or misleading (to a person) documentation is a bug of no less severity than incomplete or misleading (to a machine) source code.
Who's the god of computer science? Knuth -- advocate of Literate Programming, not coincidentally. You don't see him sending out "oh, it doesn't work in that case, but you just misunderstood what I'm saying..." letters. When it doesn't work like he claimed, he sends you *money*. That's why he's Knuth!
ken on March 21, 2008 06:52 PM
I once took over responsibility for a system built and maintained by two fanatically rabid, Microsoft-hating VB 6 developers (hey, I guess they had to pay the bills somehow) 8-).
For a couple of years before I was brought in, they had conditioned the business owner of the application to believe that almost every bug was a bug in Windows 2000 (which was part of the reason I was brought in to er...well..babysit). There was always a 'workaround' that would take 2 or 3 days to meticulously craft to 'FIX' the crappy Microsoft OS...at least as was conveyed to me by our business partner (I found out later that everything took 2 or 3 days because of their addiction to EverQuest).
Within hours of my first day working with my 2 new best friends, our business partner comes in, after getting his ear chewed off by an angry customer, and explains the problem and literally the first words that are spoken by one of them was "F'ing Windows 2000...it's such a POS" without even trying to reproduce or debug it.
My suggesting that we should do the work necessary to prove that it wasn't our code first before casting aspersions at the OS was met with...well...let's just say, they looked at me like I should be wearing a dunce cap and standing in the corner with my Microsoft fanboy-self.
Of course it was a bug in our code, arguably an odd, edge case, probably never, ever, happened before bug in our code. The next 30 bugs that were reported over the course of the next 2 months (it was really really buggy code) were NOT Microsoft related either.
Lucky for me, one of the two of them quit (because he could no longer tolerate my draconian ways...like implementing source control and bug tracking) and I was eventually only forced to let one of them go (the daily 6 hour EverQuest excursions only made it easier).
To this day, whenever I hear a developer quickly conclude that it was a Microsoft OS issue, I really, really have to fight the urge to run screaming from the room.
Rob on March 21, 2008 07:29 PM
I've found that the best way to deal with weird obscure, unfiguroutable (yes it's a word now.) bugs is to get up and walk away.
99% of the time, if I get up, walk away, get a drink, go grab a snack or something, and come back, I find the bug I was dealing with almost instantly.
dnm on March 21, 2008 07:49 PM
We had some code at work that broke over the course of a few releases of Qt 4. Some of it is still broken.
Google is your friend. After two hours of GUI voodoo incantations, we started to do more targeted, judicious Googling and found a fix in fifteen minutes.
The Qt documentation is not very nuanced and it's changing fast. We're staring at another framework release in June, I think. The path between release and proven rock-solid stability is always hacked with a machete.
Ron on March 21, 2008 09:21 PM
Well, yeah, you should assume it is always your bug...but not to the extent that you miss system bugs. All the nastiest bugs I remember from my career were system bugs. They were nasty precisely because everyone assumed the system was bug-free.
A couple months ago, we spent a few days of confusion on a proprietary platform until we finally pared it down to this code, which caused a crash:
try {
throw 5;
}
catch(int)
{
}
I get your point, but there really does come a time when you've found a bug in someone else's stuff. It does happen.
OK, let's say you're coding something that's going "onto the metal" for some embedded device, and so, to ease testing, you find an emulator. This is a fine emulator, and everything's good until one day it all falls apart. You're doing some testing, and something really unexpected pops up: the emulator is expecting some set of registers that you've not set up.
"OK," you think, "I did something wrong," so you check the manuals for the machine you're aiming it at. All seems good: the manuals mention requiring the entity in question, but only on a totally different code path from what you're following. So you check it out properly: burn it onto the ROM and fire off a proper test. And it works perfectly.
So you go to the makers of the emulator, proffer a patch, telling them you'd tested it on the hardware, and they come back to you saying that they're not going to put the patch in because [insert excuse here].
Is it STILL your fault?
Bob on March 22, 2008 01:29 PM
This used to be more true in the 70's, the 80's and early 90's. My experience tells me it's not as true today.
Bill on March 22, 2008 10:55 PM
I have learnt this the hard way, but the problem is it makes you very reluctant to accept when you do find a bug in a system library.
I spent a whole day trying to prove I HADN'T found a bug in .NET's XmlSerializer, but I had.
I don't think having an 'it's my fault' mindset is good advice; instead, logically removing causes one by one seems a more sensible mantra for bug fixing.
Tom Deloford on March 23, 2008 08:54 AM
I recall one time where I claimed the problem was in a system component after analysis. In particular, System.IO.MemoryStream.GetBuffer() was returning too long of an array. My response was to write a drop-in replacement for the system component. Problem gone. Whether the system component was actually at fault or it was just not doing what we expected I'll never know (vague documentation...), but the problem was fixed.
Joshua on March 23, 2008 02:16 PM
well... if you want to put it into perspective, ultimately everything is our fault. or maybe a better phrase is "its your job"
like... why did we use this piece of code?? why didnt we use another piece of code??
heck.. why did we even use this language in the first place? why didnt we use another language??
it was our choice, thats why its our fault
its like when morpheus offered neo to take the blue pill or the red pill
yeah... we all took the blue pill and followed the rabbit through the hole
ultimately it is our fault... "if only i hadnt drinked that much beer before morpheus asked me that stupid question".
it is my fault... isnt it???
to fault or not to fault (oot)
i think there are going to be some pretty depressed people when their working mindset is always "its always my fault". maybe in the future, maybe, just maybe, programmers will be in the top 5 deadliest jobs, right under alaskan king crab fishers and miners, because of the high suicide rate from depression (maybe a tad too much exaggeration :p but im not trying to be funny).
coding is not for the faint hearted, its tough, when you're confronting thousands/millions of lines of code, hundreds of bugs and a tight deadline. you gotta be pretty special to rise above the challenge and conquer all. I agree with the saying "If you want to be great, you’re responsible for making yourself great".
i pray for the day to come when programmers can say "ITS NOT MY FAULT"
Most bugs I come across these days are compiler bugs... but then again, I work on that compiler =)
Erika on March 23, 2008 08:48 PM
I've had a slightly different experience: most of the time it was the framework's fault. And that seems reasonable to me; most frameworks/apps released these days have a "beta" tag attached to them, or assume one. With the ability to update software over the internet, you don't feel much responsibility for the product you release: the "release early, release often" principle in the wild ;-) So, despite the popularization of TDD, the quality of software and frameworks (as more complex beasts), taken at any particular moment in time, is generally going downhill, IMHO. But software is evolving faster, fixing old bugs and introducing new ones in equal proportion. More complex software is more susceptible to bugs, so it's quite reasonable to assume bugs in frameworks these days.
Thanks.
Andrey Skvortsov on March 23, 2008 11:24 PM
This reminds me of this entry, which deals with the same topic.
http://whatiseeinit.blogspot.com/2008/02/bad-worker.html
I always enjoy reading your entries. =)
Abdication of responsibility.
You see it more and more, especially with this 'new' generation.
It's like your first example; the programmer could not believe that it was his code causing the select problem because he could not believe that HE was responsible for the problem.
Remember, always take responsibility for what you have done, both the good and the bad.
If you are wrong admit it and fix it. You earn far more respect with that approach than any other.
Oh yeah, by the way, most of the time, even if it is the fault of the OS, the framework, the compiler, or the blond in the cube down the aisle, there is usually a workaround to get past the problem. So even if one of the above is to blame, it's still YOUR FAULT for not finding the workaround.
TAKE RESPONSIBILITY, in the long run you will be glad you did.
By the way Jeff, how is your venture going? Well I hope...
Regards,
John McPherson, Senior Software Developer
To those who say Jeff is saying the problem is always in your code:
"Start with your code, and investigate further and further outward until you have definitive evidence of where the problem lies."
One only needs to read.
To those who imply the article is wrong, because one time the bug wasn't your code:
The plural of anecdote is not data.
Apparently you have never used any of Borland's tools. Access violations run rampant, restarts are required. Hell, one time it wasn't calling the destructors for statically instantiated class variables. Yes, most of the time it is the programmer's fault; sometimes it really isn't.
Not Quite on March 24, 2008 01:10 PM
Whenever I get stuck on a really annoying bug, something that seems to be really obvious but I just can't see, I just ask someone else to take a look.
I take a few minutes to explain the problem, and the bug. Usually, just explaining it to someone else is enough for me to 'see it'. If it is not enough, they usually find it. If they don't find it, we usually find the problem together.
lorg on March 24, 2008 02:16 PM
Scenario: The user saves data, exits the program, and shuts off the computer. On next use, the data file is messed up.
Question: Is it a defect to use neither FILE_FLAG_NO_BUFFERING nor FILE_FLAG_WRITE_THROUGH, and not to invoke FlushFileBuffers before closing the files?
DAKra on March 24, 2008 03:57 PM
There are exceptions, however. COUGH**SharePoint**COUGH
Eugene Katz on March 24, 2008 07:22 PM
What if you're cleaning up other people's code and you find out that they code worse than you... then after the clean-up, the enhanced code (with your fingerprint) is failing while the "junky" original code you started with works?
Talk about hang-ups.
wilhelmina on March 24, 2008 10:37 PM
It depends...
Once upon a time, we had a website that didn't work on Firefox 1.0. Basically, none of the buttons responded when clicked.
It was later revealed that before Firefox v1.0.0.3, they had forgotten to add the click handler for image buttons ("input" tags with type="image"), so the click events wouldn't fire.
It's almost always better to check for known defects after the first round of searching for bugs but before starting the next round. That way, when you do conclude "it's my fault for this bug", you can be sure of it, with no excuses.
I'm agreeing with the general consensus. Use the following algorithm:
while ((developer.desperationLevel() < MADNESS) &&
       (!problem.solved()))
{
    if (developer.getYearsExperience() <= 5) {
        debugger.use();
    } else {
        tweakFrameWork();
    }
}
Seriously, today's platforms and frameworks have so many moving parts with potential incompatibilities that you can't take anything for granted. Although I still blame myself first... habit.
Dave on March 25, 2008 11:17 AM
At my first full time programming job, I was writing C code using a library written by another developer. Being a rookie, I made silly off-by-one errors, blew up the stack and screwed up memory. Sometimes it would crash in my code, but sometimes the library would crash. For example, I called a routine that required a preallocated memory block that wasn't allocated to the size I told the library it was. Crash.
I can still remember the argument now.
"Your library's got a bug".
"No, your code's got a bug".
"But it crashes here".
"Yes, because you've passed it garbage here and here and this handle is uninitialized."
"Oh, yeah, let me fix that ... oh it works now - thanks!"
Since then I've assumed all bugs are my fault, even when they're not.
Now maybe the library should have been more defensive and not simply crashed. But, maybe it was actually the right thing to do. It forced my client code to call it in the way it was designed, rather than just silently failing and involving much hair tearing out.
I get the same thing when I mess up with STL - you get thrown into a line of source code that looks uncannily like line noise. At that point, I'm _praying_ that the bug's in my code, 'cause I've got more of a chance of reading that!
Ritchie Swann on March 27, 2008 03:45 AM
Forgive me, but I disagree somewhat with this idea that "a good carpenter doesn't blame his tools". A good carpenter recognizes his mistakes it is true, but he also recognizes how his tools allowed him to make those mistakes and seeks to improve the tools. Perhaps the straightedge is off, or the nails weak, or the wood warped. Tools and materials need to be resilient to cope with any reasonable carpenter.
You can't create quality products when the tools are working against you.
Sometimes the problem is indeed in the system rather than your own code. Back in the 80s I was translating some astronomical utilities into S-Basic to run on a Kaypro 10. One of the programs used the greatest integer function int(x) extensively, and when I ran some test cases the program would sometimes give right answers and sometimes give wrong answers with no apparent rhyme or reason. I couldn't find anything wrong with the implementation of the formulas, so I inserted a number of print statements to print out intermediate results as the program ran. I finally determined that the int(x) function as implemented on the system would return wrong results if the floating-point number x was exactly a negative integer, e.g., if x was -3.0000..., int(x) would return -4 instead of the correct answer of -3. So I fixed it with the kludge of writing a function to use in place of int(x) which tested the argument to see if it was a negative integer, and if so added "1" to the result of calling the system function, otherwise returned the result of the system function . . .
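The kludge described above might look roughly like this. JavaScript stands in for S-Basic here, and buggyInt is a hypothetical simulation of the faulty system routine (the real one is not available), so only the shape of the workaround carries over. The name safeInt and the use of Number.isInteger to detect the bad case are likewise assumptions for illustration.

```javascript
// Hypothetical stand-in for the faulty system int(x): correct everywhere
// except when x is exactly a negative integer, where it is one too low.
function buggyInt(x) {
  const floor = Math.floor(x);
  return (x < 0 && x === floor) ? floor - 1 : floor;
}

// Workaround: call the system routine, then add 1 back in the single
// case that is known to be wrong.
function safeInt(x) {
  const result = buggyInt(x);
  return (x < 0 && Number.isInteger(x)) ? result + 1 : result;
}

console.log(buggyInt(-3));  // -4 (the reported bug)
console.log(safeInt(-3));   // -3
console.log(safeInt(-3.5)); // -4
console.log(safeInt(2.7));  // 2
```

A wrapper like this depends on the bug staying put: if the system routine is ever fixed, the adjustment itself starts producing wrong answers, so it is worth keeping a check on the raw routine alongside it.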
Ronn! Blankenship on March 29, 2008 09:30 PM
Great article Jeff.
Matthew Wills on March 31, 2008 02:35 AM
Does it really matter that much? Programmers can just swallow a little pride and say, "Hey it's my fault." Yeah I know that you might have spent HOURS AND DAYS on that stupid code but still, it's just courtesy to do so.
Me on April 10, 2008 12:58 PM
Now nicely illustrated at http://stuffthathappens.com/blog/2008/04/11/what-its-like-to-be-a-programmer
lesson 1.
don't outsource your code to developing countries!
lesson 2.
cheaper code costs more down the road.
lesson 3.
write in the language of the code, i.e. ENGLISH
| Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved. |