March 20, 2008
You know the feeling. It's happened to all of us at some point: you've pored over the code a dozen times and still can't find a problem with it. But there's some bug or error you can't seem to get rid of. There just has to be something wrong with the machine you're coding on, with the operating system you're running under, with the tools and libraries you're using. There just has to be!
No matter how desperate you get, don't choose that path. Down that path lies voodoo computing and programming by coincidence. In short, madness.
It's frustrating to repeatedly bang your head against difficult, obscure bugs, but don't let desperation lead you astray. An essential part of being a humble programmer is realizing that whenever there's a problem with the code you've written, it's always your fault. This is aptly summarized in The Pragmatic Programmer as "Select Isn't Broken":
In most projects, the code you are debugging may be a mixture of application code written by you and others on your project team, third-party products (database, connectivity, graphical libraries, specialized communications or algorithms, and so on) and the platform environment (operating system, system libraries, and compilers).
It is possible that a bug exists in the OS, the compiler, or a third-party product-- but this should not be your first thought. It is much more likely that the bug exists in the application code under development. It is generally more profitable to assume that the application code is incorrectly calling into a library than to assume that the library itself is broken. Even if the problem does lie with a third party, you'll still have to eliminate your code before submitting the bug report.
We worked on a project where a senior engineer was convinced that the select system call was broken on Solaris. No amount of persuasion or logic could change his mind (the fact that every other networking application on the box worked fine was irrelevant). He spent weeks writing workarounds, which, for some odd reason, didn't seem to fix the problem. When finally forced to sit down and read the documentation on select, he discovered the problem and corrected it in a matter of minutes. We now use the phrase "select is broken" as a gentle reminder whenever one of us starts blaming the system for a fault that is likely to be our own.
The flip side of code ownership is code responsibility. No matter what the problem is with your software-- maybe it's not even your code in the first place-- always assume the problem is in your code and act accordingly. If you're going to subject the world to your software, take full responsibility for its failures. Even if, technically speaking, you don't have to. That's how you earn respect and credibility. You certainly don't earn respect or credibility by endlessly pawning off errors and problems on other people, other companies, other sources.
Statistically, you understand, it is incredibly rare for any bugs or errors in your software not to be your fault. In Code Complete, Steve McConnell cited two studies that proved it:
A pair of studies performed [in 1973 and 1984] found that, of total errors reported, roughly 95% are caused by programmers, 2% by systems software (the compiler and the operating system), 2% by some other software, and 1% by the hardware. Systems software and development tools are used by many more people today than they were in the 1970s and 1980s, and so my best guess is that, today, an even higher percentage of errors are the programmers' fault.
Whatever the problem with your software is, take ownership. Start with your code, and investigate further and further outward until you have definitive evidence of where the problem lies. If the problem lies in some other bit of code that you don't control, you'll not only have learned essential troubleshooting and diagnostic skills, you'll also have an audit trail of evidence to back up your claims. This is certainly a lot more work than shrugging your shoulders and pointing your finger at the OS, the tools, or the framework-- but it also engenders a sense of trust and respect you're unlikely to achieve through finger-pointing and evasion.
If you truly aspire to being a humble programmer, you should have no qualms about saying "hey, this is my fault-- and I'll get to the bottom of it."
Posted by Jeff Atwood
I'm finding the level of defensiveness to this post not only humorous, but indicative of how true Jeff's ultimate point is.
At any given moment, you're either part of the problem or part of the solution. Pointing fingers and dishing blame is being part of the problem. Taking immediate ownership, regardless of whose fault it really is, instantly takes you a step toward the solution.
Great post, Jeff.
This reminds me of two rules of software:
1. Statistically speaking, a bug is more likely to be in code higher on the stack than lower.
2. The problem is between the chair and the screen.
One of the joys of software development is that we live in a very predictable world, one where once you understand what the code is doing, everything makes perfect sense: there really are no "ifs" or "maybes". You can spend days debugging a problem, convinced that the world is indeed flat, or that gravity is no longer in force, only to find the bug, which when fixed, makes you feel that harmony has once more been restored to the world. And it is this that makes you feel like a software superhero.
It depends on the situation. As a programmer, I think each time you come across a bug, you should investigate before giving any answer. Sometimes it is as easy as putting a breakpoint in the suspected section of code and figuring out the problem.
Other times, it is much harder. Consider a case where many 3rd party components are running, a memory exception happens, and all you have is a memory address at the point of the crash. This case requires crash dumps and some low-level tools to figure out which exe or dll the memory address resides in. Other posters have pointed out similar examples of this type of scenario.
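To make the "which exe or dll does this address belong to" step concrete, here is a minimal sketch in Python. It assumes you have already pulled a list of loaded modules (name, base address, size) out of a crash dump with whatever tool you use; the `Module` type, the `find_module` helper, and every name and address below are made up for illustration and do not come from any real dump.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Module(NamedTuple):
    name: str
    base: int  # address where the module was loaded
    size: int  # number of bytes it occupies in memory


def find_module(modules: Sequence[Module], address: int) -> Optional[Module]:
    """Return the module whose [base, base + size) range contains address."""
    ordered = sorted(modules, key=lambda m: m.base)
    bases = [m.base for m in ordered]
    # Index of the last module that starts at or before the address.
    i = bisect_right(bases, address) - 1
    if i < 0:
        return None
    candidate = ordered[i]
    if address < candidate.base + candidate.size:
        return candidate
    return None  # address falls in a gap between modules


# Hypothetical module list and crash address, for illustration only.
loaded = [
    Module("app.exe", 0x00400000, 0x00080000),
    Module("vendor.dll", 0x10000000, 0x00040000),
    Module("kernel32.dll", 0x7C800000, 0x000F6000),
]
crash_address = 0x10012ABC

hit = find_module(loaded, crash_address)
print(hit.name if hit else "address is outside every known module")
```

Knowing the module only tells you where the crash surfaced, not who caused it: your own code may well have handed that library a bad pointer.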
I don't think it is always your fault as a programmer, especially in today's age of large and numerous frameworks.
I think it is better to say: We as developers should always assume responsibility for finding a solution to the problem/bug. And then, if it is your fault, say, "I made a mistake, it's my fault." Don't assume anything; test, debug, and explore to find the root of the problem.
Everyone makes mistakes, that's why there should be lots of QA on your product/code.
One of my former bosses summarized it well:
"If it ain't on fire, it's a software problem"
We had some code at work that broke over the course of a few releases of Qt 4. Some of it is still broken.
Google is your friend. After two hours of GUI voodoo incantations, we started to do more targeted, judicious Googling and found a fix in fifteen minutes.
The Qt documentation is not very nuanced and it's changing fast. We're staring at another framework release in June, I think. The path between release and proven rock-solid stability is always hacked with a machete.
Yep, it's 99% your fault. But for a professional full-time programmer, fumbling with 2-3 bugs per week, that means you'll find no less than 1 system (or library, or anything else) bug per year.
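The arithmetic behind that estimate is easy to check. The sketch below just multiplies it out, assuming 52 working weeks a year and taking the 99% figure at face value; the function name and its parameters are my own, purely for illustration.

```python
WEEKS_PER_YEAR = 52  # assumption: bugs turn up at a steady rate all year


def external_bugs_per_year(bugs_per_week: float, own_fault_rate: float = 0.99) -> float:
    """Expected number of bugs per year that are NOT in your own code."""
    return bugs_per_week * WEEKS_PER_YEAR * (1.0 - own_fault_rate)


low = external_bugs_per_year(2)
high = external_bugs_per_year(3)
print(f"{low:.2f} to {high:.2f} external bugs per year")
```

At 2-3 bugs a week that works out to roughly one to one and a half bugs a year that really do belong to someone else, which matches the "no less than 1" figure.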
Experience here is very important, because it helps you determine in less time which errors might be your fault and which cannot possibly be. The ability to determine that and 1) fix your error or 2) produce a good work-around when you can't fix someone/thing else's error is very precious.
Last year we had a network application using a foreign library to connect to a remote server. Multiple-instancing the library (loading two dlls that used it) in a single application caused the Win2003 server network layer to halt at random. It was relatively quick for me to determine that the fault was either in the foreign software or in an O/S bug that the developers of that library had stumbled upon: the only network-related difference from the production servers that were already working was that library, and the bug showed up quickly (within a few minutes of the app starting). Also, it didn't happen on other O/Ses.
The hard part was convincing my boss, a M$ evangelist. Eventually I managed to get him on the server console and had him try to ping our network back. The machine froze under his hands...
I will always remember him staring at the monitor and then at me, in an enlightened bewilderment.
What if you code in machine code ;) Pentium bug, does that count?
Blaming myself has actually cost me a lot of time (I do it naturally; I never had to learn or think about it). For example, I'd ask myself what in the world I could possibly be doing wrong (PHP echo statements that just weren't working and such), and then learn about 15 minutes later that the FTP upload was failing to update the file because a classmate still didn't know how to set FTP permissions (he chmodded one includes folder to 777 for me, still leaving no access to any other folders -.-).
On the other hand, I can imagine that it could also have cost me a lot of frustration and time if I had blamed the software I was relying on right away. Now that I've experienced a couple of times that it was not my fault (while often it was my fault :p), I think I've found a pretty good balance. The underlying system is usually not that hard to check, so after about 10 minutes of troubleshooting (10-15 minutes is usually enough time to go through most of the possibilities) I start checking the system.