I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood


17 posts from June 2004

June 30, 2004

Edit and Continue

I'm looking forward to VS.NET 2005 like everyone else, but the one killer feature that will absolutely compel me to upgrade on day of release is Edit and Continue. I had no idea exactly how much time I spent editing live code in VB6's debugger until I lost this capability in VS.NET. It is my one serious regret about .NET-- which in every other respect is a massive improvement over VB6. I am sympathetic to the timeline crunch that forced Microsoft to drop the feature, however:

PM: Given that features like edit-and-continue didn't make it in, do you feel that any of the emphasis on your initial version of VB.NET was misplaced?

CD: No. It's easy to second-guess things in hindsight or even as you're going along: "Are we doing the right thing? Should we do this? Should we do that?"

But when I review the decisions we've made to move the tool to .NET, I still think it was the right move to cut edit-and-continue from the first release, given that the primary goal was to ensure we got on the platform. Because going forward, the developers who use this tool will reap tremendous benefits from being on the .NET Framework. All the stuff that comes from being on Longhorn, all the managed code, we're there, and we don't have to wait for anything.

Today, we have full access to the platform, and the bleeding edge, hard-core developers can now write XAML, Avalon, and Longhorn apps with VB Whidbey's managed code. Now, we can go back and figure out what we can do to make this tool easier and more productive to use for everybody, which is a nice place for us to be.

I agree, and VS.NET 2005 is right around the corner. What I don't understand, though, is developers who think Edit and Continue is a Bad Thing. That is one of the most wrongheaded things I've ever read, and I have to assume it's spoken from ignorance, e.g., developers who have never had this capability in their toolset and therefore don't know what they are missing.

All edit and continue does is tighten the loop between the time a bug is detected, and the time you can fix it. How can this possibly be a bad thing? On the contrary, it is a huge boost to productivity. At the time of the exception, you can diagnose the problem-- in perfect context of all the live code, which is the easiest way to determine what the fix should be-- and make your fix. Then just keep on truckin'.

Compare with the alternative:

  1. Hit an exception
  2. Slap yourself on the forehead for being a moron
  3. Diagnose the problem at exception time and determine a fix
  4. Wait for the IDE to shut down
  5. Navigate to the right place in the source code
  6. Try to remember what the heck the fix you came up with actually was
  7. Enter the fix
  8. Compile the code
  9. Run the app and exercise the fix

I burn far too much time in VS.NET 2003 doing this. Edit and Continue would cut the work I have to do to fix a bug in half. IN HALF! But then, the close relationship between immediacy of debugging and productivity isn't a new concept. Fred Brooks talks about it in his 1975 golden oldie, The Mythical Man-Month:

One of the justifications for MIT's Multics project was its usefulness for building programming systems. Multics (and following it, IBM's TSS) differs in concept from other interactive computing systems in exactly those respects necessary for systems programming: many levels of sharing and protection for data and programs, extensive library management, and facilities for cooperative work among terminal users. I am convinced that interactive systems will never displace batch systems for many applications. But I think the Multics team has made its most convincing case in the system programming application.

There is not yet much evidence available on the true fruitfulness of such apparently powerful tools. There is a widespread recognition that debugging is the hard and slow part of system programming, and slow turnaround is the bane of debugging. So the logic of interactive programming seems inexorable.

Program             Size      Batch or Conversational   Instructions / man-year
ESS Code            800,000   Batch                     500-1000
7094 ESS Support    120,000   Batch                     2100-3400
360 ESS Support     32,000    Conversational            8000
360 ESS Support     8,300     Batch                     4000

Fig 12.2 Comparative productivity under batch and conversational programming

Further, we hear good testimonies from many who have built little systems or parts of systems in this way. The only numbers I have seen for effects on programming of large systems were reported by John Harr of Bell Labs. They are shown in Fig. 12.2. These numbers are for writing, assembling, and debugging programs. The first program is mostly control program; the other three are language translators, editors, and such. Harr's data suggest that an interactive facility at least doubles productivity in system programming.

The effective use of most interactive tools requires that the work be done in a high level language, for teletype and typewriter terminals cannot be used to debug by dumping memory. With a high level language, source can be easily edited and selective printouts easily done. Together they make a pair of sharp tools indeed.

OK, so maybe Fred's "I am convinced that interactive systems will never displace batch systems for many applications" statement isn't looking so hot in retrospect. I left that in for context. But he's right on target about the strong relationship between immediacy and productivity. Edit and Continue is a killer feature if I've ever seen one, and I can't wait to get my hands.. back.. on it.

Posted by Jeff Atwood    5 Comments

June 29, 2004

Commandos, Infantry, and Police

As I was driving home, I found myself thinking about a favorite section of the book Accidental Empires, by longtime computer journalist Robert X. Cringely. Originally published in 1993, it's getting a little long in the tooth, but it still contains a lot of great insights about the personalities that drove innovation in Silicon Valley-- from a guy who personally knew many of the players.

In the chapter "On The Beach", Cringely talks about the three distinct groups of people that define the lifetime of a company: Commandos, Infantry, and Police:

  1. Whether invading countries or markets, the first wave of troops to see battle are the commandos. Woz and Jobs were the commandos of the Apple II. Don Estridge and his twelve disciples were the commandos of the IBM PC. Dan Bricklin and Bob Frankston were the commandos of VisiCalc. Mitch Kapor and Jonathan Sachs were the commandos of Lotus 1-2-3. Commandos parachute behind enemy lines or quietly crawl ashore at night. A start-up's biggest advantage is speed, and speed is what commandos live for. They work hard, fast, and cheap, though often with a low level of professionalism, which is okay, too, because professionalism is expensive. Their job is to do lots of damage with surprise and teamwork, establishing a beachhead before the enemy is even aware that they exist. Ideally, they do this by building the prototype of a product that is so creative, so exactly correct for its purpose that by its very existence it leads to the destruction of other products. They make creativity a destructive act.

  2. Grouping offshore as the commandos do their work is the second wave of soldiers, the infantry. These are the people who hit the beach en masse and slog out the early victory, building on the start given them by the commandos. The second-wave troops take the prototype, test it, refine it, make it manufacturable, write the manuals, market it, and ideally produce a profit. Because there are so many more of these soldiers and their duties are so varied, they require an infrastructure of rules and procedures for getting things done -- all the stuff that commandos hate. For just this reason, soldiers of the second wave, while they can work with the first wave, generally don't trust them, though the commandos don't even notice this fact, since by this time they are bored and already looking for the door.

  3. What happens then is that the commandos and the infantry head off in the direction of Berlin or Baghdad, advancing into new territories, performing their same jobs again and again, though each time in a slightly different way. But there is still a need for a military presence in the territory they leave behind, which they have liberated. These third-wave troops hate change. They aren't troops at all but police. They want to fuel growth not by planning more invasions and landing on more beaches but by adding people and building economies and empires of scale. AT&T, IBM, and practically all other big, old, successful industrial companies are examples of third-wave enterprises. They can't even remember their first- and second-wave founders.

In my experience, this same distinction applies to software projects. You really need all three groups through the lifecycle of a project. Having the wrong group (commandos) at the wrong time (maintenance) can hurt you a lot more than it helps. Sometimes being a commando, even though it sounds really exciting, actually hurts the project.

The following is excerpted from pages 236-240 of Robert X. Cringely's Accidental Empires:


There is an enormous difference between starting a company and running one. Thinking up great ideas, which requires mainly intelligence and knowledge, is much easier than building an organization, which also requires measures of tenacity, discipline, and understanding. Part of the reason that nineteen out of twenty high-tech start-ups end in failure must be the difficulty of making this critical transition from a bunch of guys in a rented office to a larger bunch of guys in a rented office with customers to serve. Customers? What are those? Think of the growth of a company as a military operation, which isn't such a stretch, given that both enterprises involve strategy, tactics, supply lines, communication, alliances, and manpower.

Whether invading countries or markets, the first wave of troops to see battle are the commandos. Woz and Jobs were the commandos of the Apple II. Don Estridge and his twelve disciples were the commandos of the IBM PC. Dan Bricklin and Bob Frankston were the commandos of VisiCalc. Mitch Kapor and Jonathan Sachs were the commandos of Lotus 1-2-3. Commandos parachute behind enemy lines or quietly crawl ashore at night. A start-up's biggest advantage is speed, and speed is what commandos live for. They work hard, fast, and cheap, though often with a low level of professionalism, which is okay, too, because professionalism is expensive. Their job is to do lots of damage with surprise and teamwork, establishing a beachhead before the enemy is even aware that they exist. Ideally, they do this by building the prototype of a product that is so creative, so exactly correct for its purpose that by its very existence it leads to the destruction of other products. They make creativity a destructive act.

For many products, and even for entire families of products, the commandos are the only forces that are allowed to be creative. Only they get to push the state of the art, providing creative solutions to customer needs. They have contact with potential customers, view the development process as an adventure, and work on the total product. But what they build, while it may look like a product and work like a product, usually isn't a product because it still has bugs and major failings that are beneath the notice of commando types. Or maybe it works fine but can't be produced profitably without extensive redesign. Commandos are useless for this type of work. They get bored.

I remember watching a paratrooper being interviewed on television in Panama after the U.S. invasion. "It's not great," he said. "We're still here."

Sometimes commandos are bored even before the prototype is complete, so it stalls. The choice then is to wait for the commandos to regain interest or to find a new squad of commandos.

When 3Com Corp. was developing the first circuit card that would allow personal computers to communicate over Ethernet computer networks, the lead commando was Ron Crane, a brilliant, if erratic, engineer. The very future of 3Com depended on his finishing the Ethernet card on time, since the company was rapidly going broke and additional venture funding was tied to successful completion of the card. No Ethernet card, no money; no money, no company. In the middle of this high-pressure assignment, Crane just stopped working on the Ethernet card, leaving it unfinished on his workbench, and compulsively turned to finding a way to measure the sound reflectivity of his office ceiling tiles. That's the way it is sometimes when commandos get bored. Nobody else was prepared to take over Crane's job, so all his co-workers at 3Com could think to do in this moment of crisis was to wait for the end of his research, hoping that it would go well.

The happy ending here is that Crane eventually established 3Com's ceiling tile acoustic reflectivity standard, regained his Ethernet bearings, and delivered the breakthrough product, allowing 3Com to achieve its destiny as a $900 million company.

It's easy to dismiss the commandos. After all, most of business and warfare is conventional. But without commandos, you'd never get on the beach at all.

Grouping offshore as the commandos do their work is the second wave of soldiers, the infantry. These are the people who hit the beach en masse and slog out the early victory, building on the start given them by the commandos. The second-wave troops take the prototype, test it, refine it, make it manufacturable, write the manuals, market it, and ideally produce a profit. Because there are so many more of these soldiers and their duties are so varied, they require an infrastructure of rules and procedures for getting things done -- all the stuff that commandos hate. For just this reason, soldiers of the second wave, while they can work with the first wave, generally don't trust them, though the commandos don't even notice this fact, since by this time they are bored and already looking for the door.

The second wave is hardest to manage because they require a structure in which to work. While the commandos make success possible, it's the infantry that makes success happen. They know their niche and expend the vast amounts of resources it takes to maintain position, or to reposition a product if the commandos made too many mistakes. While the commandos come up with creative ways to hurt the enemy, giving the start-up its purpose and early direction, the infantry actually kill the enemy or drive it away, occupying the battlefield and establishing a successful market presence for the start-up and its product.

What happens then is that the commandos and the infantry head off in the direction of Berlin or Baghdad, advancing into new territories, performing their same jobs again and again, though each time in a slightly different way. But there is still a need for a military presence in the territory they leave behind, which they have liberated. These third-wave troops hate change. They aren't troops at all but police. They want to fuel growth not by planning more invasions and landing on more beaches but by adding people and building economies and empires of scale. AT&T, IBM, and practically all other big, old, successful industrial companies are examples of third-wave enterprises. They can't even remember their first- and second-wave founders.

Engineers in these established companies work on just part of a product, view their work as a job rather than an adventure, and usually have no customer contact. They also have no expectation of getting rich, and for good reason, because as companies grow, and especially after they go public, stock becomes a less effective employee motivator. They get fewer shares at a higher price, with less appreciation potential. Of course, there is also less risk, and to third-wave troops, this safety makes the lower reward worthwhile.

It's in the transitions between these waves of troops that peril lies for computer start-ups. The company founder and charismatic leader of the invasion is usually a commando, which means that he or she thrills to the idea of parachuting in and slashing throats but can't imagine running a mature organization that deals with the problems of customers or even with the problems of its own growing base of employees. Mitch Kapor of Lotus Development was an example of a commando/nice guy who didn't like to fire people or make unpopular decisions, and so eventually tired of being a chief executive, leaving at the height of its success the company he founded.

First-wave types have trouble, too, accepting the drudgery that comes with being the boss of a high-tech start-up. Richard Leeds worked at Advanced Micro Devices and then Microsoft before starting his own small software company near Seattle. One day a programmer came to report that the toilet was plugged in the men's room. "Tell the office manager," Leeds said. "It's her job to handle things like that."

"I can't tell her," said the programmer, shyly. "She's a woman."

Richard Leeds, CEO, fixed the toilet.

The best leaders are experienced second-wave types who know enough to gather together a group of commandos and keep them inspired for the short time they are actually needed. Leaders who rise from the second wave must have both charisma and the ability to work with odd people. Don Estridge, who was recruited by Bill Lowe to head the development of the IBM PC, was a good second-wave leader. He could relate effectively to both IBM's third-wave management and the first-wave engineers who were needed to bring the original PC to market in just a year.

Apple chairman John Sculley is a third-wave leader of a second-wave company, which explains the many problems he has had over the years finding a focus for himself and for Apple. Sculley has been faking it.

When the leader is a third-wave type, the start-up is hardly ever successful, which is part of the reason that the idea of intrapreneurism -- a trendy term for starting new companies inside larger, older companies -- usually doesn't work. The third-wave managers of the parent company trust only other third-wave managers to run the start-up, but such managers don't know how to attract or keep commandos, so the enterprise generally has little hope of succeeding. This trend also explains the trouble that old-line computer companies have had entering the personal computer business. These companies can see only the big picture -- the way that PCs fit into their broad product line of large and small computers. They concentrate more on fitting PCs politely into the product line than on kicking ass in the market, which is the way successes are built. A team from Unisys Corp. dropped by InfoWorld one day to brag about the company's high-end personal computers. The boxes were priced at around $30,000, not because they cost so much to build but because setting the price any lower might have hurt the bottom end of Unisys's own line of minicomputers. Six miles away, at Fry's Electronics, the legendary Silicon Valley retailer that sells a unique combination of computers, junk food, and personal toiletry items, a virtually identical PC costs less than $3,000. Who buys Unisys PCs? Nobody.

Then Bob Kavner came to town, head of AT&T's computer operation and the guy who invested $300 million of Ma Bell's money in Sun Microsystems and then led AT&T's hostile acquisition of NCR -- yet another company that didn't know its PC from a hole in the ground. Eating a cup of yogurt, Kavner asked why we gave his machines such bad scores in our product reviews. We'd tested the machines alongside competitors' models and found that the Ma Bell units were poorly designed and badly built. They compared poorly, and we told him so. Kavner was amazed, both by the fact that his products were so bad and to learn that we ran scientific tests; he thought it was just an InfoWorld grudge against AT&T. Here's a third-wave guy who was concentrating so hard on what was happening inside his own organization that he wasn't even aware of how that organization fit into the real world or, for that matter, how the real world even worked. No wonder AT&T has done poorly as a personal computer company.

(c) 1993 Robert X. Cringely

Posted by Jeff Atwood    1 Comments

June 28, 2004

Hungarian Wars

I've found a number of blog posts about the pros and cons of Simonyi's Hungarian Notation, most notably, this blog post commenting on the extreme polarity of the reprinted MSDN article rating:

[Image: MSDN article score graph]

This single image really cuts to the heart of the debate, pointedly illustrating what a religious war this topic is.

Coming from a traditional VB background, with our txts, our frms, and our strs and ints, I was befuddled when presented with .NET-- what naming scheme do you use for a fully OO language where.. everything is an object? objEverything isn't very satisfying. So, you start to question whether the naming scheme ever made any sense at all.

After a lot of thought, and a lot of hand-wringing, here are the conventions I ultimately settled on for my .NET development. I'm not proposing these as a standard, merely documenting the thought process that goes into coherent variable naming:

  1. Most functions should be short enough that you won't have a zillion variables. If you have that many variables to tell apart, you have bigger fish to fry.
  2. I want to be able to tell "simple" intrinsic types from full blown objects at a glance*. This distinction is important to me. Yeah, they're all still objects, but there are the common simple variable types we use 99% of the time (e.g., String and Integer), and then there's everything else.
  3. I want to be able to tell class level variables from local variables at a glance*. How far up do I need to scroll?
  4. The variable names should be descriptive, readable and succinct.
  5. I do not believe every single object needs a unique prefix. This is insane, and as the VB6 document illustrates, this way leads to madness.

* At a glance means without having to mouse or cursor over the variable name, e.g., it should work even in the high tech Notepad IDE.

If there is a theme here, it is simplicity and readability. The other theme is that Hungarian Notation seems to have somehow evolved into a catch-all term for "Here's the variable naming convention we use on our team." It's like Linux: there are umpteen zillion "distros" out there, all slightly different flavors of the same basic theme. Here's what my flavor looks like:

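Imports System
Imports System.Collections

Public Class Class1

    ' Class-level variable: leading underscore, plus a prefix for the simple type
    Public _strCustomerName As String

    Public Function GetCustomerFields(ByVal intCustomerID As Integer) As Specialized.NameValueCollection
        ' One-off local objects get short names rather than unique prefixes
        Dim nvc As New Specialized.NameValueCollection
        Dim ds As New Data.DataSet
        Dim dr As Data.DataRow

        ' Filling ds for intCustomerID is left out; this only illustrates naming
        For Each dr In ds.Tables(0).Rows
            nvc.Add(CStr(dr.Item("name")), CStr(dr.Item("value")))
        Next

        Return nvc
    End Function

End Class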

The numbered list above documents the rationale (or lack thereof) behind this. You can see where I totally punted on the concept of object prefixes in a fully object oriented language. So many of the objects I create are "one off", with such a limited lifetime and such an obvious, scoped usage that I don't feel the need to give them unique names. Does it really help to call the dataset dsCustomers in this case? I don't think so. Keep it short and sweet.

Ultimately, as in the MSDN rating, naming conventions are kind of personal. Pointing out how stupid someone's variable names are is like telling them how stupid they are for naming their first born child "Melvin."

On the other hand, I do think it is rude to join a development team and unilaterally decide what "the best" conventions are. Which conventions to adopt is certainly a topic worth broaching in a team developer meeting, but following the ones already in place is also just plain good manners: when in Rome, do as the Romans do. In the end, it's more important to be internally consistent with a naming standard than it is to spend a lot of time sussing out some kind of perfect, interplanetary naming standard that will never be definitively decided to anyone's satisfaction anyway. Pick a reasonable, basic set of standards that most can agree on, but leave room for personal interpretations, too. There's nothing quite as soul crushing as over-standardizing in a religious area where there really isn't a "right" answer.

In closing, it is evident that the conventions participated in making the code more correct, easier to write, and easier to read. Naming conventions cannot guarantee good code, however; only the skill of the programmer can.
-- Charles Simonyi

Posted by Jeff Atwood    14 Comments

Visual Diff Tools

I'm currently building a .NET library that constructs .MHT files, aka single file web page archives. That's what you get when you perform a File | Save As | Web Archive, Single File operation in IE6. HTML is a great, standard format for building richly formatted one-off reports, but once you start including images, it becomes a pain to manage a set of files. Thus, the utility of combining everything into a single file.

Surprisingly, instead of some crazy proprietary Microsoft format like you'd expect, the file follows the simple Multipart MIME Message RFC standard. Building an .MHT file is sort of like sending an email to yourself-- go figure. It also works via an extension in your precious Firefox, for those of you that enjoy slow rendering.
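To make the "email to yourself" idea concrete, here is a minimal sketch of what such a file looks like when assembled by hand: a multipart/related message with one HTML part and one image part. The module name, function name, parameters, and boundary string are all made up for illustration, and this is not the library described above. IE6's real output differs in its details, so treat this as the general shape only.

```vbnet
Imports System
Imports System.Text

Public Module MhtSketch

    ' Hypothetical helper: packs one HTML page and one GIF image into a
    ' multipart/related MIME message, the general shape of an .MHT file.
    Public Function BuildMht(ByVal strHtml As String, ByVal strPageUrl As String, _
                             ByVal bytImage As Byte(), ByVal strImageUrl As String) As String
        ' Assumed boundary; any string that never occurs inside the parts will do
        Const strBoundary As String = "----=_SketchBoundary_001"
        Dim sb As New StringBuilder

        ' Top-level headers declare the container type and the boundary
        sb.Append("MIME-Version: 1.0" & vbCrLf)
        sb.Append("Content-Type: multipart/related; type=""text/html""; boundary=""" & strBoundary & """" & vbCrLf)
        sb.Append(vbCrLf)

        ' Part 1: the HTML page, base64 encoded
        sb.Append("--" & strBoundary & vbCrLf)
        sb.Append("Content-Type: text/html; charset=""utf-8""" & vbCrLf)
        sb.Append("Content-Transfer-Encoding: base64" & vbCrLf)
        sb.Append("Content-Location: " & strPageUrl & vbCrLf)
        sb.Append(vbCrLf)
        sb.Append(Convert.ToBase64String(Encoding.UTF8.GetBytes(strHtml), _
                  Base64FormattingOptions.InsertLineBreaks) & vbCrLf)

        ' Part 2: the image, keyed by the URL the HTML refers to
        sb.Append("--" & strBoundary & vbCrLf)
        sb.Append("Content-Type: image/gif" & vbCrLf)
        sb.Append("Content-Transfer-Encoding: base64" & vbCrLf)
        sb.Append("Content-Location: " & strImageUrl & vbCrLf)
        sb.Append(vbCrLf)
        sb.Append(Convert.ToBase64String(bytImage, _
                  Base64FormattingOptions.InsertLineBreaks) & vbCrLf)

        ' Closing delimiter: the boundary followed by two extra dashes
        sb.Append("--" & strBoundary & "--" & vbCrLf)
        Return sb.ToString()
    End Function

End Module
```

The Content-Location header on each part is what lets the HTML's img tags find their images inside the single file, so the URLs passed in should match the ones used in the markup.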

During development, I needed to reverse engineer what IE6 constructs, and use that as a comparison point for the output from my application. Unfortunately, the only file comparison tool I had access to was the crappy default "compare versions" function in Visual SourceSafe. It's workable, but it's kind of.. ghetto.

Every developer should have a good diff tool in their toolkit. After a bit of research, I settled on Araxis Merge as my preferred tool for visual comparisons.

[Image: screenshot of Araxis Merge]

It's a pricey tool, but it's come in very handy so far. The only regret I have is that VSS doesn't allow the use of any external comparison tools, so you can't integrate Merge with Visual Studio .NET.

Anyway, if, like me, the only diff tool you've ever used is the one in VSS, you may not know how much you're missing.

Posted by Jeff Atwood    5 Comments

June 27, 2004

Code Complete 2: The Revenge

Reading through this blog, I was just reminded that Code Complete 2* was released. Since this book is the first entry on my prioritized list of Recommended Reading for Developers, and Steve is the patron saint of this web site, you better believe I just placed an order for it! Also, if you don't own the first five books on that reading list, shame on you, and get your ass over to Amazon immediately. If I were rich enough to buy a copy of those books for every developer on earth, I would.

I already own two copies of the original Code Complete; one for work, one for home. There's a list of what changed in the new edition, if you're curious:

There are still far more people who talk about good practices than who actually use good practices. I see far too many people using current buzzwords as a cloak for sloppy practices. When the first edition was published, people were claiming, "I don't have to do requirements or design because I'm using object-oriented programming." That was just an excuse. Most of those people weren't really doing object-oriented programming -- they were hacking, and the results were predictable, and poor. Right now, people are saying "I don't have to do requirements or design because I'm doing agile development." Again, the results are easy to predict, and poor.

Testing guru Boris Beizer said that his clients ask him, "How can I revolutionize and transform my software development without changing anything except the names and putting some slogans up on the walls?" (Johnson 1994b). Good programmers invest the effort to learn how to use current practices. Not-so-good programmers just learn the buzzwords, and that's been a software industry constant for a half century.

Steve is also writing a new book on estimation, Software Estimation: Demystifying the Black Art, or what I like to call the "I'm 99% done except for some cleanup!" book. It won't be released until sometime next year, but he is providing public manuscripts for review comments.

* I have a running joke that every sequel should always be subtitled The Revenge for dramatic effect, Hollywood style. You know, like Bridges of Madison County 2: The Revenge.

Posted by Jeff Atwood    10 Comments

Death to the Dialog Box

One of the unnecessary evils of GUI programming is the "Process Dialog Box", what we think of as MessageBox.Show. You know, like this:

[Image: modal dialog screenshot]

All kidding aside, these dialogs are frequently abused for displaying all kinds of trivial information to the user, a mistake that Alan Cooper calls stopping the proceedings with idiocy. Don't like the data the user entered into a form? Well then, let's immediately pop up a MessageBox and notify them about it! Thus the main form loses focus, and the user has another modal window to acknowledge before s/he can continue doing anything with the main form. This completely breaks any flow of interaction the user had with our app. A better solution is to passively flag the field-- perhaps paint it with a pink background, or use the web metaphor of the red asterisk placed to the right of the field. Whatever you do, avoid stopping the proceedings with idiocy at all costs.
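Passive flagging is cheap to do in WinForms. The sketch below is one possible approach, not the only one: it paints an invalid field pink and uses the stock ErrorProvider component to put an icon beside it, with no MessageBox anywhere. The form, the field, and the validation rule are hypothetical, invented purely for illustration.

```vbnet
Imports System
Imports System.ComponentModel
Imports System.Drawing
Imports System.Windows.Forms

' Hypothetical form with a single "age" field, for illustration only
Public Class AgeForm
    Inherits Form

    Private _txtAge As New TextBox
    Private _btnOk As New Button
    Private _errors As New ErrorProvider

    Public Sub New()
        _txtAge.Location = New Point(12, 12)
        _btnOk.Location = New Point(12, 48)
        _btnOk.Text = "OK"
        Controls.Add(_txtAge)
        Controls.Add(_btnOk)
        ' Validating fires when focus leaves the text box
        AddHandler _txtAge.Validating, AddressOf TxtAge_Validating
    End Sub

    Private Sub TxtAge_Validating(ByVal sender As Object, ByVal e As CancelEventArgs)
        Dim intAge As Integer
        If Integer.TryParse(_txtAge.Text, intAge) AndAlso intAge >= 0 Then
            ' Valid: restore the normal look and clear the error icon
            _txtAge.BackColor = SystemColors.Window
            _errors.SetError(_txtAge, "")
        Else
            ' Invalid: paint the field pink and show an icon beside it.
            ' e.Cancel stays False, so focus is not trapped in the field.
            _txtAge.BackColor = Color.Pink
            _errors.SetError(_txtAge, "Age must be a whole number, zero or greater.")
        End If
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New AgeForm())
    End Sub

End Class
```

The user can keep tabbing through the form and fix the field whenever they like; the message text shows up as a tooltip on the icon rather than as something they have to dismiss.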

But even when following that guideline religiously, you'll still find yourself painted into corners where you really, really need to let the user know that something happened. Right now. And the current GUI toolkit is woefully inadequate for expressing this to the user. What are my options? Display something in the status bar? The previous versions of IE6 did it exactly that way, at least for certain classes of errors such as javascript errors on the page. However, one of the interesting side effects of installing Windows XP SP2* RC2 is that it adds non-dialog based notifications to Outlook Express and IE6. For example, here's IE6 notifying me that it blocked download of that crazy, dangerous Firefox browser-- a clear security risk!

[Image: IE screenshot]

I love this solution, and I want someone to copy it immediately and make it available as a WinForms user control! There's just no question that this is a far better solution than popping a modal dialog with the same information. It's also better than the "put an icon in the status bar" solution, because it's more visible, it's at the top of the window where the work starts (nobody sees the status bar), and it contains more information. You can click it to get a menu of actions relevant to the condition, in this case, unblocking the download or turning off the notification entirely per-site or per-system.

It's funny, because I had often considered this dialog box conundrum-- which is really endemic to all GUIs-- and thought back to the interface from an old computer game from 1999, Dungeon Keeper 2. The game was constantly sending you notifications of various things going on throughout your dungeon; the notifications would visually flow into a queue with a summary icon to indicate the type and severity of the notification. That way you could continue playing the game without interruption, and process the messages as you deemed necessary.
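The queue idea is simple enough to sketch as a data structure, independent of how it gets drawn. Everything below (the Severity enum, the Notification and NotificationQueue classes, and their members) is hypothetical and only meant to show the shape of it: the app posts messages without blocking, and the user pulls them off whenever they feel like it.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical severity levels; a UI would map these to summary icons
Public Enum Severity
    Info
    Warning
    Critical
End Enum

Public Class Notification
    Public ReadOnly Level As Severity
    Public ReadOnly Message As String

    Public Sub New(ByVal severityLevel As Severity, ByVal strMessage As String)
        Me.Level = severityLevel
        Me.Message = strMessage
    End Sub
End Class

Public Class NotificationQueue
    Private _pending As New Queue(Of Notification)

    ' Called by the application when something happens; never blocks the user
    Public Sub Post(ByVal severityLevel As Severity, ByVal strMessage As String)
        _pending.Enqueue(New Notification(severityLevel, strMessage))
    End Sub

    ' How many messages are waiting, e.g. for a badge or icon strip
    Public ReadOnly Property Count() As Integer
        Get
            Return _pending.Count
        End Get
    End Property

    ' Called when the user chooses to look at the oldest waiting message;
    ' returns Nothing when the queue is empty
    Public Function TakeNext() As Notification
        If _pending.Count = 0 Then Return Nothing
        Return _pending.Dequeue()
    End Function
End Class
```

A form would call Post wherever it is tempted to call MessageBox.Show, and redraw its icon strip from Count; nothing here ever steals focus.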

* AKA the "gee, we're sick of getting all this bad publicity about our crappy default security settings" patch.

Posted by Jeff Atwood    17 Comments

June 26, 2004

UNIX will never be usable

A few months ago, Eric Raymond, the open source guru best known for his seminal paper The Cathedral and the Bazaar, posted a rant about the difficulty he encountered with a common user printing scenario in Unix.

The followup post is even more intriguing:

I am informed that an RFE covering the issues I raised has been registered on Red Hat Bugzilla. But quibbles over who is responsible for which piece of the CUPS-configuration mess are, as the letters above reinforce, not merely beside the point but evasions of the actual problem, which is a systemic one that affects thousands of other projects and our entire community.

Up to now, we haven't been willing to do the real work of making our software usable. It doesn't matter whether the failure of the browsing defaults in CUPS to match the documentation was a CUPS-team screwup or a Fedora screwup -- Aunt Tillie doesn't care which direction that finger points, and I don't either. No, the real problem is that whoever changed the default didn't immediately fix the documentation to match it as a matter of spinal reflex.

It also doesn't matter a damn whether the shoddy and unhelpful design of the printer-configuration tool came out of a CUPS brainpan or a Fedora brainpan. What matters is that whoever was responsible never audited the interface for usability with a real user.

The CUPS mess is not a failure of one development team, or of one distribution integrator. In fact, it makes a better example because the CUPS guys and the Fedora guys are both well above the median in both general technical chops, design smarts, and attention to usability. The fact that this mess is an example of our best in action, rather than our worst, just highlights how appallingly low our standards have been.

It's time for that to change. And the really heartening thing I got from the community response is that maybe we're ready for it to change. "I thought it was just me" -- many, many of you out there are already dissatisfied with the poor quality of open-source UIs, but each of you has tended to think you were alone. No longer. It's time for each and every one of you out there to become public champions for the luxury of ignorance.

Good UI design is not a result of black magic, it just requires paying attention. Being task-oriented rather than feature-oriented. Recognizing that every time you force a user to learn something, you have fallen down on your job. And that when Aunt Tillie doesn't understand your software, the fault -- and the responsibility to fix it -- lies not with her but with you.

However well intentioned this observation is, and quite frankly, how obvious it is-- at least, to everyone outside the insular UNIX community-- I think Eric is barking up the wrong tree. UNIX will never be usable. It is awfully late in the game for the UNIX crowd to suddenly realize what other computer users have intuitively known since, say, 1984 and the introduction of the Macintosh: nobody gives a damn how technically competent your code is when they can't figure out how to use it. Without usability you have nothing.

It's been twenty years since the GNU Manifesto and nearly seven since The Cathedral and the Bazaar. I think it's time we stopped congratulating ourselves quite so much on our dedication to freedom and our ability to write technically superior code, and began more often to ask What are we doing to serve the real users? Good UI design, and doing the right thing by Aunt Tillie, ought to be a matter of gut-level pride of craftsmanship.

I think it is comically unrealistic to ask a community predicated on C code, kernel hacking, and the utility of command line tools, to suddenly wake up and get the usability religion. It just ain't gonna happen, because usability is not a part of the fabric of their culture. The open source and unix guys have had almost thirty years to come up with a usable GUI; why should history lead me to believe the next five years are going to be any different?

Usability is easily an order of magnitude harder than writing technically competent code, even harder than writing your own operating system kernel. You have to understand what users are actually doing, versus what they say they are doing. Open-source developers don't have time for things that are a pain in the ass-- like users, their conflicting needs, and their general disdain for computers and technology. They want to work on "the fun stuff", which doesn't include users pestering them every day. RTFM!

I expect to see usability enhancements from the companies which have cultivated a culture that respects usability-- primarily Microsoft and Apple. I'd love to see more usability in UNIX and open source, and I am encouraged by this sudden influx of concern, but I won't be holding my breath.

Posted by Jeff Atwood    15 Comments

June 25, 2004

Posted by Jeff Atwood    2 Comments

June 24, 2004

What's worse than a Bad Error Message?

I'm sure I don't have to explain what is wrong with error messages like this:

Catastrophic Failure
General Protection Fault
Error: The operation completed successfully.

But as bad as those are, they pale in comparison to what is, hands down, the worst kind of error message: a beautiful, well-formatted, informative, incorrect error message.

Due to the issue documented in my previous post, we're currently replacing the database layer of our production application-- switching from Microsoft's System.Data.OracleClient, to Oracle's Oracle.DataAccess. Just what you want to do in a production system, make sweeping changes in the back end soon after deployment. Er, right. But I digress.

The initial conversion went better than expected, and ran fine on development machines within a few hours. However, when we deployed our Smart Client app, we encountered the following exception:

(Inner Exception)

Exception Source:      Oracle.DataAccess
Exception Type:        System.DllNotFoundException
Exception Message:     Unable to load DLL (OraOps9.dll).
Exception Target Site: GetRegTraceInfo

---- Stack Trace ----
   Oracle.DataAccess.Client.OpsTrace.GetRegTraceInfo(TrcLevel As UInt32&)
       CrazyApp.Loader.EXE: N 00000
   Oracle.DataAccess.Client.OracleConnection..ctor()
       CrazyApp.Loader.EXE: N 00032
   SharedUtils.DB.DBDataset..ctor(info As SerializationInfo, context As StreamingContext)
       CrazyApp.Loader.EXE: N 00040
       
(Outer Exception)

Exception Source:      mscorlib
Exception Type:        System.Reflection.TargetInvocationException
Exception Message:     Exception has been thrown by the target of an invocation.
Exception Target Site: HandleReturnMessage

---- Stack Trace ----
   System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(reqMsg As IMessage, retMsg As IMessage)
       CrazyApp.Loader.EXE: N 00264
   System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(msgData As MessageData&, type As Int32)
       CrazyApp.Loader.EXE: N 00682
   CrazyApp.API.UserManager.GetUser(dsUser As DataSet&)
       CrazyApp.Loader.EXE: N 00000
   CrazyApp.UI.Data.ClientDatasetManager.GetCurrentUserDataset(blnForceRefresh As Boolean)
       CrazyApp.Loader.EXE: N 00081

Thus began an entire day of hair-pulling exercises in determining why the remoted Oracle call can't locate OraOps9.dll. It has to be a configuration problem on the server with the Oracle driver. Just like the nicely formatted error message says, with its informative stack traces and exception details. Right?

Wrong. After exhausting every possible scenario-- I wish I could say it was skill, but it's a lot more like dogged trial and error-- we determined that, despite the fact that the exception is wrapped in a remoting call, the required file is missing from the client!

I discovered this on my own machine. Intellectually, I knew there was no way I could be getting different results from a server call than any other client. The only possible explanation was a new client dependency introduced by referencing types in Oracle.DataAccess. But I still refused to believe this. In fact, I did not believe it until I duplicated it, by installing the Oracle 9 client and .NET layer on a clean build machine. Sure enough, the smart client app ran fine as soon as I did that.
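What would have saved me that day is an exception that said where it was raised, not just what went wrong. Here's a rough Python sketch of the idea, under some loud assumptions: `LocatedError`, `describe_chain`, `load_driver`, and `call_server` are all names I invented, and Python's `__cause__` chain is only standing in for .NET's inner/outer exceptions. It has nothing to do with how remoting or Oracle.DataAccess actually report errors.

```python
import os
import socket


class LocatedError(Exception):
    """Hypothetical exception that records the machine and process that raised it."""

    def __init__(self, message):
        super().__init__(message)
        self.host = socket.gethostname()
        self.pid = os.getpid()


def describe_chain(exc):
    """Return one line per exception, outermost first, following __cause__."""
    lines = []
    while exc is not None:
        # Plain exceptions carry no origin, so say so instead of guessing.
        origin = getattr(exc, "host", "unknown host")
        lines.append(f"{type(exc).__name__} on {origin}: {exc}")
        exc = exc.__cause__
    return lines


def load_driver():
    # Stand-in for the failing native DLL load.
    raise LocatedError("Unable to load DLL (OraOps9.dll).")


def call_server():
    # Stand-in for the wrapper that buries the real failure one level down.
    try:
        load_driver()
    except LocatedError as inner:
        raise RuntimeError(
            "Exception has been thrown by the target of an invocation."
        ) from inner
```

Calling `describe_chain` on whatever `call_server()` throws gives two lines, and the inner one names the host that actually failed to load the DLL. With that one extra field, "it must be the server" would never have been the starting assumption.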

I've probably spent more time chasing down erroneous error messages than the time I've spent on all other error messages combined. Evidently computers, like people, are big fat stinkin' liars!

Posted by Jeff Atwood    6 Comments

June 23, 2004

Debugging ASPNET_WP in Production

One of our production web servers keeps deadlocking the ASPNET_WP process, like so:

aspnet_wp.exe (PID: 3588) was recycled because it was suspected to be in a deadlocked state. It did not send any responses for pending requests in the last 180 seconds.

This is painful. It means the server becomes unavailable for over three minutes, and any pending requests return errors after ASPNET_WP is cycled. The best part is, this happens completely randomly. We can't force it to happen or duplicate it, we just have to wait for it to happen. And it inevitably does, several times per day. We went through all the normal troubleshooting procedures and exhausted them all, which left.. the tough stuff.

Luckily for us, Microsoft has an excellent article, Production Debugging for .NET Framework Applications, which goes into excruciating detail on how to deal with this situation. In other words, you bring out the big guns:

The article contains an excellent walkthrough, but here's the Reader's Digest version of what you need to do:

  1. Install the above tools on the web server with the problem. Unzip the dbgnetfx.exe contents to the debugging tools folder.
  2. Use the command line tool adplus.vbs -hang -pn aspnet_wp.exe to generate a memory dump of the ASPNET_WP process (-pn selects the process by name; -p expects a numeric process ID). This will create a folder containing a fairly large file (mine was ~90 MB) inside the debugging tools folder. This can be kind of a pain, because you have to trigger this after the crash or during the hang (as in my case). The adplus_aspnet.vbs file has some special functionality to "kick in" automatically during crash or hang scenarios.
  3. Fire up the windbg.exe application, and open the crash dump file via the drop-down menus. You will need to set the symbol paths (most importantly, including Microsoft's public http:// symbol server URL) as listed in the document; scroll down to the section titled "To enter the symbol paths, do one of the following:". The windbg app has a command line entry area at the bottom, near the status bar, so that's where you want to enter those symbol path commands.
  4. At this point skip directly to the .NET specific debugging information, which relies on the windbg add in "sos.dll". That's contained in the dbgnetfx.exe archive. Scroll down to .load SOS\sos.dll (er, "son of strike"? I want some of what they're smoking at MS!) and proceed from there.

Once you've gone through all that rigamarole, you actually get some useful, .NET specific information, such as all the thread info:

0:000> !threads
ThreadCount: 23
UnstartedThread: 0
BackgroundThread: 23
PendingThread: 0
DeadThread: 0
                             PreEmptive   GC Alloc               Lock     
       ID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
  1  1050 0013cc48   200a220 Enabled  05544368:05545054 0020fe78     1 MTA
  8  1090 0014ca30      b220 Enabled  00000000:00000000 001400f0     0 MTA (Finalizer)
 10   b54 00158f60   1800220 Enabled  00000000:00000000 001400f0     0 MTA (Threadpool Worker)
  4   770 0019d8a8   2000220 Enabled  0553c374:0553d054 0020fe78     1 MTA
  9  10a0 001c1308   2000220 Enabled  016118ac:01612568 0020fe78     1 MTA
 11   d30 001c1800   2000220 Enabled  0554238c:05543054 0020fe78     1 MTA
 12  104c 001c1d70   2000220 Enabled  0160f8ac:01610568 0020fe78     1 MTA
 14  102c 001ffe50   1800220 Enabled  00000000:00000000 001400f0     0 MTA (Threadpool Worker)
 15   3c8 0ebf4488   1800220 Enabled  00000000:00000000 001400f0     0 MTA (Threadpool Worker)
 16   aa0 0ec39468   1800220 Enabled  00000000:00000000 001400f0     0 MTA (Threadpool Worker)
 18   fd8 001c1b80   1800220 Enabled  00000000:00000000 001400f0     0 MTA (Threadpool Worker)
 19  1040 001c1640   1800220 Enabled  00000000:00000000 001400f0     0 MTA (Threadpool Worker)
 20  101c 001c19c0   2000220 Enabled  05546398:05547054 0020fe78     1 MTA
 21  1044 107e4a08   2000220 Enabled  0554a380:0554b054 0020fe78     1 MTA
 22   ea8 107d8b80   2000220 Enabled  01613864:01614568 0020fe78     1 MTA
 23   d28 0ec8bef0   2000220 Enabled  05540380:05541054 0020fe78     1 MTA
 24   7c8 0ec8cbe0   2000220 Enabled  05548374:05549054 0020fe78     1 MTA
 25  1084 1085ebb8   2000220 Enabled  0160b8b8:0160c568 0020fe78     1 MTA
 26  1034 0ec8d7d8   2000220 Enabled  0160d8ac:0160e568 0020fe78     1 MTA
 27   804 107ae008   2000220 Enabled  016098b8:0160a568 0020fe78     1 MTA
 28   c20 107aecf8   2000220 Enabled  01607894:01608568 0020fe78     1 MTA
 29   ea4 1089f3d0   2000220 Enabled  0553e3a4:0553f054 0020fe78     1 MTA
 30   d88 108a0340       220 Enabled  00000000:00000000 001400f0     0 MTA

Of the 32 threads, 14 are associated with the AppDomain for W3SVC5, which I know because I compared the !dumpdomain (domainid) output for the value 0020fe78.
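Counting rows by eye gets old fast, so here's a small Python sketch that tallies threads per AppDomain from pasted !threads output. The function name `threads_per_domain` is my own invention, and the column positions are an assumption based purely on the layout shown above; other versions of sos.dll may format this table differently. The sample is just the first four rows from the dump.

```python
from collections import Counter

# First four rows of the !threads output above, pasted verbatim.
SAMPLE = """\
  1  1050 0013cc48   200a220 Enabled  05544368:05545054 0020fe78     1 MTA
  8  1090 0014ca30      b220 Enabled  00000000:00000000 001400f0     0 MTA (Finalizer)
 10   b54 00158f60   1800220 Enabled  00000000:00000000 001400f0     0 MTA (Threadpool Worker)
  4   770 0019d8a8   2000220 Enabled  0553c374:0553d054 0020fe78     1 MTA
"""


def threads_per_domain(threads_output):
    """Count thread rows per value in the Domain column."""
    counts = Counter()
    for line in threads_output.splitlines():
        fields = line.split()
        # Assumed layout: a thread row has at least 9 whitespace-separated
        # columns, with the "start:end" GC alloc context sixth and Domain seventh.
        # Header and summary lines fail this check and are skipped.
        if len(fields) >= 9 and ":" in fields[5]:
            counts[fields[6]] += 1
    return counts
```

Feed it the whole listing and look up the domain address that !dumpdomain reported for your site.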

OK, so we know we have a lot of blocked threads associated with our website, which we.. already sort of knew. Wouldn't it be helpful if we knew.. exactly what .NET commands these threads were issuing?

0:000> ~*e !clrstack

Thread 4
ESP       EIP     
00fbf394  77f8287e [FRAME: ECallMethodFrame] [DEFAULT] I4 System.Threading.WaitHandle.WaitMultiple(SZArray Class System.Threading.WaitHandle,I4,Boolean,Boolean)
00fbf3ac  799f1171 [DEFAULT] I4 System.Threading.WaitHandle.WaitAny(SZArray Class System.Threading.WaitHandle,I4,Boolean)
00fbf3c0  0ebe6410 [DEFAULT] [hasThis] Class System.Data.OracleClient.IDBPooledObject System.Data.OracleClient.DBObjectPool.GetObject(ByRef Boolean)
00fbf3f0  0ebe5486 [DEFAULT] Class System.Data.OracleClient.OracleInternalConnection System.Data.OracleClient.OracleConnectionPoolManager.GetPooledConnection(String,Class System.Data.OracleClient.OracleConnectionString,ByRef Boolean)
00fbf40c  0ebe50fa [DEFAULT] [hasThis] Void System.Data.OracleClient.OracleConnection.OpenInternal(Class System.Data.OracleClient.OracleConnectionString,Object)
00fbf448  0ebe5011 [DEFAULT] [hasThis] Void System.Data.OracleClient.OracleConnection.Open()
00fbf454  0fa31977 [DEFAULT] [hasThis] Void SharedUtils.DB.DBDataset.Fill(ByRef Class System.Data.OracleClient.OracleCommand,String)
  at [+0x6f] [+0x26]
00fbf48c  0fa330e3 [DEFAULT] [hasThis] Void SharedUtils.DB.DBDataset.Fill(ByRef Class System.Data.OracleClient.OracleCommand,String,String)
  at [+0x23] [+0x10]
00fbf4a0  0fa32a2f [DEFAULT] [hasThis] Void CrazyApp.API.Library.GetTreeForContainer(I4,ByRef Class System.Data.DataSet,String,String)
  at [+0x12f] [+0xa3]

Thread 9
ESP       EIP     
0dddf3f4  77f8287e [FRAME: ECallMethodFrame] [DEFAULT] I4 System.Threading.WaitHandle.WaitMultiple(SZArray Class System.Threading.WaitHandle,I4,Boolean,Boolean)
0dddf40c  799f1171 [DEFAULT] I4 System.Threading.WaitHandle.WaitAny(SZArray Class System.Threading.WaitHandle,I4,Boolean)
0dddf420  0ebe6410 [DEFAULT] [hasThis] Class System.Data.OracleClient.IDBPooledObject System.Data.OracleClient.DBObjectPool.GetObject(ByRef Boolean)
0dddf450  0ebe5486 [DEFAULT] Class System.Data.OracleClient.OracleInternalConnection System.Data.OracleClient.OracleConnectionPoolManager.GetPooledConnection(String,Class System.Data.OracleClient.OracleConnectionString,ByRef Boolean)
0dddf46c  0ebe50fa [DEFAULT] [hasThis] Void System.Data.OracleClient.OracleConnection.OpenInternal(Class System.Data.OracleClient.OracleConnectionString,Object)
0dddf4a8  0ebe5011 [DEFAULT] [hasThis] Void System.Data.OracleClient.OracleConnection.Open()
0dddf4b4  0fa31977 [DEFAULT] [hasThis] Void SharedUtils.DB.DBDataset.Fill(ByRef Class System.Data.OracleClient.OracleCommand,String)
  at [+0x6f] [+0x26]
0dddf4ec  0fa330e3 [DEFAULT] [hasThis] Void SharedUtils.DB.DBDataset.Fill(ByRef Class System.Data.OracleClient.OracleCommand,String,String)
  at [+0x23] [+0x10]
0dddf500  0fa36a3c [DEFAULT] [hasThis] Class CrazyApp.API.Node.Document CrazyApp.API.Library.GetDocument(I4)
  at [+0x7c] [+0x32]

I have changed the name of our application to "CrazyApp" to protect the guilty, and I have simplified the dump to only two of the 14 threads. Based on these thread command lists, it is now very clear what is going on here: we're blocking while waiting for database resources via the System.Data.OracleClient.DBObjectPool.GetObject command, on every single thread!
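If you'd rather not scroll through every stack, the same check can be roughed out in a few lines of Python. This is only a sketch: `threads_waiting_in` is a hypothetical helper, and it assumes the dump looks like the excerpt above, with a "Thread N" line introducing each stack. The sample keeps two frames per thread from the real output.

```python
import re

# Two frames per thread, copied from the !clrstack output above.
SAMPLE = """\
Thread 4
ESP       EIP
00fbf3c0  0ebe6410 [DEFAULT] [hasThis] Class System.Data.OracleClient.IDBPooledObject System.Data.OracleClient.DBObjectPool.GetObject(ByRef Boolean)
00fbf448  0ebe5011 [DEFAULT] [hasThis] Void System.Data.OracleClient.OracleConnection.Open()

Thread 9
ESP       EIP
0dddf420  0ebe6410 [DEFAULT] [hasThis] Class System.Data.OracleClient.IDBPooledObject System.Data.OracleClient.DBObjectPool.GetObject(ByRef Boolean)
0dddf4a8  0ebe5011 [DEFAULT] [hasThis] Void System.Data.OracleClient.OracleConnection.Open()
"""


def threads_waiting_in(clrstack_output, method_name):
    """Return the thread numbers whose managed stack mentions method_name."""
    waiting = []
    current = None
    for line in clrstack_output.splitlines():
        header = re.match(r"\s*Thread\s+(\d+)\s*$", line)
        if header:
            # A new "Thread N" line starts the next stack.
            current = int(header.group(1))
        elif current is not None and method_name in line and current not in waiting:
            waiting.append(current)
    return waiting
```

Running `threads_waiting_in(SAMPLE, "DBObjectPool.GetObject")` picks out both threads, which is the whole diagnosis in one line.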

Armed with this information, rather than "gee, ASPNET_WP is deadlocking a lot", we were able to determine that the real problem is "A pooled connection is not disposed by Microsoft .NET Managed Provider for Oracle when an exception occurs". There are a lot of people on the newsgroups complaining about the same thing, namely, that the Microsoft System.Data.OracleClient is blindly re-using connections that it knows to be bad, which generates a NullObjectException, and sooner or later-- basically at random-- causes ASPNET_WP to fall over and cycle.
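To be clear, the bug here lives inside the provider, and I can't show you its internals. But the general failure mode, a pool that never gets its connections back after an exception so every caller ends up blocked waiting for one, is easy to reproduce with a toy. The Python sketch below is exactly that, a toy: `TinyPool`, `leaky_query`, and `safe_query` are made-up names, and nothing here reflects how System.Data.OracleClient is really written.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Toy fixed-size pool; a stand-in for a real connection pool."""

    def __init__(self, size):
        self._free = queue.Queue()
        for n in range(size):
            self._free.put(f"conn-{n}")

    def acquire(self, timeout):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)

    def available(self):
        return self._free.qsize()

    @contextmanager
    def connection(self, timeout=0.1):
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            # Runs even when the body raises, so the slot is never leaked.
            self.release(conn)


def leaky_query(pool):
    conn = pool.acquire(timeout=0.1)
    raise RuntimeError("query failed")  # conn is never handed back


def safe_query(pool):
    with pool.connection():
        raise RuntimeError("query failed")  # conn is still handed back
```

Call `leaky_query` as many times as the pool has slots and the next `acquire` just sits there until it times out, which is the toy version of every thread parked in GetObject. `safe_query` can fail all day without the pool shrinking.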

Good times.. good times..

Posted by Jeff Atwood    2 Comments
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.