Coding Horror
programming and human factors
by Jeff Atwood


13 posts from February 2009

February 28, 2009

File Compression in the Multi-Core Era

I've been playing around a bit with file compression again, as we generate some very large backup files daily on Stack Overflow.

We're using the latest 64-bit version of 7zip (4.64) on our database server. I'm not a big fan of more than dual core on the desktop, but it's a no-brainer for servers. The more CPU cores the merrier! This server has two quad-core CPUs, a total of 8 cores, and I was a little disheartened to discover that neither RAR nor 7zip seemed to make much use of more than 2.

Still, even if it does only use 2 cores to compress, the 7zip algorithm is amazingly effective, and has evolved over the last few years to be respectably fast. I used to recommend RAR over Zip, but given the increased efficiency of 7zip and the fact that it's free and RAR isn't, it's the logical choice now.

Here are some quick tests I performed compressing a single 4.76 GB database backup file. This was run on a server with dual quad-core 2.5 GHz Xeon E5420 CPUs.

Algorithm  Level    Time    Speed       Compressed size
7zip       fastest  5 min   14 MB/sec   973 MB
7zip       fast     7 min   11 MB/sec   926 MB
7zip       normal   34 min  2.5 MB/sec  752 MB
7zip       maximum  41 min  2.0 MB/sec  714 MB
7zip       ultra    48 min  1.7 MB/sec  698 MB

For those of you who are now wondering, wow, if 7zip does this well on maximum and ultra, imagine how it'd do on ultra-plus, don't count on it. There's a reason most compression programs default to certain settings as "normal". Above these settings, results tend to fall off a cliff; beyond that sweet spot, you tend to get absurdly tiny increases in compression ratio in exchange for huge order of magnitude increases in compression time.
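To put rough numbers on that cliff, here is a minimal Python sketch that works out how many megabytes each step up in compression level saves per extra minute spent. The timings and sizes are copied from the 7zip results above; the function name `marginal_gains` is just an illustrative choice.

```python
# (level, minutes, compressed size in MB), copied from the 7zip results above
results = [
    ("fastest", 5, 973),
    ("fast", 7, 926),
    ("normal", 34, 752),
    ("maximum", 41, 714),
    ("ultra", 48, 698),
]


def marginal_gains(rows):
    """Return (level, MB saved per extra minute) for each step up in level."""
    gains = []
    for (_, t0, s0), (name, t1, s1) in zip(rows, rows[1:]):
        gains.append((name, (s0 - s1) / (t1 - t0)))
    return gains


for level, mb_per_min in marginal_gains(results):
    print(f"{level:8s} {mb_per_min:5.1f} MB saved per extra minute")
```

On this one file, going from fastest to fast buys about 23.5 MB per extra minute, while going from maximum to ultra buys a little over 2 MB per extra minute.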

Now watch what happens when I switch 7zip to use the bzip2 compression algorithm:

7zip with bzip2 selected

We'll compress that same 4.76 GB file, on the same machine:

Algorithm  Level    Time     Speed      Compressed size
bzip2      fastest  2 min    36 MB/sec  1092 MB
bzip2      fast     2.5 min  29 MB/sec  1011 MB
bzip2      normal   3.5 min  22 MB/sec  989 MB
bzip2      maximum  7 min    12 MB/sec  987 MB
bzip2      ultra    21 min   4 MB/sec   986 MB

Why is bzip2 able to work so much faster than 7zip? Simple:

7zip algorithm CPU usage

7zip multithreaded cpu usage

bzip2 algorithm CPU usage

bzip2 multithreaded cpu usage

Bzip2 uses more than 2 CPU cores to parallelize its work. I'm not sure what the limit is, but the drop-down selector in the 7zip GUI allows up to 16 when the bzip2 algorithm is chosen. I used 8 for the above tests, since that's how many CPU cores we have on the server.

Unfortunately, bzip2's increased speed is sort of moot at high compression levels. The difference between normal, maximum, and ultra compression is a meaningless 0.06 percent. It scales beautifully in time, but hardly at all in space. That's a shame, because that's exactly where you'd like to spend the speed increase of parallelization. Eking out a percent of size improvement could still make sense, depending on the circumstances:

total time = compression time + n * (compressed file size / network speed + decompression time)

For instance, if you compress a file to send it over a network once, n equals one and compression time will have a big influence. If you want to post a file to be downloaded many times, n is big so long compression times will weigh less in the final decision. Finally, slow networks will do best with a slow but efficient algorithm, while for fast networks a speedy, possibly less efficient algorithm is needed.
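As a rough illustration of that formula, here is a small Python sketch. The compression times and compressed sizes come from the tests above (7zip ultra and bzip2 fastest); the 1 MB/sec network speed and the 60 second decompression time are made-up assumptions purely for the sake of the example, since neither was measured here.

```python
def total_time(compress_s, compressed_mb, network_mb_per_s, decompress_s, n):
    """Seconds spent compressing once, then transferring and decompressing n times."""
    return compress_s + n * (compressed_mb / network_mb_per_s + decompress_s)


# ASSUMPTIONS (not measured): 1 MB/sec network, 60 sec to decompress either format.
NETWORK_MB_PER_S = 1.0
DECOMPRESS_S = 60

for n in (1, 100):
    # 7zip ultra: 48 minutes, 698 MB. bzip2 fastest: 2 minutes, 1092 MB.
    seven_zip = total_time(48 * 60, 698, NETWORK_MB_PER_S, DECOMPRESS_S, n)
    bzip2 = total_time(2 * 60, 1092, NETWORK_MB_PER_S, DECOMPRESS_S, n)
    print(f"n={n}: 7zip ultra {seven_zip:.0f} sec, bzip2 fastest {bzip2:.0f} sec")
```

Under those assumptions, bzip2 fastest wins comfortably for a single transfer, while 7zip ultra comes out ahead once the file is downloaded a hundred times. Change the assumed network speed and the crossover point moves accordingly.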

On the other hand, the ability to compress a 5 GB source file to a fifth of its size in two minutes flat is pretty darn impressive. Still, I can't help wondering how fast the 7zip algorithm would be if it were rewritten and parallelized to take advantage of more than 2 CPU cores, too.

Posted by Jeff Atwood    88 Comments

February 27, 2009

Paying Down Your Technical Debt

Every software project I've ever worked on has accrued technical debt over time:

Technical Debt is a wonderful metaphor developed by Ward Cunningham to help us think about this problem. In this metaphor, doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into the better design. Although it costs to pay down the principal, we gain by reduced interest payments in the future.

The metaphor also explains why it may be sensible to do the quick and dirty approach. Just as a business incurs some debt to take advantage of a market opportunity developers may incur technical debt to hit an important deadline. The all too common problem is that development organizations let their debt get out of control and spend most of their future development effort paying crippling interest payments.

No matter how talented and smart the software developers, all these tiny deferments begin to add up and cumulatively weigh on the project, dragging it down. My latest project is no different. After six solid months working on the Stack Overflow codebase, this is exactly where we are. We're digging in our heels and retrenching for a major refactoring of our database. We have to stop working on new features for a while and pay down some of our technical debt.

credit cards

I believe that accruing technical debt is unavoidable on any real software project. Sure, you refactor as you go, and incorporate improvements when you can -- but it's impossible to predict exactly how those key decisions you made early on in the project are going to play out. All you can do is roll with the punches, and budget some time into the schedule to periodically pay down your technical debt.

The time you take out of the schedule to make technical debt payments typically doesn't result in anything the customers or users will see. This can sometimes be hard to justify. In fact, I had to defend our decision with Joel, my business partner. He'd prefer we work on some crazy thing he calls revenue generation, whatever that is.

Steve McConnell has a lengthy blog entry examining technical debt. The perils of not acknowledging your debt are clear:

One of the important implications of technical debt is that it must be serviced, i.e., once you incur a debt there will be interest charges. If the debt grows large enough, eventually the company will spend more on servicing its debt than it invests in increasing the value of its other assets. A common example is a legacy code base in which so much work goes into keeping a production system running (i.e., "servicing the debt") that there is little time left over to add new capabilities to the system. With financial debt, analysts talk about the "debt ratio," which is equal to total debt divided by total assets. Higher debt ratios are seen as more risky, which seems true for technical debt, too.

Beyond what Steve describes here, I'd also argue that accumulated technical debt becomes a major disincentive to work on a project. It's a collection of small but annoying things that you have to deal with every time you sit down to write code. But it's exactly these small annoyances, this sand grinding away in the gears of your workday, that eventually causes you to stop enjoying the project. These small things matter.

It can be scary to go in and rebuild a lot of working code that has become crufty over time. But don't succumb to fear.

I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the fear has gone there will be nothing. Only I will remain.

When it comes time to pay down your technical debt, don't be afraid to break stuff. It's liberating, even energizing to tear down code in order to build it up stronger and better than it was before. Be brave, and realize that paying your technical debt every so often is a normal, necessary part of the software development cycle to avert massive interest payments later. After all, who wants to live forever?

Posted by Jeff Atwood    82 Comments

February 25, 2009

Who's Your Coding Buddy?

I am continually amazed how much better my code becomes after I've had a peer look at it. I don't mean a formal review in a meeting room, or making my code open to anonymous public scrutiny on the internet, or some kind of onerous pair programming regime. Just one brief attempt at explaining and showing my code to a fellow programmer -- that's usually all it takes.

This is, of course, nothing new. Karl Wiegers' excellent book Peer Reviews in Software: A Practical Guide has been the definitive guide since 2002.

Peer Reviews in Software: a Practical Guide

I don't think anyone disputes the value of having another pair of eyes on your code, but there's a sort of institutional inertia that prevents it from happening in a lot of shops. In the chapter titled A Little Help from Your Friends (pdf), Karl explains:

Busy practitioners are sometimes reluctant to spend time examining a colleague's work. You might be leery of a coworker who asks you to review his code. Does he lack confidence? Does he want you to do his thinking for him? "Anyone who needs his code reviewed shouldn't be getting paid as a software developer," scoff some review resisters.

In a healthy software engineering culture, team members engage their peers to improve the quality of their work and increase their productivity. They understand that the time they spend looking at a colleague's work product is repaid when other team members examine their own deliverables. The best software engineers I have known actively sought out reviewers. Indeed, the input from many reviewers over their careers was part of what made these developers the best.

In addition to the above chapter, you can sample Chapter 3 (pdf) courtesy of the author's own Process Impact website. This isn't just feel-good hand waving. There's actual data behind it. Multiple studies show code inspections are startlingly effective.

the average defect detection rate is only 25 percent for unit testing, 35 percent for function testing, and 45 percent for integration testing. In contrast, the average effectiveness of design and code inspections are 55 and 60 percent.

So why aren't you doing code reviews? Maybe it's because you haven't picked out a coding buddy yet!

Remember those school trips, where everyone was admonished to pick a buddy and stick with them? This was as much to keep everyone out of trouble as safe. Well, the same rule applies when you're building software. Before you check code in, give it a quick once-over with your buddy. Can you explain it? Does it make sense? Is there anything you forgot?

I am now required by law to link to this cartoon.

the only valid measurement of code quality: WTFs per minute

Thank you, I'll be here all week.

But seriously, this cartoon illustrates exactly the kind of broad reality check we're looking for. It doesn't have to be complicated to be effective. WTFs/minute is a perfectly acceptable unit of measurement to use with your coding buddy. The XP community has promoted pair programming for years, but I think the buddy system is a far more practical way to achieve the same results.

Besides, who wouldn't want to be half of an awesome part-time coding dynamic duo?

Batman and Robin

That's way more exciting than the prospect of being shackled to the same computer with another person. Think about all the other classic dynamic duos out there:

Individuals can do great things, but two highly motivated peers can accomplish even more when they work together. Surely there's at least one programmer you work with who you admire or at least respect enough to adopt the buddy system with. (And if not, you might consider changing your company.)

One of the great joys of programming is not having to do it alone. So who's your coding buddy?

Posted by Jeff Atwood    132 Comments

February 23, 2009

Rate Limiting and Velocity Checking

Lately, I've been seeing these odd little signs pop up in storefronts around town.

7-11 rate limiter

All the signs have various forms of this printed on them:

Only 3 students at a time in the store please

We took that picture at a 7-11 convenience store which happens to be near a high school, so maybe the problem is particularly acute there. But even farther into town, the same signs appear with disturbing regularity. I'm guessing the store owners must consider these rules necessary because:

  • teenage students are more likely to shoplift than most customers
  • with many teenage students in the store, it's difficult for the owners to keep an eye on everyone, which further increases the likelihood of shoplifting.

I'm just guessing; I don't own a store. But like the "no elephants" sign, it must be there to address a real problem.

When you go into a restaurant and see a sign that says "No Dogs Allowed," you might think that sign is purely proscriptive: Mr. Restaurant doesn't like dogs around, so when he built the restaurant he put up that sign. If that was all that was going on, there would also be a "No Snakes" sign; after all, nobody likes snakes. And a "No Elephants" sign, because they break the chairs when they sit down. The real reason that sign is there is historical: it is a historical marker that indicates that people used to try to bring their dogs into the restaurant

All these signs are enough to make me question the ethics of high school students in groups of 3 or more. Although, to be fair, I've seen some really shifty looking graduate students in my day.

In truth, these kinds of limits are everywhere; they're just not as obvious because there's often no signage trail to follow.

  • Most ATMs only allow you to withdraw $300 cash maximum in one day.
  • Free email accounts typically limit how many emails can be sent per day.
  • Internet providers limit individual download and upload speeds to ensure they aren't overselling their bandwidth.
  • There's a maximum on how many Xbox Live Points you can add to your account per day. (All 500+ Rock Band songs aren't going to download themselves, after all.)

I'm sure you can think of lots of other real world examples. They're all around you.

There are people who act like groups of rampaging teenage students online, too, and we deal with them in the same way: by imposing rate limits! Consider how Google limits any IP address that's submitting "too many" search requests:

Several things can trigger the sorry message.

google error: we're sorry, search rate limiter with captcha

Often it's due to infected computers or DSL routers that proxy search traffic through your network - this may be at home or even at a workplace where one or more computers might be infected. Overly aggressive SEO ranking tools may trigger this message, too. In other cases, we have seen self-propagating worms that use Google search to identify vulnerable web servers on the Internet and then exploit them. The exploited systems in turn then search Google for more vulnerable web servers and so on. This can lead to a noticeable increase in search queries and sorry is one of our mechanisms to deal with this.

I did a bit of Google scraping once for a small research project, but I never ran into the CAPTCHA limiter. I think that entry predates its appearance. But it does make you wonder what typical search volumes are, and how they're calculated. Determining how much is "too much" -- that's the art of rate limiting. It's a tricky thing, even for the store owner:

  • Couldn't three morally bankrupt students shoplift just as effectively as four?
  • How do you tell who is a student? Is it based purely on perception of age?
  • Do we expect this rule to be self-enforcing? Will the fourth student walk into the store, identify three other students, and then decide to leave?

Rate limiting isn't always a precise science. But it's necessary, even with the false positives -- consider how dangerous a login entry with no limits on failed attempts could be. This is especially true once your code is connected to the internet. Human students can be a problem, but there's a practical limit to how many students can fit in a store, and how fast they can physically shoplift your inventory. But what if those "students" were an infinite number of computer programs, capable of stealing items from your web store at a rate only limited by network bandwidth? Your store would be picked clean in a matter of minutes. Maybe even seconds!

Not having any sort of rate limiting in your web application is an open invitation to abuse. Even the most innocuous of user actions, if done rapidly enough and by enough users, could have potentially disastrous effects.

Even after you've instituted a rate limit, you can still get in trouble. On Stack Overflow, we designed for evil. We have a Google-style rate limiting CAPTCHA in place, along with a variety of other bot defeating techniques. They've been working well so far. But what we failed to consider was that a determined (and apparently ultra-bored) human user could sit there and solve CAPTCHAs as fast as possible to spam the site.

And thus was born a new user based limit. I suppose we could create a little sign and hang it outside our virtual storefront:

Only 1 question per new user every 10 minutes, please.

There are a few classes of rate limiting or velocity checking you can do:

  1. Per user or API key. Ensure that any given user account or API account key holder can only perform (n) actions per minute. This is usually fairly safe, though it won't protect you from a user who automates the creation of 100 puppet accounts to do their bidding. It all depends how strictly you tie identity to the API key or user; you can easily ban, or in the worst case, track down the culprits and ask them to desist.

  2. Per IP address. Ensure that any given IP address can only perform (n) actions per minute. This works well in the typical case, but can cause problems for multiple users who happen to be behind a proxy that makes them appear to you as the "same" IP address. This is the only method possible on mostly anonymous sites like Craigslist, and it definitely works, because I've been on the receiving end of it. Example implementations are mod_evasive for Apache, or the IIS7 Dynamic IP Restriction module.

  3. Per global action. Ensure that a particular action can only happen (n) times per minute. Kind of the nuclear option, so obviously must be used with care. Can make sense for the "big red launch button" administrator functions which should be extraordinarily rare -- until a malicious user happens to gain administrator rights and starts pushing that big red button over and over.
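All three classes boil down to the same mechanism with a different key. Here is a minimal Python sketch of a sliding-window limiter, assuming a single process and in-memory state. The `RateLimiter` class, its `allow` method, and the key naming scheme are hypothetical illustrations, not how Stack Overflow, mod_evasive, or the IIS7 module actually work.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` actions per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.events = defaultdict(deque)  # key -> timestamps of recently allowed actions

    def allow(self, key):
        now = self.clock()
        recent = self.events[key]
        # Forget actions that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: reject, show a CAPTCHA, etc.
        recent.append(now)
        return True


# Hypothetical keys: the same limiter covers all three classes.
questions = RateLimiter(limit=1, window=600)  # 1 question per 10 minutes
print(questions.allow("user:42"))         # per user or API key
print(questions.allow("ip:203.0.113.7"))  # per IP address
print(questions.allow("action:launch"))   # per global action
```

This sketch never evicts idle keys, so a real deployment would need some form of cleanup, and probably shared storage if more than one web server is involved.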

I was shocked how little comprehensive information was out there on rate limiting and velocity checking for software developers, because they are your first and most important line of defense against a broad spectrum of possible attacks. It's amazing how many attacks you can mitigate or even defeat by instituting basic rate limiting.

Take a long, hard look at your own website -- how would it deal with a roving band of bored, morally ambiguous schoolkids?

Posted by Jeff Atwood    88 Comments

February 19, 2009

The Bad Apple: Group Poison

A recent episode of This American Life interviewed Will Felps, a professor who conducted a sociological experiment demonstrating the surprisingly powerful effect of bad apples.

Groups of four college students were organized into teams and given a task to complete some basic management decisions in 45 minutes. To motivate the teams, they're told that whichever team performs best will be awarded $100 per person. What they don't know, however, is that in some of the groups, the fourth member of their team isn't a student. He's an actor hired to play a bad apple, one of these personality types:

  1. The Depressive Pessimist will complain that the task that they're doing isn't enjoyable, and make statements doubting the group's ability to succeed.
  2. The Jerk will say that other people's ideas are not adequate, but will offer no alternatives himself. He'll say "you guys need to listen to the expert: me."
  3. The Slacker will say "whatever", and "I really don't care."

The conventional wisdom in the research on this sort of thing is that none of this should have had much effect on the group at all. Groups are powerful. Group dynamics are powerful. And so groups dominate individuals, not the other way around. There's tons of research, going back decades, demonstrating that people conform to group values and norms.

But Will found the opposite.

Invariably, groups that had the bad apple would perform worse. And this despite the fact that there were people in some groups that were very talented, very smart, very likeable. Felps found that the bad apple's behavior had a profound effect -- groups with bad apples performed 30 to 40 percent worse than other groups. On teams with the bad apple, people would argue and fight, they didn't share relevant information, they communicated less.

Even worse, other team members began to take on the bad apple's characteristics. When the bad apple was a jerk, other team members would begin acting like a jerk. When he was a slacker, they began to slack, too. And they wouldn't act this way just in response to the bad apple. They'd act this way to each other, in sort of a spillover effect.

What they found, in short, is that the worst team member is the best predictor of how any team performs. It doesn't seem to matter how great the best member is, or what the average member of the group is like. It all comes down to what your worst team member is like. The teams with the worst person performed the poorest.

The actual text of the study (pdf) is available if you're interested. However, I highly recommend listening to the first 11 minutes of the This American Life show. It's a fascinating, highly compelling recap of the study results. I've summarized, but I can't really do it justice without transcribing it all here.

Ira Glass, the host of This American Life, found Felps' results so striking that he began to question his own teamwork:

I've really been struck at how common bad apples are. Truthfully, I've been kind of haunted by my conversation with Will Felps. Hearing about his research, you realize just how easy it is to poison any group [...] each of us have had moments this week where we wonder if we, unwittingly, have become the bad apples in our group.

As always, self-awareness is the first step. If you can't tell who the bad apple is in your group, it might be you. Consider your own behavior on your own team -- are you slipping into any of these negative bad apple behavior patterns, even in a small way?

But there was a solitary glimmer of hope in the study, one particular group that bucked the trend:

There was one group that performed really well, despite the bad apple. There was just one guy, who was a particularly good leader. And what he would do is ask questions, he would engage all the team members, and defuse conflicts. I found out later that he's actually the son of a diplomat. His father is a diplomat from some South American country. He had this amazing diplomatic ability to defuse the conflict that normally would emerge when our actor, Nick, would display all this jerk behavior.

This apparently led Will to his next research project: can a group leader change the dynamics and performance of a group by going around and asking questions, soliciting everyone's opinions, and making sure everyone is heard?

While it's depressing to learn that a group can be so powerfully affected by the worst tendencies of a single member, it's heartening to know that a skilled leader, if you're lucky enough to have one, can intervene and potentially control the situation.

Still, the obvious solution is to address the problem at its source: get rid of the bad apple.

Even if it's you.

Posted by Jeff Atwood    143 Comments

February 18, 2009

Are You An Expert?

I think I have a problem with authority. Starting with my own.

It troubles me greatly to hear that people see me as an expert or an authority, and not a fellow amateur.

If I've learned anything in my career, it is that approaching software development as an expert, as someone who has already discovered everything there is to know about a given topic, is the one surest way to fail.

Experts are, if anything, more suspect than the amateurs, because they're less honest. You should question everything I write here, in the same way you question everything you've ever read online -- or anywhere else for that matter. Your own research and data should trump any claims you read from anyone, no matter how much of an authority or expert you, I, Google, or the general community at large may believe them to be.

Have you ever worked with software developers who thought of themselves as experts, with almost universally painful results? I certainly have. You might say I've developed an anti-expert bias. Apparently, so has Wikipedia; a section titled warnings to expert editors explains:

  1. Experts can identify themselves on their user page and list whatever credentials and experience they wish to publicly divulge. It is difficult to maintain a claim of expertise while being anonymous. In practice, there is no advantage (and considerable disadvantage) in divulging one's expertise in this way.
  2. Experts do not have any other privileges in resolving edit conflicts in their favor: in a content dispute between a (supposed) expert and a non-expert, it is not permissible for the expert to "pull rank" and declare victory. In short, "Because I say so" is never an acceptable justification for a claim in Wikipedia, regardless of expertise. Likewise, expert contributions are not protected from subsequent revisions from non-experts, nor is there any mechanism to do so. Ideally, if not always in practice, it is the quality of the edits that counts.
  3. There is a strong undercurrent of anti-expert bias in Wikipedia. Thus, if you become recognized as an expert you will be held to higher standards of conduct than non-experts.

Let's stop for a moment to savor the paradox of a free and open encyclopedia written by people who view the contributions of experts with healthy skepticism. How could that possibly work?

I'd argue that's the only way it could work -- when all contributions are viewed critically, regardless of source. This is a radical inversion of power. But a radical inversion of power is exactly what is required. There are only a handful of experts, but untold millions of amateurs. And the contributions of these amateurs are absolutely essential when you're trying to generate a website that contains a page for... well, everything. The world is a fractal place, filled with infinite detail. Nobody knows this better than software developers. The programmers in the trenches, spending every day struggling with the details, are the people who often have the most local knowledge about narrow programming topics. There just aren't enough experts to go around.

So what does it mean to be an expert, then, when expertise is perceived as impractical at best, and a liability at worst? In a recent Google talk, James Bach presented the quintessential postmodern image of an expert performing -- Steve McQueen in The Towering Inferno:

[turns to fire commissioner] What do we got here, Kappy?
Fire started, 81st floor, storage room. It's bad. Smoke's so thick, we can't tell how far it's spread.
Exhaust system?
Should've reversed automatically. It must be a motor burnout.
Sprinklers?
They're not working on 81.
Why not?
I don't know.

[turns to architect] Jim? Give us a quick refresher on your standpipe system.
Floors have 3 and 1.5 inch outlets.
GPM?
Fifteen hundred from ground to 68, and 1,000 from 68 to 100, and 500 from there to the roof.
Are these elevators programmed for emergencies?
Yes.
What floor are your plans on?
79. My office.
That's two floors below the fire. It'll be our Forward Command. Men, take up the equipment. I want to see all floor plans, 81 through 85.
Gotcha.
[turns to security chief] Give me a list of your tenants.
Don't worry, we're moving them out now.
Not live-ins. Businesses.
We lucked out. Most of them haven't moved in yet. Those that have are off at night.
I want to know who they are, not where.
What's that got to do with anything? Who they are?
Any wool or silk manufacturers? In a fire, wool and silk give off cyanide gas. Any sporting good manufacturers, like table-tennis balls? They give off toxic gases. Now do you want me to keep going?
One tenant list, coming up.
[turns to crew leader] What do we got?
Elevator bank, central core. Service elevators here. Air conditioning ducts, 6 inches.
Pipe alleys here?
One, two, three, four, five.
Have you got any construction on 81? Anything that can blow up, like gasoline, fabric cleaner?
I don't think so.

What does this tell us? I mean, other than... Steve McQueen is a badass? Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction.

What I love about James Bach's presentation is how he spends the entire first half of it questioning and deconstructing everything -- his field, his expertise, his own reputation and credentials, even! And then, only then, he cautiously, slowly builds it back up through a process of continual learning.

Level 0: I overcame obliviousness
I now realize there is something here to learn.

Level 1: I overcame intimidation
I feel I can learn this subject or skill. I know enough about it so that I am not intimidated by people who know more than me.

Level 2: I overcame incoherence
I no longer feel that I'm pretending or hand-waving. I feel reasonably competent to discuss or practice. What I say sounds like what I think I know.

Level 3: I overcame competence
Now I feel productively self-critical, rather than complacently good enough. I want to take risks, invent, teach, and push myself. I want to be with other enthusiastic students.

Insight like this is why Mr. Bach is my favorite Buccaneer-Scholar. He leaves us with this bit of advice to New Experts:

  • Practice, practice, practice!
  • Don't confuse experience with expertise.
  • Don't trust folklore -- but learn it anyway.
  • Take nothing on faith. Own your methodology.
  • Drive your own education -- no one else will.
  • Reputation = Money. Build and protect your reputation.
  • Relentlessly gather resources, materials, and tools.
  • Establish your standards and ethics.
  • Avoid certifications that trivialize the craft.
  • Associate with demanding colleagues.
  • Write, speak, and always tell the truth as you see it.

Of course, Mr. Bach is talking about testing here, but I believe his advice applies equally well to developing expertise in programming, or anything else you might do in a professional capacity. It starts with questioning everything, most of all yourself.

So if you want to be an expert in practice rather than in name only, take a page from Steve McQueen's book. Don't be the guy telling everyone what to do. Be the guy asking all the questions.

Posted by Jeff Atwood    112 Comments

February 13, 2009

Real Ultimate Programming Power

A common response to The Ferengi Programmer:

From what I can see, the problem of "overly-rule-bound developers" is nowhere near the magnitude of the problem of "developers who don't really have a clue."

The majority of developers do not suffer from too much design patterns, or too much SOLID, or agile, or waterfall for that matter. They suffer from whipping out cowboy code in a pure chaos environment, using simplistic drag & drop, data driven, vb-like techniques.

Absolutely.

But here's the paradox: the types of programmers who would most benefit from these guidelines, rules, principles, and checklists are the least likely to read and follow them. Throwing a book of rules at a terrible programmer just creates a terrible programmer with a bruise on their head where the book bounced off. This is something I discussed previously in Mort, Elvis, Einstein, and You:

Thus, if you read the article, you are most assuredly in the twenty percent category. The other eighty percent are not actively thinking about the craft of software development. They would never find that piece, much less read it. They simply don't read programming blogs-- other than as the result of web searches to find quick-fix answers to a specific problem they're having. Nor have they read any of the books in my recommended reading list. The defining characteristic of the vast majority of these so-called "vocational" programmers is that they are unreachable. It doesn't matter what you, I or anyone else writes here -- they'll never see it.

In the absence of mentoring and apprenticeship, the dissemination of better programming practices is often conveniently packaged into processes and methodologies. How many of these do you know? How many have you practiced?

1969  Structured programming
1975  Jackson Structured Programming
1980  Structured Systems Analysis and Design Methodology
1980  Structured Analysis and Design Technique
1981  Information Engineering
1990  Object-oriented programming
1991  Rapid Application Development
1990  Virtual finite state machine
1995  Dynamic Systems Development Method
1998  Scrum
1999  Extreme Programming
2002  Enterprise Unified Process
2003  Rational Unified Process
2004  Constructionist Design Methodology
2005  Agile Unified Process

And how do we expect the average developer to find out about these? In a word, marketing. (I could have substituted religion here without much change in meaning.) It's no coincidence that a lot of the proponents of these methodologies make their living consulting and teaching about them. And they have their work cut out for them, too, because most programmers are unreachable:

I was sitting in my office chatting with my coworker Jeremy Sheeley. Jeremy leads the dev team for Vault and Fortress. In the course of our discussion, I suddenly realized that none of our marketing efforts would reach Jeremy. He doesn't go to trade shows or conferences. He doesn't read magazines. He doesn't read blogs. He doesn't go to user group meetings.

Jeremy is a decision-maker for the version control tool used by his team, and nothing we are doing would make him aware of our product. How many more Jeremies are out there?

Millions! As Seth Godin notes, the unreachable are now truly unreachable -- at least not through marketing.

So, if we know the programmers who would benefit most from these rules and principles and guidelines are:

  1. highly unlikely to ever read them of their own volition
  2. almost impossible to reach through traditional marketing

Remind me again -- who, exactly, are we writing these principles, rules, guidelines, and methodologies for? If we're only reaching the programmers who are thoughtful enough to care about their work in the first place, what have we truly accomplished? I agree with Jeff R., who left this comment:

There's nothing wrong with the SOLID principles; they make sense to me. But I've been programming since the days of card readers and teletypes. They won't make sense to those with little experience. They don't know when or how to apply them appropriately. They get bogged down in the attempt.

So trying to follow them changes the focus from result to process. And that's deadly.

It's the job of the lead programmer or manager to see that good principles are followed, perhaps by guiding others invisibly, without explicitly mandating or even mentioning those principles.

In my effort to suck less every year, I've read hundreds of programming books. I've researched every modern programming methodology. I'm even a Certified Scrum Master™. All of it, to me, seems like endlessly restated versions of four core fundamentals. But "four core fundamentals"? That's awful marketing. Nobody will listen in rapt, adoring attention to me as I pontificate, nor will they pay the exorbitant consulting fees I demand to support the lifestyle I have become accustomed to. It simply won't do. Not at all. So, I dub this:

The Atwood System of Real Ultimate Programming Power

  1. DRY
  2. KISS
  3. YAGNI
  4. NAMBLA

All those incredibly detailed rules, guidelines, methodologies, and principles? YAGNI. If it can't be explained on a single double-spaced sheet of paper, it's a waste of your time. Go read and write some code! And if you can't grok these fundamentals in the first three or four years of your programming career, well -- this slightly modified R. Lee Ermey quote comes to mind.

My name is Jeff, and I can't stop thinking about programming. And neither should you.

Posted by Jeff Atwood    180 Comments

February 11, 2009

The Ferengi Programmer

There was a little brouhaha recently about some comments Joel Spolsky made on our podcast:

Last week I was listening to a podcast on Hanselminutes, with Robert Martin talking about the SOLID principles. (That's a real easy-to-Google term!) It's object-oriented design, and they're calling it agile design, which it really, really isn't. It's principles for how to design your classes, and how they should work. And, when I was listening to them, they all sounded to me like extremely bureaucratic programming that came from the mind of somebody that has not written a lot of code, frankly.

There's nothing really objectionable about Bob's object-oriented design principles, on the face of it. (Note that all links in the below table are PDFs, so click accordingly.)

The Single Responsibility Principle: A class should have one, and only one, reason to change.
The Open Closed Principle: You should be able to extend a classes behavior, without modifying it.
The Liskov Substitution Principle: Derived classes must be substitutable for their base classes.
The Dependency Inversion Principle: Depend on abstractions, not on concretions.
The Interface Segregation Principle: Make fine grained interfaces that are client specific.
The Release Reuse Equivalency Principle: The granule of reuse is the granule of release.
The Common Closure Principle: Classes that change together are packaged together.
The Common Reuse Principle: Classes that are used together are packaged together.
The Acyclic Dependencies Principle: The dependency graph of packages must have no cycles.
The Stable Dependencies Principle: Depend in the direction of stability.
The Stable Abstractions Principle: Abstractness increases with stability.

While I do believe every software development team should endeavor to follow the instructions on the paint can, there's a limit to what you can fit on a paint can. It's the most basic, most critical information you need to proceed and not make a giant mess of the process. As brief as the instructions on a paint can are, they do represent the upper limit of what most people will realistically read, comprehend, and derive immediate benefit from.

Expanding from a few guidelines on a paint can into a detailed painting manual is far riskier. The bigger and more grandiose the set of rules you come up with, the more severe the danger. A few broad guidelines on a paint can beget thirty rules for painting, which beget a hundred detailed principles of painting...

Pretty soon you'll find yourself believing that every possible situation in software development can be prescribed, if only you could come up with a sufficiently detailed set of rules! And, of course, a critical mass of programmers patient enough to read Volumes I - XV of said rules. You'll also want to set up a few messageboards for these programmers to argue endlessly amongst themselves about the meaning and interpretation of the rules.

This strikes me as a bit like Ferengi programming.

Ferengi Rules of Acquisition, book cover

The Ferengi are a part of the Star Trek universe, primarily in Deep Space Nine. They're a race of ultra-capitalists whose every business transaction is governed by the 285 Rules of Acquisition. There's a rule for every possible business situation -- and, inevitably, an interpretation of those rules that gives the Ferengi license to cheat, steal, and bend the truth to suit their needs.

At what point do you stop having a set of basic, reasonable programming guidelines -- and start being a Ferengi programmer, an imperfect manifestation of the ruleset?

Like James Bach, I've found less and less use for rules in my career. Not because I'm a self-made genius who plays by my own rules, mind you, but because I value the skills, experience, and judgment of my team far more than any static set of rules.

When Ron says there is an "absolute minimum of practice" that must be in for an agile project to succeed, I want to reply that I believe there is an absolute minimum of practice needed to have a competent opinion about things that are needed -- and that in his post he does not achieve that minimum. I think part of that minimum is to understand what words like "practice" and "agile" and "success" can mean (recognizing they are malleable ideas). Part of it is to recognize that people can and have behaved in agile ways without any concept of agile or ability to explain what they do.

My style of development and testing is highly agile. I am agile in that I am prepared to question and rethink anything. I change and develop my methods. I may learn from packaged ideas like Extreme Programming, but I never follow them. Following is for novices who are under active supervision. Instead, I craft methods on a project by project basis, and I encourage other people to do that, as well. I take responsibility for my choices. That's engineering for adults like us.

Guidelines, particularly in the absence of experts and mentors, are useful. But there's also a very real danger of hewing too slavishly to rulesets. Programmers are already quite systematic by disposition, so the idea that you can come up with a detailed enough set of rules, and sub-rules, and sub-sub-rules, that you can literally program yourself for success with a "system" of sufficient sophistication -- this, unfortunately, comes naturally to most software developers. If you're not careful, you might even slip and fall into a Methodology. Then you're in real trouble.

Don't become a Ferengi Programmer. Rules, guidelines, and principles are gems of distilled experience that should be studied and respected. But they're never a substitute for thinking critically about your work.

Posted by Jeff Atwood    220 Comments

February 9, 2009

The Elephant in the Room: Google Monoculture

I was browsing the sessions at an upcoming Search Conference, which describes itself thusly:

The way to online success is through being easily found in search engines such as Google, Yahoo!, and Microsoft Live Search. While developers have historically thought of search as a marketing activity, technical architecture has now become critical for search success.

Anyone else see the elephant in the room, there? No?

Banksy: elephant in room

Just two weeks after we launched Stack Overflow, I mentioned that search engines already made up 50% of our traffic. Well, not so much search engines as search engine:

I try to be politically correct in discussing web search, avoiding the g-word whenever possible, desperately attempting to preserve the illusion that web search is actually a competitive market. But it's becoming a transparent and cruel joke at this point. When we say "web search" we mean one thing, and one thing only: Google. Rich Skrenta explains:

I'm not a professional analyst, and my approach here is pretty back-of-the-napkin. Still, it confirms what those of us in the search industry have known for a long time.

The New York Times, for instance, gets nearly 6 times as much traffic from Google as it does from Yahoo. Tripadvisor gets 8 times as much traffic from Google vs. Yahoo.

Even Yahoo's own sites are no different. While it receives a greater fraction of Yahoo search traffic than average, Yahoo's own flickr service gets 2.4 times as much traffic from Google as it does from Yahoo.

My favorite example: According to Hitwise, [ex] Yahoo blogger Jeremy Zawodny gets 92% of his inbound search traffic from Google, and only 2.7% from Yahoo.

That was written almost two years ago. Guess which way those numbers have gone since then?

Now that Stack Overflow has been chugging right along for almost six months, allow me to share the last month of our own data. Currently, 83% of our total traffic is from search engines, or rather, one particular search engine:

Search Engine      Visits
Google          3,417,919
Yahoo               9,779
Live                5,638
Search              2,961
AOL                 1,274
Ask                 1,186
MSN                 1,177
Altavista             202
Yandex                191
Seznam                103

Those 6x and 8x numbers that Rich quoted two years ago seem awfully quaint now. Google delivers 350x the traffic to Stack Overflow that the next best so-called "search engine" does. Three hundred and fifty times!
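
If you'd like to check that arithmetic yourself, here's a quick back-of-the-napkin sketch in Python. The visit counts are copied straight from the table above; the variable names are mine, and the "share" figure only covers the ten engines listed, not total site traffic.

```python
# Visit counts copied from the search engine table above
# (one month of Stack Overflow traffic).
visits = {
    "Google": 3_417_919,
    "Yahoo": 9_779,
    "Live": 5_638,
    "Search": 2_961,
    "AOL": 1_274,
    "Ask": 1_186,
    "MSN": 1_177,
    "Altavista": 202,
    "Yandex": 191,
    "Seznam": 103,
}

# Rank engines by visits, then compare the leader to the runner-up.
ranked = sorted(visits.items(), key=lambda item: item[1], reverse=True)
(top_name, top_visits), (second_name, second_visits) = ranked[0], ranked[1]

ratio = top_visits / second_visits        # leader vs. next best engine
share = top_visits / sum(visits.values()) # leader's share of the listed visits

print(f"{top_name} sends {ratio:.0f}x the visits of {second_name}")
print(f"{top_name} share of listed search visits: {share:.1%}")
```

The ratio comes out just shy of 350, and Google accounts for a bit over 99% of the search visits in the table.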

Now, I don't claim that Stack Overflow is representative of every site on the internet -- obviously it isn't. It's a site for programmers. And let me be absolutely crystal clear that I have no problem at all with Google. That said, I find it profoundly disturbing that if every other search engine in the world shut down tomorrow, our website's traffic would be effectively unchanged. That's downright scary.

Yes, I like Google. Yes, Google works great and has been my homepage for about eight years now. Google nailed search, and they deserve the leadership position they've earned. But where's the healthy competition? Where's the incentive for Google to improve? All I see is a large and growing monoculture that acts as the start page for the internet.

I'm a little surprised all the people who were so up in arms about the Microsoft "monopoly" ten years ago aren't out in the streets today lighting torches and sharpening their pitchforks to go after Google. Does the fact that Google's products are mostly free and ad-supported somehow exempt it from the same scrutiny? Isn't anyone else concerned that Google, even with the best of "don't be evil" intentions, has become more master than servant?

Calling the current state of search engine competition a horse race is an insult to horse races. No, what we have here is a one-horse race where all the other horses were shipped off to glue factories years ago. Forget "search conference"; you should be throwing a "Google conference", because there's no difference.

I don't know. Maybe that's OK. But it does mean that if Google, for whatever reason, decided to remove you from its search results, your website no longer exists. At least not as a viable business, anyway.

Posted by Jeff Atwood    201 Comments

February 7, 2009

Don't Reinvent The Wheel, Unless You Plan on Learning More About Wheels

The introduction to Head First Design Patterns exhorts us not to reinvent the wheel:

You're not alone. At any given moment, somewhere in the world someone struggles with the same software design problems you have. You know you don't want to reinvent the wheel (or worse, a flat tire), so you look to Design Patterns -- the lessons learned by those who've faced the same problems. With Design Patterns, you get to take advantage of the best practices and experience of others, so that you can spend your time on...something else. Something more challenging. Something more complex. Something more fun.

Avoiding the reinvention of the proverbial wheel is a standard bit of received wisdom in software development circles. There's certainly truth there, but I think it's a bit dangerous if taken too literally -- if you categorically deny all attempts to solve a problem with code once any existing library is in place.

square bike wheel

I'm not so sure. I think reinventing the wheel, if done properly, can be useful. For example, James Hart reinvented the wheel. And he liked it:

I reinvented the wheel last week. I sat down and deliberately coded something that I knew already existed, and had probably also been done by many many other people. In conventional programming terms, I wasted my time. But it was worthwhile, and what's more I would recommend almost any serious programmer do precisely the same thing.

But who's James Hart? Just another programmer. If that doesn't carry enough weight for you, how does it sound coming from Charles Moore, the creator of FORTH?

A second corollary was even more heretical: "Do it yourself!"

The conventional approach, enforced to a greater or lesser extent, is that you shall use a standard subroutine. I say that you should write your own subroutines.

Before you can write your own subroutines, you have to know how. This means, to be practical, that you have written it before; which makes it difficult to get started. But give it a try. After writing the same subroutine a dozen times on as many computers and languages, you'll be pretty good at it.

Moore followed this to an astounding extent. Throughout the 70's, as he implemented Forth on 18 different CPUs, he invariably wrote for each his own assembler, his own disk and terminal drivers, even his own multiply and divide subroutines (on machines that required them, as many did). When there were manufacturer-supplied routines for these functions, he read them for ideas, but never used them verbatim. By knowing exactly how Forth would use these resources, by omitting hooks and generalities, and by sheer skill and experience (he speculated that most multiply/divide subroutines were written by someone who had never done one before and never would again), his versions were invariably smaller and faster, usually significantly so.

Moreover, he was never satisfied with his own solutions to problems. Revisiting a computer or an application after a few years, he often re-wrote key code routines. He never re-used his own code without re-examining it for possible improvements. This later became a source of frustration to Rather, who, as the marketing arm of FORTH, Inc., often bid jobs on the assumption that since Moore had just done a similar project this one would be easy -- only to watch helplessly as he tore up all his past code and started over.

And then there's Bob Lee, who leads the core library development on Android.

Depending on the context, you can almost always replace "Why reinvent the wheel?" with "Please don't compete with me," or "Please don't make me learn something new." Either way, the opponent doesn't have a real argument against building something newer and better, but they also don't want to admit their unhealthy motivations for trying to stop you.

More seeds, more blooms, I say. Don't build houses on kitchen sinks. Reinvent away. Most of our current technology sucks, and even if it didn't, who am I to try and stop you?

Indeed. If anything, "Don't Reinvent The Wheel" should be used as a call to arms for deeply educating yourself about all the existing solutions -- not as a bludgeoning tool to undermine those who legitimately want to build something better or improve on what's already out there. In my experience, sadly, it's much more the latter than the former.

So, no, you shouldn't reinvent the wheel. Unless you plan on learning more about wheels, that is.

Posted by Jeff Atwood    104 Comments
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.