I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood

January 10, 2006

Flattening Arrow Code

I often encounter code like this:

if (rowCount > rowIdx)
{
    if (drc[rowIdx].Table.Columns.Contains("avalId"))
    {
        do
        {
            if (Attributes[attrVal.AttributeClassId] == null)
            {
                // do stuff
            }
            else
            {
                if (!(Attributes[attrVal.AttributeClassId] is ArrayList))
                {
                    // do stuff
                }
                else
                {
                    if (!isChecking)
                    {
                        // do stuff
                    }
                    else
                    {
                        // do stuff
                    }
                }
            }
            rowIdx++;
        }
        while(rowIdx < rowCount && GetIdAsInt32(drc[rowIdx]) == Id);
    }
    else
    rowIdx++;
}
return rowIdx;

The excessive nesting of conditional clauses pushes the code out into an arrow formation:

 if
   if
     if
       if
         do something
       endif
     endif
   endif
 endif
   arrowhead.png

And you know you're definitely in trouble when the code you're reading is regularly exceeding the right margin on a typical 1280x1024 display. This is the Arrow Anti-Pattern in action.

One of my primary refactoring tasks is "flattening" arrow code like this. Those sharp, pointy barbs are dangerous! Arrow code has a high cyclomatic complexity value-- a measure of how many distinct paths there are through code:

Studies show a correlation between a program's cyclomatic complexity and its error frequency. A low cyclomatic complexity contributes to a program's understandability and indicates it is amenable to modification at lower risk than a more complex program. A module's cyclomatic complexity is also a strong indicator of its testability.

Where appropriate, I flatten that arrow code by doing the following:

  1. Replace conditions with guard clauses. This code..

    if (SomeNecessaryCondition) {
        // function body code
    }
    

    .. works better as a guard clause:

    if (!SomeNecessaryCondition)
    {
        throw new RequiredConditionMissingException;
    }
    // function body code
    

  2. Decompose conditional blocks into seperate functions. In the above example, we're in a do..while loop which could be decomposed:

    do
    {
        ValidateRowAttribute(drc[rowIdx]);
        rowIdx++;
    }
    while(rowIdx < rowCount && GetIdAsInt32(drc[rowIdx]) == Id);
    

  3. Convert negative checks into positive checks. As a broad rule, I prefer to put the positive comparison first and let the negative comparison fall out naturally into the else clause. I think this reads a lot better and, more importantly, avoids the "I ain't not doing that" syndrome:

    if (Attributes[attrVal.AttributeClassId] is ArrayList)
    {
        // do stuff
    }
    else
    {
        // do stuff
    }
    

  4. Always opportunistically return as soon as possible from the function. Once your work is done, get the heck out of there! This isn't always possible -- you might have resources you need to clean up. But whatever you do, you have to abandon the ill-conceived idea that there should only be one exit point at the bottom of the function.

The goal is to have code that scrolls vertically a lot.. but not so much horizontally.

Posted by Jeff Atwood    View blog reactions

 

« Cleaning Word's Nasty HTML Return to the Planet of Managed Code Bloat »

 

Comments

also many times it is possible to combine the "ifs" into a single if:

if (something) {
if (somethingelse) {
if (somemore) {...

into
if (something || somethingelse || somemore) {
...

and some times is even better to create a function to validate those, so it would be
if (DataIsValid(something, somethingelse, somemore)) {...

Eber Irigoyen on January 10, 2006 06:21 PM

I have to tell you, number 4 chafed a little bit. I have always avoided multiple returns from a function like the plague.

That being said, I jumped into McConnell (Code Complete 2) right after reading your post today and found what he had to say: "Use a return when it enhances readability: In certain routines, once you know the answer, you want to return it to the calling routine immediately. If the routine is defined in such a way that it doesn't require any further cleanup once it detects an error, not returning immediately means that you have to write more code."

McConnel's example:
Comparison Compare(int value1, int value2)
{
if ( value1 value2 )
return Comparison_GreaterThan;
else
return Comparison_Equal;
}

but he also says this:
"Minimize the number of returns in each routine: It's harder to understand a routine when, reading it at the bottom, you're unaware of the possibility that it returned somewhere abouve. For that reason, use returns judiciously--only when they improve readability. "

Now, I have to say, I'm a fairly green developer, but this somehow went against a hard coded rule I had. One function --> one return.

Long story short (too late), I learned something today. My little rule was wrong and needed to be rewritten. I hadn't got that far in the book yet, so hearing it from you AND the gospel (McConnell) will have me trying to change my ways. Thanks!

kludger on January 10, 2006 07:07 PM

So THAT'S what the preview button is for...

Should have been this:

McConnell's example:
Comparison Compare(int value1, int value2)
{
if ( value1 LT value2 )
return Comparison_LessThan;
else if ( value1 GT value2 )
return Comparison_GreaterThan;
else
return Comparison_Equal;
}

kludger on January 10, 2006 07:11 PM

I love to see things like this posted on the web. When I was back in school, it was always a major deal if you didn't follow the strict CS rule of only having one return point for your function.

I use to argue that it didn't do you any good if it made your code harder to read, but I would always get into trouble anyway.

It's interesting to see the difference between coding theory and coding practice.

Ryan Smith on January 10, 2006 07:15 PM

> also many times it is possible to combine the "ifs" into a single if:

Yes, although that is a bit of a false economy; the code is "flatter" visually but the if blocks are still present; they were just consolidated into one line. As you pointed out, the best solution is to consolidate the logic into an external function with a proper name, thus reducing the overall per-function cyclomatic complexity.

> I have to tell you, number 4 chafed a little bit. I have always avoided multiple returns from a function like the plague.

I'm glad to hear this got you thinking about it!

The two Code Complete guidelines you cite aren't mutually exclusive..

1) favor things that increase readability
2) favor things that reduce the amount of code you have to write

That said, a perfectly flat function with 53 returns is just as pathological as a 3,000 character wide arrow function with a single return. The difference between the two is where our "mad development skillz" come in.

Jeff Atwood on January 10, 2006 07:19 PM

I think the goal is more readable code (which just happens to be served by a vertical layout more than a horizontal layout). I prefer a function in which I don't have to scroll at all.

I like your point about using guard clauses. That sort of fits into the Fowler philosophy of failing fast (http://www.martinfowler.com/ieeeSoftware/failFast.pdf).

Get all the failure conditions out of the way at the top of the method so the body can focus on the real meat.

As for 1 return point, I think that's a bit outdated with code that now has structured exception handling. If you think about it, you have several "return points".

if(!SomethingNeeded)
throw new CustomException("message");

That there is a potential return point.

Though I do agree that keeping it to a minimum is important, there are certain places where a return or a break can be expected that make the overall code more readable.

Haacked on January 10, 2006 08:05 PM

Kludger, your rule is actually only advice on how to achieve a goal. The goal is "understandable code".

Using one return value is one way to get understandable code. Having methods short enough to fit on a screen is another. Do whatever you need to do to make the code more understandable, and if that means you do early exit from a function, big deal.

Robert Watkins on January 10, 2006 08:40 PM

Hi,

first the thing that has to be said since - like ages ago: Your blog rocks. In the year I'm now subscribed to your RSS feed I never even once encountered a posting that did not at least remotely interest me. Well written, nice topics. Grats!

Second: I so agree with you. Code with deeply stacked IF's is terribly hard to read and understand.

Some experience with exactly this problem can be found in a new blog entry of mine inspired by you:

http://www.gnegg.ch/archives/flattening_arrow_code.html

Philip

Philip on January 11, 2006 03:12 AM

As a practical application that combines the guard clause with the "return soonest" philosophy, one ASP.NET pattern I find myself using it a lot is in code like this:

Sub Button1_Click(sender As Object, e As EventArgs)
    If Not Page.IsValid Then
      Return
   End If
   ' Actual handler code
End Sub

... or the like.

mike on January 11, 2006 03:24 AM

Hi Jeff,

Correct my if I'm wrong but your suggested refactorings, whilst making the code read better, won't actually reduce cyclomatic complexity?

Josh Twist on January 11, 2006 03:52 AM

When refactoring one must be careful to preserve semantics. Nested if clauses are conjunctions, not disjunctions:

if (something) {
if (somethingelse) {
if (somemore) {...

actually combines into
if (something && somethingelse && somemore) {

All the clauses must be true for the inner block to execute.

Introducing exceptions also change the contract of the function. There were no exceptions before. Now you need to write catch handlers in all calling code. You need to consider the cost of throwing an exception is roughly 10000x the cost of executing an if statement.

A better refactoring would be to extract the inner if clauses into a separate function, thus breaking off the tip of the arrow:

if (drc[rowIdx].Table.Columns.Contains("avalId"))
{
do
{
BrokenOffBit( Attributes, attrVal, isChecking, rowIdx );
rowIdx++;
}
while(rowIdx < rowCount && GetIdAsInt32(drc[rowIdx]) == Id);
}
else
rowIdx++;

Christian Mogensen on January 11, 2006 05:39 AM

I originally stormed into the coding world throwing guard clauses and multiple function returns out left and right to cut down on those darn curly braces.

Then I would be forced into debugging a problem. And suddenly those exit points made the ol standby of println() retVal pretty tough to pull off.

So my "rule" looks a bit like this: If the function fits on a less than a dozen lines (and they do, 95% of the time) then multiple returns are A-OK. Once you get into some of the ganglier methods, you can have one or two guard clauses, but everything after than has to have a single return value so you can debug easier.

While I'm rambling about guard clauses, has anyone used those plus some coding guidelines about logging every public method call? Where do you put the log statement? If you enter the method and nothing happens, then it's a bit misleading to write it to the log on the first line of the method. But if you try to figure out where the log statement goes, you spend extra time figuring out where to actually log that something happened. I guess I could write more statements, but at some point you end up logging every line of code.

Anyway, great entry!

Steve Jackson on January 11, 2006 08:52 AM

Call me archaic, but combining if's can be bad in many cases.

1) It often makes the code less readable. If the multiple items in the if are not logically grouped, then following that logic can be fuzzy sometimes.

2) Not all languages support short-circutting. VB Classic and VBA are prime examples. Combining IF's can lead to bugs and slower code in those cases.

Eric D. Burdo on January 11, 2006 09:07 AM

Regarding the multiple exit points, I agree completely.

I inherited a partially written application from someone. In looking through the code, I found a function that went something like this (Delphi code):

strRet := 'Fred';
if (somecondition) then
strRet := 'Something';
if strRet = 'Fred' then
if (someothercondition) then
strRet := 'Something else';
if strRet = 'Fred' then
// and so forth for 30 additional conditionals

Result := strRet;

I asked the original developer why he'd done it this way. The response? "There should be only one exit point."

Needless to say, all of the tests against 'Fred' soon disappeared, and there certainly are more exit points.

Ken White on January 11, 2006 09:59 AM

CodeRush/Refactor make the process of working with these kinds of blocks easier, and refactoring them quickly. Structural Highlighting (CodeRush) draws lines between the start and end of the block, which makes them more clear. With Refactor, you can use the "Replace Nested Conditional with Guard Clause" refactoring to easily flatten your code for better readability, or the "Extract Method" refactoring to move part of the block to its own method. If you're interested, check it out at http://www.devexpress.com/products/NET/Coderush/. Definitely check out the "CodeRush Training" link from that page, and download the trial to see it for yourself.
Disclaimer: I'm just a happy CodeRush/Refactor user, and have no commercial interest in either.

Eric Hoch on January 11, 2006 10:56 AM

> your suggested refactorings, whilst making the code read better, won't actually reduce cyclomatic complexity

#1, #2, and #4 definitely will reduce cyclomatic complexity-- less IFs are nested, therefore there are fewer total paths through the code. The total number of IF statements might be the same, but they are no longer nested or they've been moved to another function (and no longer nested, at least not in the original function).

The suggestion to combine multiple IF statements on the same line, ala

if (something || somethingelse || somemore) {

Won't reduce CC at all. But that's not on my list..

> has anyone used those plus some coding guidelines about logging every public method call?

Why not just dump the stack trace at any given time rather than doing explicit logging? The stack trace will give you the current method for sure..

Jeff Atwood on January 11, 2006 12:45 PM

> You need to consider the cost of throwing an exception is roughly 10000x the cost of executing an if statement.

Sigh... The good old premature optimization. As well, in some languages, throwing exceptions is actually not expensive at all.

foobar on January 11, 2006 01:31 PM

foobar -
Making a good policy on how to use exceptions is not a premature optimization. If you are exposing functionality that can only be reached by catching an exception then (on most platforms) you have a bad design. I'm not claiming devs shouldn't use exceptions, but they shouldn't fall into the same Parse vs TryParse trap the .Net runtime fell into before clr 2.0.

Steve Steiner on January 11, 2006 02:38 PM

Are exceptions control structures? My answer is "no." Maybe I'm just an old fuddy-duddy but throwing exceptions just to guard the execution of code hides the responsibility evaluate and react to conditions but doesn't make it any easier to satisfy.

Terrier on January 11, 2006 03:18 PM

> If you are exposing functionality that can only be reached by catching an exception then (on most platforms) you have a bad design.

This is a wholly different issue than not using exceptions because they're "10000x" slower than an if statement. I suppose that means that .NET is "10000x" slower than some other language - look at all the types that throw exceptions! They should have given me an HRESULT instead. What were they thinking?

foobar on January 11, 2006 07:43 PM

There was a lot of good information in this post, however I too, like some of the other posters, like to only return from a function in one spot.

With a well written nested if statement, it is extremely easy to deallocate memory and/or do other cleanup at the correct times. With "return" statements sprinkled through a function, it gets hard to ensure you don't leak memory. Especially if you edit a previously written function, you will have to find all the return statements and figure out whether memory should be freed there.

In languages where memory mangement is done for you (i.e. "It drops out of scope and the system frees it for me") this is somewhat less of a problem, but then tends to make programmers lazy and forget times where the DO have to clean up after themselves.

G-Man on January 11, 2006 09:03 PM

> Why not just dump the stack trace at any given time rather than doing explicit logging? The stack trace will give you the current method for sure..

Because that requires debugging symbols to be deployed with your assemblies, which isn't necesarily a bad thing IMO, but it's a constraint nonetheless. I posted my opinions about deploying pdb's here: http://jaysonknight.com/blog/archive/2005/11/09/2900.aspx

jayson knight on January 12, 2006 04:18 AM

In general, if the functions are small and compact, you can avoid the type of nesting shown.

I think, if you can, try for one return point in a function. This will enable you to put 2 debug statements per function, one at the start marking the arguments passed in and one at the end marking the return value back to the caller.

If you passing objects as parameters, this becomes more difficult, but if the functions are small and compact and doing a small amount of work, this works out quite well.

Additionally, I think debug statements that are used in this fashion gives a developer a good roadmap of what the application is doing without being to0 obtrusive or flooding the code with many debug statements. If there are many debug statements inside of function, perhaps the function needs to be refactored into smaller parts to follow this coding methodology.

I usually allow the deployment team to turn on or off the debug statement with a configuration setting. That way, if something goes wrong, we can turn on debugging to see what is going on without having symbols or a debug build or stack traces. If things really become hairy, then we get out the big guns and get dump files, etc.

Another benefit is that I can turn on debugging locally and look at the trace file and verify all the unit tests ran properly by verifying the output from the debug listener if I'm adding in new code (methods, functions, etc.)

Jon Raynor on January 12, 2006 01:29 PM

Hey Jeff,

Love this post and the discussions it has brought up. What are your thoughts, in an IF statement, on testing the scenario that is likely to happen most of the time? Does that matter for complexity sake or for readability? I mean this way you can avoid going through the rest of the code most of the time but what if it lessens the readability or checking for a negative versus a positive. Below is a very generic scenario but I think you know what I mean.

Thanks

if(!newsletter.subscribe)
{
// code to unsubscribe
// occurs 90% of the time
}
else
{
// code to subscribe
// occurs 10% of the time
}

return;

Erik Lane on January 17, 2006 10:56 AM

> What are your thoughts, in an IF statement, on testing the scenario that is likely to happen most of the time? Does that matter for complexity sake or for readability?

Can you implement this check as a guard clause rather than an if..else statement? If so, do that.

I feel very strongly that non-negative if clauses are easier to read, but it's just a guideline. If there's some extremely compelling reason to do it the other way, then use your discretion.

Jeff Atwood on January 17, 2006 12:48 PM

Fantastic entry. Seriously.

Scott Hanselman on February 4, 2006 09:19 PM

Do not attribute to malice what can easily be explianed through incompetence. -Napoleon Bonaparte

brian on January 10, 2007 12:26 PM

Jeff, I happen to agree about positive if clauses - out of curiosity, do you ever find that sometimes it is even beneficial to do something like this?

if (condition) {
// Do nothing
} else {
... real code here ...
}

justin on October 26, 2007 09:51 AM

Assertions is a lot better solution than refactoring the same code through the same function. Assertions, with tests cases, can guarantee you that your function parameters are good. You then don't have to code all those nasty 'if' and 'else' to test the validity of your parameters. As your parameters have always a good value, you can code functionality straight.

Jérôme Radix on October 26, 2007 10:48 PM

Bigger Screens!!!

Someday we'll all have huge screens where the entire source code for any major project will all fit on one screen. Have some imagination!

How many acres of screens for the source code to Windows Vista? What new hardware technology do we need to see the whole thing just by moving our eyes and head?

JF

JFred on October 27, 2007 12:26 PM

Re: Eber Irigoyen

>> also many times it is possible to combine the "ifs" into a single if:
>>
>> if (something) {
>> if (somethingelse) {
>> if (somemore) {...
>>
>> into
>> if (something || somethingelse || somemore) {
>> ...

Good point, but you should put logical AND (&&) instead of OR (||).

sergio on October 29, 2007 05:00 PM







(hear it spoken)


(no HTML)




Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.