February 13, 2007
Unless you've been living under a rock for the last few years, you've probably heard about the game Katamari Damacy. The gameplay consists of little more than rolling stuff up into an ever-increasing ball of stuff. That's literally all you do. You start by rolling up small things like matchsticks, thimbles, and pushpins. As the ball gets larger, you roll up larger and larger items. Eventually, your Katamari ball gets so large you end up rolling together cities, mountains, clouds -- and eventually entire planets. It's unbelievably fun, and completely mesmerizing.
After I played for a while, I realized that Katamari Damacy is a game about the scale of life, reminiscent of the classic Eames film Powers of Ten.
As Bob Koss points out, code has a natural tendency to become a giant Katamari ball of "stuff", too:
I travel a lot and I get to visit a lot of different companies. No matter which industry a company is in or which programming language a team is using, there is one commonality in all of the code that I see -- classes are just too damn big and methods are just too damn long.
We programmers must take matters into our own hands and become masters of our domains. Unless we take action, things are just going to get bigger and bigger until we have a real mess on our hands.
Bob's article is about managing the scale of your code:
This notion of breaking a class into smaller and smaller pieces is exactly opposite to what I learned when I first started studying OO. Way back when I worried about bad-hair days, people believed that a class should encapsulate everything that concerned it. A Customer class would know the business rules of being a Customer as well as how to retrieve itself from the database and display its data. That's a fine idea, provided the database schema never changes, the display never changes, and the business rules never change. If any one of those responsibilities changes, we are at a high risk of breaking other things that are coupled to it.
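To make Bob's point concrete, here's a minimal sketch of that split -- all class and method names are invented for illustration, not taken from his article. The monolithic Customer becomes three small pieces, each with exactly one reason to change:

```java
// Business rules only: the schema and the UI can change without touching this.
class Customer {
    private final String name;

    Customer(String name) { this.name = name; }

    String getName() { return name; }

    // A hypothetical business rule, living here and nowhere else.
    boolean qualifiesForDiscount(int orderCount) { return orderCount >= 10; }
}

// Persistence: swap the database schema without touching Customer.
interface CustomerRepository {
    Customer findByName(String name);
}

// Display: change the formatting without touching Customer.
class CustomerView {
    String render(Customer c) { return "Customer: " + c.getName(); }
}
```

Now a schema change touches only a CustomerRepository implementation, and a display change touches only CustomerView -- the coupling Bob warns about is gone.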
So many aspects of software development can be summarized as small is beautiful:
- The odds of failure for a software project are directly proportional to the size of the project. Slicing a large project into several smaller subprojects is the single most direct way to increase your project's chances of success.
- The relationship between lines of code and bugs is linear: less code means fewer bugs.
- Smaller code avoids TL;DR (Too Long; Didn't Read) syndrome. The less code there is to read, the higher the odds that someone will actually read it.
- If you keep your dependencies to a minimum, your code will be simpler and easier to understand.
It's up to us to resist the natural tendency of any project to snowball into a giant rolling Katamari ball of code. Code smaller!
Posted by Jeff Atwood
I *highly* recommend Katamari Damacy -- even if you just rent it for a weekend. One of the strangest/funnest games I've ever played.
And surprisingly on topic because it's Feb 14, a Katamari valentine card from Beavotron: http://frank05.critter.net/valentine_katamari.png
Yeah, a bunch of my friends love that game.
But I think the goal is to keep code manageable. Really good code generation is key for things like database access and objects, I believe.
Rules of unnecessary optimization:
1. Don't do it.
2. If you're tempted to violate rule 1, at least wait until the program is finished.
3. Nontrivial programs are never finished.
That being said, keeping your code in a state where it is manageable is not unnecessary optimization.
Hmm. Smaller is better? I think what we're really looking for is "modularized properly" is better. I think a lot of projects are dived into willy-nilly (especially by neophyte/novice/hobbyist programmers) with no pre-planning of what the thing should do and how to logically code it. You should, if you're any good at your code, be able to sort tasks into obviously autonomous units, knowing how they'll interplay and how they'll work solo. I find a lot of the code I read from others can only be described as a hunt 'n patch job. Inelegant, unreadable, and difficult to support...
"modularized properly" is better
And what is modularization other than dividing something up into smaller, independent subunits?
Unfortunately, this is contrary to what is being taught in some high schools. I know that in at least one AP Computer Science textbook, they teach that everything should be in one class, with little concern for later changes and optimization. The root of this problem starts at a young age; proper techniques should be shown in schools.
Jeff, you asked what modularization is, but I think you're, intentionally or not, hinting at the real problem. Many programmers perform a modular breakdown of the top problem and stop there, writing each module as a monolithic "chunk". It's hard to learn to perform modularization on the modules, and to do that recursively until the modules are small enough to verify by inspection.
(Not that you should skip unit tests, but it's nice to be sure the tests will pass from reading the code, too.)
Tom Grochowicz, that's because in school, almost every assignment is a standalone project. No need to break it up. I see this a lot in the programming assignments we're given: rarely do they overlap. I do try to modularize as much as is reasonable in the semester projects we get.
(By the way, I'm in college).
How does one judge when a method is too long or too complex? Cyclomatic Complexity is a metric that can be used to help determine when a method should be refactored into smaller units.
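For readers unfamiliar with the metric: cyclomatic complexity is the number of decision points in a method plus one. A hypothetical example (names invented) with five decision points, and hence a complexity of 6:

```java
class Classifier {
    // Five decision points -- three ifs, one loop, one ternary -- so the
    // cyclomatic complexity is 5 + 1 = 6. Once a method creeps past a team's
    // chosen threshold, each branch group is a candidate for extraction.
    static String classify(int[] values) {
        if (values == null) return "empty";              // +1
        if (values.length == 0) return "empty";          // +1
        int negatives = 0;
        for (int v : values) {                           // +1
            if (v < 0) negatives++;                      // +1
        }
        return negatives > 0 ? "mixed" : "non-negative"; // +1
    }
}
```

The appeal of the metric is that it counts paths through the code, which is also roughly the number of test cases you'd need to cover it.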
This isn't really a new observation, though the association with Katamari Damacy is clever -- LOVE THAT GAME.
For an older reference to the same issue, see this paper from 1997:
This paper examines this most frequently deployed of software architectures: the BIG BALL OF MUD.
This is the strength of functional languages in general, and the more obscure Haskell in particular. Most functional languages encourage small, composable functions; some make them almost mandatory. Tragically, most functional languages are developed in academia with little regard for the pragmatic things a language must do to really meet the needs of a wider audience: database access, GUIs (and not ugly Tcl/Tk ones, either), and so on.
But, I'm going to argue, just for the sake of argument, that having TOO MANY methods leads to serious overhead in calls and stack usage. Don't argue with me about the speed of CPUs and how much memory there is available these days, because it ain't always so. In some cases our code runs on years-old customer Solaris boxes which are held to a fiduciarily-responsible level of cost. Call overhead for 100,000 sets of end-user transactions runs to more than you think, and sooner or later the customer starts complaining about the time taken per transaction. I can't very well go back to them and say "Yes, the transactions do take a long time, but the code itself is VERY maintainable!"
There's also a problem of fractal geometry here, in which each grape in the cluster is composed of more clusters of grapes, and let's face it, 10 lines of method calls which themselves devolve into 10 method calls, isn't much of an improvement, because you start to lose the context in which all the calls are made. (Yes, I know about minimizing dependencies, but believe me, it isn't always as neat and clean as just saying it, especially when you're dealing with complex host systems.)
I'm baffled that so many experienced software designers are ready to stand up and defend "coding bigger". The performance benefit of fewer method calls is a thing of the past. Seriously. What's more expensive -- more hardware, or higher development costs and opportunity cost? And if you're not ready to concede that, then you should be asking yourself what you're doing programming in a 3GL. You know you can't get good performance without diving down to assembly language.
The fact that you can't grasp the big picture when looking at the low level is *exactly the point*. That's what makes that low-level code maintainable and reusable. You don't *need* to know the big picture to be productive.
I second the earlier post on Big Ball of Mud. That paper is awesome, and is a good explanation of coding bigger.
I really love this blog :) thank you so much for sharing this great (small) view on software development. It's very to the point and has a low tldr factor :) heh you made my day again...
Katamari Damacy, huh? Wake me up when it comes out on the PC
Amen Jeff - lots of clearly purposed little pieces always gives a better design, but somehow this important message is not widely known.
This was brought forcefully home to me a few months ago, when I suggested that a more junior colleague refactor some repeated code - only a few lines, but forming a clearly identified atomic unit - into a method (or a class, I can't recall which now). His reaction was astonishment: "surely it's not worth it".
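A hypothetical version of that kind of refactoring (the names and the currency example are mine, not the commenter's): three lines of formatting that were pasted in two places become one small, named method. Even at this size, the name documents intent and the duplication is gone.

```java
class PriceFormatter {
    // Extracted from the repeated block: format a price in cents as dollars.
    static String formatCents(long cents) {
        long dollars = cents / 100;
        long remainder = Math.abs(cents % 100);
        return String.format("$%d.%02d", dollars, remainder);
    }
}
```

The payoff comes the first time the formatting rule changes: one edit instead of a hunt for every pasted copy.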
There are a number of factors at play here I think:
1. Lack of education, pure and simple.
2. Premature optimisation (crazy but true).
3. Languages that make it a rigmarole to define ADTs - e.g. in Java we typically need a new file, cannot easily wrap primitives, etc. Paradoxically we got much better type abstractions in C via the simple typedef!
4. Local coding standards that require (for well-intentioned reasons) banner headers for every type and method introduced.
5. Design documents that list the big classes and methods, but do not make it clear to developers that they can - and should - decompose further during implementation.
All of these conspire to make it a "big deal" in the mind of the programmer to break out of the flow and create a separate abstraction (method or type).
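To put the typedef point (factor 3 above) in concrete terms, with invented types: what C does with `typedef double Celsius;` takes a whole class in Java. That's exactly the rigmarole that discourages small abstractions -- though to be fair, the Java version, unlike the typedef, actually stops you mixing up the two units.

```java
// One line in C: typedef double Celsius;  (but Celsius and Fahrenheit
// would still be freely interchangeable there). In Java, a full class:
final class Celsius {
    private final double value;
    Celsius(double value) { this.value = value; }
    double toDouble() { return value; }
    Fahrenheit toFahrenheit() { return new Fahrenheit(value * 9.0 / 5.0 + 32.0); }
}

final class Fahrenheit {
    private final double value;
    Fahrenheit(double value) { this.value = value; }
    double toDouble() { return value; }
}
```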
Jeff, you missed one aspect of keeping dependencies low:
Simpler, more independent code is easier to share.
Writing shareable, reusable code, IMO, separates the men from the boys.
The relationship between lines of code and bugs is only linear? That's surprising; I would have thought that as a project gets larger it picks up more confusing interactions, and the frequency of bugs would go up. Higher frequency x more lines would yield a nonlinear relationship.
It never ceases to amaze me when I go back and work on my old code (either to bug-fix or add features) that I often end up with LESS code that does more faster, more securely, and handles errors better. It is pretty hard to always get this right the first time without falling into the premature optimization trap.
"The world is not linear" -- Donald Hughes McElhone/1976
No, you won't likely find him on Google. I just worked for him. Bug count and line count are closer to geometric.
That said, since I've been living in Big Iron land for the last few years, a significant body of "new" code in Enterprise land (java, anyway) is replicating its COBOL grandpappies. In this land, programs are as large as the z/OS footprint will allow, since runtime cost is all that matters to these folk. They're wrong, of course, since these codebases change as often as any other. But they focus on Glass House charge back $$$, solely.
And OO hasn't been the Nirvana of plug-in objects either. Has it?
with a little modification, Steve Jackson's comment reflects my own thoughts on code quite nicely:
"I love small classes and methods, but I [am always] looking [for a] big picture, where do I start. Where's main? Does anyone have any suggestions on how to handle both sides of the picture? Make the code small, self describing, and maintainable (like legos), but still be able to step back and see the tie fighter? I feel that much of my code is very flexible and can be used in ways that were never intended, but what good is that if I'm the only one that can understand what the jigsaw puzzle looks like [when I'm done with my code]? ...
Steve Jackson on February 15, 2007 09:08 AM" -- emphasis, replacements and ellipsis are Dominick's.
When I first started developing I was introduced to the self-encapsulating class that knew way too much about itself and the world around it. Since I work primarily on web applications which is a mostly a state-free environment, it didn't make a lot of sense to continue to build those kinds of classes.
I eventually caught on to CodeSmith and began customizing templates to generate an entity/collection/manager class set for each database object. This approach has saved a lot of time and makes my code more agile. If you are interested, you can download the templates I use from my site.
If it's too big to readily understand, it's too big.
I grew up on COBOL, then Pascal and some other procedural languages. Functional Decomposition into manageable chunks with internally consistent levels of abstraction was my favorite way of breaking large rocks into small rocks. FD still shows up sometimes when breaking a longish method into more self-descriptive parts. But OO is really a whole other way of breaking big problems into little ones. Dependency management, always the main point of modular design even if we didn't know it, comes to the forefront for good OO fun. When methods are too big, it often means the class is too big. The more a class knows and does the more dependencies it gains on the things around it. The big ball, indeed.
Who said "modularizing code is against optimization" ? In C, if you really feel like you need THAT type of optimization, you can just use inline functions, while keeping the code readable.
First of all, the whole point about too many methods, functions or classes is right out.
A compiler absolutely should optimize out all those calls and leave just the meat of it. If yours doesn't, try Java. Also, ALWAYS go with the first expert rule of optimization: don't optimize for speed until you've written the best, most understandable non-optimized code you can -- and that means the smallest.
It really blows me away that people can argue with the most common, basic facts of software development. It's like an architect arguing that straw makes a better building material for high-rises.
We solved the problems long ago, if you don't like the rules--go study, the rules aren't wrong.
-Small methods, small functions, small objects and small files are more understandable and usable (good).
-Documented code is good.
-Any repeated logic anywhere is bad.
-Too many interactions between entities is bad.
-Variables that can be modified outside your currently displayed file (globals/publics) are bad (Actually just the worst case of the previous point).
-There are many others.
There are many fine points in our line of work that you can argue over, but if you find yourself arguing against one of these, please reconsider. If after consideration you still believe you are correct, please find another line of work--for all our sakes.
Remember also, The primary user of your code is the next programmer, not a computer--otherwise you'd just throw away the source and keep the object code.
To the people who had trouble because their small classes weren't understood:
Small classes are no harder to understand than large classes. I hate to say this, but consider that you should work on your communication ability. (I always feel it's my fault if I can't download a concept from my head into someone else's.)
Even before you code, you should be able to visualize your main classes or functional units and the relationships between them. If you have more than 5 small "core" classes, draw them on the whiteboard before you start coding. Eventually this should make it onto some form of paper. I've photographed whiteboards and created in-depth UML diagrams, but the whole point is to come up with whatever is necessary to communicate your structures to the others on your team.
What your users want to know is what are these core classes and how do they relate; this is exactly what class diagrams were created for.
Coding the small classes is easier for you to understand, so if it's not easier for the rest of your team to understand, bridge the gap and do whatever it takes to communicate.
This communication ability is what separates a senior programmer from a Junior.
At the risk of being labeled just another Goldilocks: small for the sake of small and big for the sake of big are two sides of the same coin.
small: not enough context as you scan, so your memory has to track it
big: too much context as you scan, so your brain gets overwhelmed
just right: well, yes
It's been a while since I've invoked him; just go read Allen Holub, you'll get an OO perspective that gets beyond get/set. You may not like what he says, but an honest rebuttal is not a trivial thing.
Jeff, thanks for this post, but I find it extremely sad that this even needs to be stated.
How does one judge when a method is too long or too complex? Cyclomatic Complexity is a metric that can be used to help determine when a method should be refactored into smaller units.
Tim Binkley-Jones on February 14, 2007 08:47 PM
Cyclomatic complexity is a useful measure but here are a few guidelines that I use...
1) A function should express a single step of operation at its level of abstraction. Basically, it should do one thing.
This is sometimes hard to pin down - it's an "I know it when I see it" sort of zen state. If the function is doing several things, each of those things should be refactored into an individual function.
2) A function should fit on one screen.
There are exceptions, like very long switch statements or long blocks of array initialization and the like, but logically these are just long homogeneous blocks.
3) Redundant code should be a function.
Cut-and-paste is the duct tape of coding. Handy, but don't use it to build skyscrapers. Redundant code is a bug waiting to happen.
4) If you can't understand it when you look at it next week, it is unclear and needs to be refactored.
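Guideline 1 can be sketched like so (a made-up example, with invented names): the top-level method reads as a single step at one level of abstraction, and each named helper does exactly one thing.

```java
class ReportBuilder {
    // The top level reads like a table of contents -- one level of abstraction.
    static String buildReport(int[] sales) {
        return header() + body(total(sales)) + footer();
    }

    private static String header() { return "SALES REPORT\n"; }

    // One thing: sum the sales figures.
    private static int total(int[] sales) {
        int sum = 0;
        for (int s : sales) sum += s;
        return sum;
    }

    private static String body(int total) { return "Total: " + total + "\n"; }

    private static String footer() { return "-- end --\n"; }
}
```

Each helper is now trivially verifiable by inspection, which is the "small enough to read" state the earlier comments were after.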
I've been living under a rock. Never heard of it.
Ahh...no wonder. PS2 game. I don't pay attention to the PS2 scene.
Oh good, versions coming out for the Wii and XBox360. I have neither of those (do have an XBox) either. I may get a Wii, though.
Perhaps I'll check it out then.
I apparently live under a rock. I'd heard of Katamari but never knew what it was.
Here's a video: http://www.youtube.com/watch?v=cwhFH75OCDs
I love small classes and methods, but I work with a lot of people who have a hard time grasping things that are very small. They're looking big picture, where do I start. Where's main? Does anyone have any suggestions on how to handle both sides of the picture? Make the code small, self describing, and maintainable (like legos), but still be able to step back and see the tie fighter? I feel that much of my code is very flexible and can be used in ways that were never intended, but what good is that if I'm the only one that can understand what the jigsaw puzzle looks like? I'm not sure detailed design docs or anything else are a big help (been there, tried that), plus they slow you down when you have to add gobs of documentation each time you refactor something into smaller more maintainable piece.