Size Is The Enemy

December 23, 2007

Steve Yegge's latest, Code's Worst Enemy, is like all of his posts: rich, rewarding, and ridiculously freaking long. Steve doesn't write often, but when he does, it's a doozy. As I mentioned a year ago, I've started a cottage industry mining Steve's insanely great but I-hope-you-have-an-hour-to-kill writing and condensing it into its shorter form points. So let's begin:

  1. Steve began writing a multiplayer game in Java, Wyvern, around 1998. If you're curious what it looks like, see fan screenshots one and two.
  2. Over the last 9 years, Wyvern has grown to 500,000 lines of Java code.
  3. Steve realized that it is impossible for a single programmer to singlehandedly maintain and support half a million lines of code. Even if you're Steve Yegge.

There's much more, but I want to pause here for a moment. It is absolutely true that any programmer who personally maintains half a million lines of code is automatically in a pretty rarified club. Steve's right about this. Most developers will never have the superhuman privilege of personally maintaining 500k LOC or more. On any rational software development project, you'd have a team of developers working on it, or you'd open source the thing entirely to spread the effort across a community.

But here's what I don't understand:

I happen to hold a hard-won minority opinion about code bases. In particular I believe, quite staunchly I might add, that the worst thing that can happen to a code base is size.

So Steve believes the majority of developers, when encountering a code base approximately the size of the Death Star, would think:

I could totally build that.

It's a telling indicator of the impressively bearded computer scientist crowd that Steve runs with. They probably wear flip-flops to work, too. Amongst the programmers I know, the far more common-- and certainly more rational-- reaction to a code base that large would be to run away, screaming, as fast as they could. And I'd be right behind them.

I don't think you necessarily have to spend ten years writing 500k worth of fairly complicated Java code to independently reach the same conclusion. Size is the enemy. Simply going from 1k to 10k LOC-- assuming you're sufficiently self-aware as a programmer-- is more than enough of a glimpse into the maw of madness that lies beyond. Even if you've written zero lines of code, if you've ever read any Steve McConnell books, the size rule is pounded home, time and time again:

Project size is easily the most significant determinant of effort, cost and schedule [for a software project]. People naturally assume that a system that is 10 times as large as another system will require something like 10 times as much effort to build. But the effort for a 1,000,000 LOC system is more than 10 times as large as the effort for a 100,000 LOC system.
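To put rough numbers on that claim, here is a minimal sketch using the Basic COCOMO effort formula, effort = a * KLOC^b. The model and its coefficients are Boehm's, not McConnell's, and choosing it here is purely an assumption for illustration; read the output as the shape of the curve, not as an estimate for any real project.

```python
# Basic COCOMO (Boehm, 1981): effort in person-months = a * KLOC ** b.
# Using this model is an assumption for illustration; the quote above
# does not name a specific formula.
COCOMO_MODES = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def effort_person_months(kloc, mode="organic"):
    """Estimated effort for a project of `kloc` thousand lines of code."""
    a, b = COCOMO_MODES[mode]
    return a * kloc ** b


def effort_ratio(small_kloc, large_kloc, mode="organic"):
    """How many times more effort the larger project needs."""
    return effort_person_months(large_kloc, mode) / effort_person_months(small_kloc, mode)


if __name__ == "__main__":
    for mode in COCOMO_MODES:
        ratio = effort_ratio(100, 1000, mode)
        print(f"{mode}: 10x the code needs about {ratio:.1f}x the effort")
```

Under these assumed coefficients, going from 100,000 to 1,000,000 lines costs somewhere between roughly 11 and 16 times the effort, depending on the project mode.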

One of the most fundamental and truly effective pieces of advice you can give a software development team-- any software development team-- is to write less code, by any means necessary. Break the project into smaller subprojects. Deliver it in complementary fragments. Try iterative development. Stop writing everything in assembly language and APL. Hire better programmers who naturally write less code. Buy code from a third party. Do absolutely whatever it takes to write as little code as possible, because the best code is no code at all.

We're not done yet. I warned you that this was a long post. Continuing from above:

  1. Because Java is a statically typed language, it requires lots of tedious, repetitive boilerplate code to get things done.
  2. That tedious, repetitive boilerplate code has been codified into Java faith as the seminal books "Design Patterns" and "Refactoring".
  3. Java developers fervently believe, almost to a man/woman, that IDEs can overcome the unavoidable LOC bloat of Java.
  4. A rewrite of Wyvern from Java into a dynamic language that runs on the JVM could reduce the raw code size by 50% to 75%.

Here's where Steve not-so-gently segues from "size is the problem" to "Java is the problem".

Bigger is just something you have to live with in Java. Growth is a fact of life. Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly.

Tetris: Game Over

Going back to our crazed Tetris game, imagine that you have a tool that lets you manage huge Tetris screens that are hundreds of stories high. In this scenario, stacking the pieces isn't a problem, so there's no need to be able to eliminate pieces. This is the cultural problem: [Java programmers] don't realize they're not actually playing the right game anymore.

Steve singles out Martin Fowler, who recently "abandoned" the static-language Java fold in favor of the dynamically typed Ruby. Fowler quite literally wrote the book on refactoring, so perhaps there's some truth to Steve's claim that the rigid architecture of classic, statically typed languages ultimately prevents you from refactoring the code down as far as you need to go. If Fowler can't refactor the Java pieces to fit, who can?

Bruce Eckel is another notable Java personality who apparently reached many of the same conclusions about Java years ago.

I can't quantify [the cost of strong typing]. I haven't been able to come up with a from-first-principles mathematical proof, probably because it depends on human factors, like how much time it takes to remember how to open a file and put the try block in the right places and remember how to read lines and then remember what you were really trying to accomplish by reading that file. In Python, I can process each line in a file by saying:

for line in file("FileName.txt"):
  # Process line

I didn't have to look that up, or to even think about it, because it's so natural. I always have to look up the way to open files and read lines in Java. I suppose you could argue that Java wasn't intended to do text processing and I'd agree with you, but unfortunately it seems like Java is mostly used on servers where a very common task is to process text.
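Bruce's snippet uses the Python 2 `file()` builtin, which no longer exists in Python 3. A rough modern equivalent might look like the sketch below; `process_lines` is a hypothetical helper name, and the path is whatever you pass in.

```python
def process_lines(path):
    """Return the lines of a text file with trailing newlines stripped.

    `process_lines` is a hypothetical helper for illustration; it is not
    part of the quoted text.
    """
    results = []
    # open() replaces Python 2's file(); the with block closes the file for us.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            results.append(line.rstrip("\n"))
    return results
```

It is a few lines longer than the original, but the loop itself is unchanged: iterating over the file object still hands you one line at a time.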

Lines of code are, and always have been, the enemy. More lines of code means more to read, more to understand, more to troubleshoot, more to debug. But it is possible to go too far in the other direction as well. If you're not careful, you could end up playing yet another game entirely-- yes, you've cleverly avoided the trap of Java's infinitely tall Tetris, but have you slipped into Perl's Golf instead?

Perl "golf" is the pastime of reducing the number of characters used in a Perl program to the bare minimum, much as how golf players seek to take as few shots as possible in a round.

NES mario golf

It originally focused on the JAPHs used in signatures in Usenet postings and elsewhere, but the use of Perl to write a program which performed RSA encryption prompted a widespread and practical interest in this pastime. In subsequent years, code golf has been taken up as a pastime in other languages besides Perl.
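To make the tradeoff concrete, here is a small sketch in Python rather than Perl: the same FizzBuzz logic written golf-style, and then written for the next person who has to read it. FizzBuzz and both function names are my own choice of illustration, not anything from the golf description quoted above.

```python
def fizzbuzz_golfed(n):
    # Golf style: bool-times-string and `or` do all the branching.
    return ["Fizz"*(i%3<1)+"Buzz"*(i%5<1)or str(i)for i in range(1,n+1)]


def fizzbuzz_readable(n):
    """Same output as fizzbuzz_golfed, spelled out."""
    results = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            results.append("FizzBuzz")
        elif i % 3 == 0:
            results.append("Fizz")
        elif i % 5 == 0:
            results.append("Buzz")
        else:
            results.append(str(i))
    return results
```

Both functions return the same list. The first is shorter by any character count; whether it is less code to understand is another matter.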

In our war on verbosity, there's an inevitable tradeoff between verbosity and understandability. Steve acknowledges this by hinging his JVM language choice on what is "syntactically mainstream": JRuby, Groovy, Rhino (JavaScript), and Jython. I'll spoil the not-so-surprise ending for you: Steve is rewriting Wyvern in Rhino, and in the process he'll help bring Rhino up to spec with the forthcoming ECMAScript Edition 4 update to JavaScript. It's no magic bullet, but it seems like a reasonable compromise based on his goals.

So ends the epic ten year tale of Stevey and his merry band of Wyverneers. But where does that leave us? I have my opinions, naturally:

  • If you personally write 500,000 lines of code in any language, you are so totally screwed.
  • If you personally rewrite 500,000 lines of static language code into 190,000 lines of dynamic language code, you are still pretty screwed. And you'll be out a year of your life, too.
  • If you're starting a new project, consider using a dynamic language like Ruby, JavaScript, or Python. You may find you can write less code that means more. A lot of incredibly smart people like Steve present a compelling case that the grass really is greener on the dynamic side. At the very least, you'll learn how the other half lives, and maybe remove some blinders you didn't even know you were wearing.
  • If you're stuck using exclusively static languages, ask yourself this: why do we have to write so much damn code to get anything done-- and how can this be changed? Simple things should be simple, complex things should be possible. It's healthy to question authority, particularly language authorities.

Remember: size really is the enemy. Right after ourselves, of course.

Posted by Jeff Atwood
120 Comments

Casey:

I really wonder why ignorant people always seem to enjoy picking on APL. Before you write about another language, you should really learn enough about it (whatever its character set may be!) to realize that a properly written APL program is in no way a "mish-mash of line noise".

(∼R∈R∘.×R)/R←1↓⍳R

Ralph on December 24, 2007 1:19 AM

Not that this applies to Steve's case, as I respect him, but 500 K of LoC = "my Kingsfield notes on contract law". All of us have worked with "my-library/framework-is-going-to-be-better" coder guy. What a pain in the ass he is. But, if you write the monstrosity and get away with it, sadly, you have just earned yourself some job security.

I have cleaned house on some of those types of projects and the thing that sucks is the genius 500 coder dude, with his love of complexity and intricacies, has created a cash cow for the unscrupulous sales force to sell consulting services. Most of the business unit folks don't know what a piece of crap they have, and will shell out more for a 750K library. You can't fire the genius easily because the code base is so large, the business is tied to it. It sucks as it shows that crappy software with high maintenance.

Statically typed or dynamic doesn't matter, level of complexity does.

ActiveEngine Sensei on December 24, 2007 1:25 AM

Anonymous, I was blown away by your first Haskell implementation of quicksort.(1) Impressive!

At that rate of productivity, a programmer's allotted ten lines of code per day could get a spacecraft to the moon in a month. Or at least somewhere into outer space. Maybe even another dimension.

Keep up the good work!

------
(1) Sadly, I was disappointed with your second implementation. Far too complex, ridiculously bloated. Please tighten it up.

Ralph on December 24, 2007 1:36 AM

Wait, wait, a project written in APL would wind up being really short (it might be an unmaintainable mish-mash of line noise, but at least it would be short).

Casey on December 24, 2007 1:38 AM

Not sure why everyone is on the 'it's all static typing's fault' warpath at the moment. There is never a silver bullet, i.e. swapping to a dynamic language (although I do a lot of Python and still think it is one of the best languages around). The problem is with the expressiveness of languages - things like meta-programming in Ruby - which just happens to be where the dynamic languages are at. And the crufty C-style syntaxes which are present in the mainstream statically typed languages, C#, Java.

It is possible and static languages are getting better (Scala/Haskell) at providing this power with a nice syntax. I can see a powerful, expressive, statically typed language hitting the mainstream soon - all the power + the checking of the low hanging fruit with typing (it doesn't pick up a lot of bugs, but it does save on a few tests and prevents 'silly' mistakes, which I am not going to say no to).

Again I have to repeat that I am all for the current dynamic languages moving forward, particularly python and ruby, but it is a mistake to identify their positives as benefits of dynamic typing.

Mark Hibberd on December 24, 2007 1:40 AM

Yep, don't forget the functional languages, especially those with a marvelously strong type system. What can beat the following implementation of quicksort in Haskell

qsort [] = []
qsort (x:xs) = qsort (filter (x) xs) ++ [x] ++ qsort (filter (=x) xs)

in conciseness and understandability? Bonus question: what things can this function sort, i.e. what's its type? Numbers? Strings? Anything?

Anonymus on December 24, 2007 1:46 AM

It's ridiculous that you favor non-typed languages for projects larger than 100k LOC. When you work in larger teams, using typed languages is the only way to survive. Fail early, or you lose in the long run.
Typed languages are like well-defined contracts and rules. Without visible rules, you cannot master large projects.
The target is not shorter code, but READABLE code. Understandable code; it's not a shortest-code contest.
Ever seen the shortest C code for a problem? That cannot be the right direction, which is what you show us here.
The type-reflection of C# 3.0 is a nice trade-off between ease of writing and the benefits of typed languages.

C. Wissing on December 24, 2007 1:46 AM

Darn formatting! How am I supposed to write code in the comments?

qsort [] = []
qsort (x:xs) = qsort (filter (<x) xs) ++ [x] ++ qsort (filter (>=x) xs)

Anonymus on December 24, 2007 1:51 AM
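For readers following along without Haskell: the definition Anonymus appears to be aiming for partitions around the first element with `<x` and `>=x`, and for that definition the inferred type is `Ord a => [a] -> [a]`, so it sorts lists of anything that can be ordered. A rough Python sketch of the same idea follows; it is a translation for illustration, not something posted in the thread.

```python
def qsort(items):
    """Quicksort in the style of the Haskell two-liner; returns a new list."""
    if not items:
        return []
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger_or_equal = [x for x in rest if x >= pivot]
    return qsort(smaller) + [pivot] + qsort(larger_or_equal)
```

Like the Haskell version, this works on numbers, strings, or any other values that support comparison, and like the Haskell version it copies lists rather than sorting in place.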

I've written my fair share of C++, C#, Java, Javascript, and PHP.
C# is by far the winner, even if it is strongly typed.
At least I know what the hell is going on, unlike in Javascript, where you never know what you're going to be dealing with.
No documentation? Well, you might have a pretty hard time even finding out what fields exist on the object.

Allied on December 24, 2007 1:51 AM

i don't like the amalgamation between statically typed languages and Java.

yes, it's true, statically typed languages are more verbose than dynamically typed ones. this verbosity has a cost. it takes more time to write code and needs some more effort to maintain. but this cost is also for good: the code is cleaner, often easier to read or, at least, less subject to invisible side-effects (sometimes called programming tricks by script writers). it brings more security.

but not all statically typed languages are lacking the toolbox anyone can expect from a real programming language: i mean enumerations, file I/O, and the like... if you are struggling to write a snippet that reads every line of a text file, you are in deep trouble. either you are not good at programming or your language is badly designed.

for example, reading every line of a text file in Ada is 6 lines long: 2 for opening/closing, 2 for the loop, 1 for the read, and of course 2 for the declarations. it is not really longer than the equivalent code in python.

i do really think java was badly designed. it is missing a lot of standard features, and bloated by the "all object" methodology used. please, remember that java does not represent all statically typed languages, it is only one example, and not a pretty good one.

also, remember that every language has its preferred domain of use. i will not write my one-liner scripts in Ada, but i won't write a 500,000 loc software in perl.

rien on December 24, 2007 1:54 AM

Another thing in case of Java is its extremely verbose object and function names. While that is great for a beginner to help understanding (though honestly, to really understand the code they'd have to read the actual documentation and find out all the details anyway) but it's just annoying if your average function call starts spreading over 3 lines or more.
Also it strongly depends on what you do. I like to do stuff that involves a lot of bit-fiddling, and to me pure C still seems most suitable for that.
The downside is that every time you also have to do something with strings it gets very painful very fast.
As another point I really like C# and Delphi for Gui development, I think they do a superb job there. Except for the fact that unless I restrict myself to .Net 1.1 I haven't found a way to make the applications work on both Windows and Linux (short of installing and using mono on Windows), so I don't really have any use for them after all :-(

Reimar on December 24, 2007 2:08 AM

I love writing code in Ruby.
I hate reading code in Ruby if it's not my own. Where does it start, where does it end. What's supposed to go in here? And what comes out there?
I like writing code in C++.
I don't mind reading code in C++. Even if badly written. Much easier to understand and update or refactor.

KristofU on December 24, 2007 2:10 AM

I agree with C. Wissing.
You're rarely wrong Jeff, but when you're wrong, you're really wrong.

I think what we have here is the confounding of two issues.

I think the real point here is that it's hard to write giant complicated programs.

On the border between simple programs and moderate sized programs are programs that become "small and simple" when written in dynamically typed languages but are effectively "complicated" when written in a statically typed verbose language like C++ or Java. In this small category of programs you may be better off eschewing the strongly typed verbose language.

But you shouldn't mistake this phenomenon for the idea that truly, innately complicated programs would therefore be easier to code and maintain in a dynamically typed non-verbose language. As C. Wissing says, exactly the opposite is the case. In truly large projects, all of the strictures of static typing and verbosity act like a contract which protects the code.

In many ways the analogy with contract is very telling. If you are arranging a quick agreement between friends for a $50 loan you are better off just with a handwritten IOU and avoiding all the legal technical jargon and formalities. But no idiot would take a fifty million dollar loan without a formal, legal, verbose contract.

mouser on December 24, 2007 2:23 AM

As much as I'd like to rag on statically typed languages, declaring variables has saved me a number of times when I made a typo in a variable name or used a similarly named variable instead of the correct one.

This is why, when I use Perl, I always have a use strict; near the top of the file, and I really wish other dynamic languages had it.

Yes, it does mean you have to declare variables at the beginning of the scope during which you want them to exist, but without doing that, how is the language going to know what scope said variables should be in?

Powerlord on December 24, 2007 2:26 AM
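Powerlord's typo scenario is easy to reproduce in Python, where assigning to a misspelled name quietly creates a new variable. A minimal sketch, with made-up function names:

```python
def sum_with_typo(values):
    total = 0
    for value in values:
        # Typo: `totl` is a brand-new variable, so `total` never changes.
        totl = total + value
    return total


def sum_fixed(values):
    total = 0
    for value in values:
        total = total + value
    return total
```

Python raises no error for the first version; it simply returns the untouched starting value. A linter will usually flag `totl` as assigned but never used, which is about as close as the language gets to `use strict`.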

Jack -

You can read lines via a URL like this in python:

for line in urllib2.urlopen("http://www.google.co.uk"):
    print line

Insofar as I understand Java's iteration syntax, the underlying mechanism is the same, just as it is compared to the C# equivalent. "file" is the name of the class; the file("xxxx.txt") expression creates a file object, and file objects conform to the iterable protocol; the fancy for syntax directs the compiler to produce the relevant while loop that you'd otherwise have had to write by hand.

I'm not aware of anything in the standard library that will allow one to do the same with images, but you could easily write one if you require such a thing, as there's not really any magic involved.

Tom_ on December 24, 2007 2:38 AM
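What Tom_ describes can be spelled out by hand. This is a sketch of roughly what a Python `for` loop does under the iterator protocol, using an in-memory `io.StringIO` as a stand-in for a real file or URL; the two function names are invented for the comparison.

```python
import io


def lines_with_for(stream):
    """Collect lines using the ordinary for syntax."""
    collected = []
    for line in stream:
        collected.append(line)
    return collected


def lines_by_hand(stream):
    """Roughly what the for loop above expands to."""
    collected = []
    iterator = iter(stream)          # ask the object for an iterator
    while True:
        try:
            line = next(iterator)    # fetch the next item
        except StopIteration:        # raised when the stream is exhausted
            break
        collected.append(line)
    return collected


text = "first\nsecond\n"
print(lines_with_for(io.StringIO(text)) == lines_by_hand(io.StringIO(text)))
```

Anything that answers `iter()` with an object that supports `next()` can sit on the right-hand side of `for`, which is why files and URL responses work the same way.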

People agitating for "meta" languages forget they are living in real world with no ideal programmers. I dived into c# linq yesterday, and it was not easy to fully feel and use new concepts.

What to say about functional langs then?
Just dynamic languages are probably okay, but catching errors is harder, and even in these there are things the average programmer will be really confused by. (i.e. closures are pretty easy to understand in theory, but in practice there are many complications)

Java and C# were made with that in mind. They are not as beautiful as Python, but they make programmers write code anyone can understand.

Vladekk on December 24, 2007 2:51 AM

@Reimar

Mono supports .NET 2. It installs two compilers mcs and gmcs. gmcs being the mono compiler for .NET 2 :)

Nik Radford on December 24, 2007 2:52 AM

The reason we can now use dynamic languages safely is that Test Driven Development provides the same safety net as the compiler in a static language, with the massive added benefit of telling the programmer if the application is working functionally.

Adrian M on December 24, 2007 3:26 AM

What an awful and popular mistake. I implore you to retract and apologise before this mythological nonsense is repeated. You need to know what you need to know before debating type systems.
http://cdsmith.twu.net/types.html

Case in point:
"yes, its true, statically typed languages are more verbose than dynamically typed ones."

Actually no, I can think of many statically-typed languages that are CONSIDERABLY LESS VERBOSE than all the fanboy (read: popular among the amateur language theorists) dynamic languages that have been discussed so far.

Tony Morris on December 24, 2007 3:33 AM

@Steve:

Don't dis SQL. It's not QUEL or data sublanguage ALPHA; but if we end up with X*crap instead, you'll be singing out of the other side of your mouth. Have you looked at what X*crap would have us do?

Not that data languages are the crux of this discussion, but they could be.

BuggyFunBunny on December 24, 2007 3:43 AM

@Nik Radford
Yes, I know mono can run .Net 2.0 stuff, but it has proven useless for Gui, because even the simplest forms are arranged incorrectly with half of the stuff being placed outside the window (IOW not even remotely useable).
A way to avoid that problem is to use GTK#, but that exists only for Mono and .Net 1.1.

Reimar on December 24, 2007 3:44 AM

funny b/c I was just thinking about writing an essay on why aerospace manufacturers always seem to equate completeness of their avionics software with x millions of lines of code completed. As if when they reach the magic preordained number they're done! Or as if the metric of how many millions of lines of code have already been written is meaningful _in any way_. Moreover, how do they compute that number? Should kinda make one think the next time they get on an airplane, eh?

jake on December 24, 2007 3:49 AM

Vladekk:

a) Closures are simpler than Java anonymous classes, Java generics, or Java classloading.

b) Python code is more readable for non-programmers than Java.

c) C# LINQ isn't a simple or dynamic language, it's more extra legs nailed to the Java octopus. Simple dynamic languages suitable for learning: Smalltalk (more OO) and Scheme (more functional).

Vladimir Slepnev on December 24, 2007 4:00 AM

Go on, handle each line of a file as a Url, do it, show me the code, go on. Show me the damn code.

from urllib import urlopen

for line in urlopen("http://www.codinghorror.com"):
    do_whatever(line)

Roberto Bonvallet on December 24, 2007 4:20 AM

Hate to be harsh, but to those people who think that extra typing is a problem, that is ridiculous. If the extra typing makes it hard to read that is a problem, but more characters does not mean it is bad. For any non-trivial case, thinking, not typing, is the limiting factor in software development.

Mark Hibberd on December 24, 2007 4:44 AM

@Mark Hibberd
You are mostly right, but there are extreme (and stupid) variants of type-safety like VHDL where in the average code about 10% are simply typecasts, sometimes 5 of them in one line.
This is an example of a language that is horribly bad due to badly designed type-safety.

Reimar on December 24, 2007 4:50 AM

nice summary jeff!!

my 2c:

java's verbosity is not due to it being statically typed.

for example:

features that make C# less verbose than java:
generics, lambdas, closures, extension methods, anonymous types, linq. (others i can't think of)

features that make F# less verbose than C#:
type inference, pattern matching, (many others i don't understand yet)

F# is far 'stricter' in its typing than java, yet far more concise!

lesson: 'verbose' and 'strict' are orthogonal concepts.

lb

secretGeek on December 24, 2007 5:00 AM

Don't blame the language. Blame the humans.

As developers, we chronically overestimate our own intelligence and ability. We think we'll remember next year, next week, tomorrow. The fact is, we don't.

Good coding in any language means accommodating human limitations. Simple methods like:

1) Design it before you code
2) Keep it as simple as possible
3) Document it clearly

matter more than the language.

My other rule of thumb is, "write for maintainability, not cleverness," the opposite of programming language golf. You have no idea how much my agony levels have decreased since I got *that* principle.


ThatGuyInTheBack on December 24, 2007 5:14 AM

Jeff,

Do you not mean size is complexity ?

David Ginger on December 24, 2007 5:25 AM

Umm...well, I'll wish you a happy holidays then Jeff!

One of my favourite parts of this blog is the fact that it points me to other great blogs too :)

I'm not sure I agree with the less is more philosophy here. If I had to write a particularly complex algorithm, I'd definitely try to make it as simple as possible for maintainers to read. It links to what ThatGuyInTheBack was saying really.

Mind you, unless it's entirely necessary, I'd take a performance hit and make code simpler to read, so maybe it's me thats wrong.

Devineman on December 24, 2007 5:39 AM

Sure, size is the enemy, but I don't think that the savings that dynamic languages give you cut to the crux of the problem. The crux of the problem is featuritis.

Requirements are design. Having too many is a problem. Killer products don't try to be everything, they focus and nail the critical core that provides value.

Michael Feathers on December 24, 2007 5:41 AM

@BuggyFunBunny

Thanks for the comment, but SQL does suck big time compared to the LIST command in NOMAD2 (I would say that SQL has perhaps 6% of the LIST command's capabilities).

Unfortunately, if someone never worked on a mainframe at a large company back in the late 70s or 80s, they did not get the exposure, and therefore have no proper frame of reference with which to judge. Sadly, NOMAD2 never made the jump to client/server successfully.

Steve on December 24, 2007 5:43 AM

Imagine the morons whose codes get posted on the Daily WTF working with a dynamically typed language. Depending on the path the code took, you might be given different objects. In one case, the object gets a member 'parent', in the other it's called 'PARENT'. No thanks.

Also, it took 500K LOC for a crappy 2D RPG? Here's the stats for Quake3:

Files: 912
Lines: 498019
Lines Blank: 58540
Lines Code: 296398
Lines Comment: 110583
Lines Inactive: 30567

So... the 2D RPG should probably be a lot less.

Maro on December 24, 2007 6:15 AM

Should kinda make one think the next time they get on an airplane, eh?

Why? Do many planes crash due to software issues? Seems like, supposing they say what you say they say, it might be a good idea after all.

Dave on December 24, 2007 6:22 AM

"c) C# LINQ isn't a simple or dynamic language, it's more extra legs nailed to the Java octopus."
--Vladimir Slepnev

LINQ has allowed me to take 50 lines of code and one-line-it. C# with LINQ is powerful. Download the Visual C# Express and review the sample code.

Brian on December 24, 2007 6:25 AM

I believe that there is a balance with everything. Having written some websites in classic ASP with VBScript, where you CAN'T declare the type of your variable, vs. ASP.NET with (insert your favorite .NET language here), I'd choose the latter, because I got into trouble SO many times in classic ASP. However, I like the Tetris analogy, which I feel Java, and now the .NET framework, have sort of fallen into. When starting out with Java, I always felt like I was solving a puzzle to get what I wanted. Consider the following:
BufferedReader cin = new BufferedReader(
    new InputStreamReader( System.in ) );
This is how to initialize an object to read information in from console input. How is a new programmer supposed to make heads or tails of this? "Well, simply create a BufferedReader..." Now, if I want to read input from a console, every application has to have this "just copy it to make it work" line of code, or create a utility class that contains all the common functionality that should be there in the first place.
Now, a lot of people blast VB.NET, but the VB.NET team has realized the "less code is better" approach, and has worked toward that goal.
Consider:
dim s as String = My.Computer.FileSystem.ReadAllText("filename.txt")
I LOVE being able to read a text file in one line of code (even if I HATE the word My).
But File IO is a very common task, so why make us jump through hoops to do it?
Anyway, the good side of all of this debate is now I'm on a mission to learn a dynamic language (I downloaded ruby the other day) to see what the hullabaloo is about, and hopefully I'll learn something good.

Jeremy on December 24, 2007 6:29 AM

half a million lines of code is a lot? Since when?

True, some dynamic functionality would be nice, and would easily reduce 20 or 30 thousand lines of code in my project, but I would hardly call it essential if you have object inheritance and typecasting AND a good scripting engine to create the type customized classes you require.

I typically find that when I create a type customized structure like a list that each type instance also has unique helper functions for the particular task at hand. Dynamic would get you half way there, but the unique functionality would still need to be written and maintained.

C# has a few nice features coming along those lines in Linq that should eventually reduce it even more, but never remove it completely.

And no, half a million lines of code isn't that much, as long as you have some semblance of organization and firm grasp on the concept of code reuse. Making libraries can dramatically simplify your life.

Of course, changing the libraries can be a little more complicated and tedious to ensure it doesn't break something using it, but again, a little practice and you start to learn when and how..

Xepol on December 24, 2007 6:29 AM

So Steve believes the majority of developers, when encountering a code base approximately the size of the Death Star, would think: I could totally build that.

I think the point was that if you have a team that has built a system that size, they will be impressed with themselves. "Look at how much we got done!" and "Wow, we must be really good to build this towering edifice of code!"

Personally, in my more foolish days (which, unfortunately, weren't all that long ago) I led a team that built a darn large system, and that's exactly what I thought. Sure it does a ton of cool stuff to have gotten to that size, but the size itself is indicative of things I *failed to do better* to keep it small, not indicative of our prowess as a team.

So now our job is to balance adding new functionality with trying to keep the size of the codebase either constant or shrinking, and I'm actively exploring alternative languages (as well as better code structure in general) as tools to help us do that.

Andrew Norris on December 24, 2007 6:30 AM

] for line in file("FileName.txt"):
] # Process line

That's wonderfully simple, I agree. But there are ways in C++ to write the code that does the bulk of the stuff that the programmer doesn't want to have to deal with repeatedly. Create a class that can open a text file, loop through the lines until EOF, and call a virtual method with the line. Now subclass and provide your own custom method for processing the line. Now for simple text file processing, nobody ever has to think about how that file is being opened and they don't have to write that boilerplate code, or copy/paste it.

Personally, I find that when I'm in a rush (which is most of the time), I tend to write code inline, and cut and paste, and I don't stop to think, "This is something I'm using or will use in multiple places. How should I write it to be reusable, and which library should I put it in?"

My current project uses three different classes for reading and writing registry values, and there are parts of the code that don't even make use of the registry classes; they just call the Windows API registry functions directly. Ugh! Time to refactor that into one class... Sure, I'll do that when I get a free moment (in Jan 2012).

In a microISV, when your time has to go to development, QA, marketing, sales, IT, accounting, and customer support, you tend to mostly be moving forward rather than looking back and cleaning up the messes, even when you know that eventually they will catch up with you and cause you problems. And you sometimes fall into the trap of not stopping to think about design as much. But it is important to do those things, and it will no doubt save time in the long term if you go back to refactor the messes and take a little more time to "think" about what you're doing before you do it, rather than just jump in and do, do, do.

that opens and loops through the file until it hits the end of file once in C++

Mark Sicignano on December 24, 2007 6:42 AM

Thanks for sharing this.

Go try rewriting it in Ruby... I look forward to your post after you've tried... maybe then you'll make more sense.

CptBongue on December 24, 2007 6:45 AM

Sorry about that last line of "editing scrap" in my previous comment. ;-)

Mark Sicignano on December 24, 2007 6:45 AM

I still remember my Introduction to Java course....

"I have to write WHAT to print 'Hello World' to the screen?!"

Hutch on December 24, 2007 7:10 AM

One area where I've definitely seen inexperienced programmers go wrong is knowledge of a language's standard and not-so-standard libraries. I mainly work in C for embedded Linux systems in my current job, and I've found plenty of places where using some extra function from the GNU C library or glib has made things clearer and let me replace hundreds of lines of code that another developer wrote in our system with code that's been tested by a lot more people than our own group.

Ben Combee on December 24, 2007 7:33 AM

500,000 lines of Java code - no boilerplate can excuse that for writing rather simple code. It's not as if rockets are using these 500k lines ...

And I would run away screaming from that quickly.

And, to the guy who seems to await a rewrite in ruby - one reason to USE ruby is to NEVER come to the need to have a 500k lines of code example ever, but once you want to write it, you must realize that some things are not yet possible in ruby since a few bindings are missing. I am sure Java has more to offer in this regard, and while you could write the bindings on your own, it would probably take quite a lot of time as well.

A better idea would be to use rubygame (SDL bindings).

she on December 24, 2007 7:41 AM

C. Wissing: Prove your assertion.

A few years ago, someone conducted a study where they set an identical task to a number of groups, each using a different language. Static, dynamic; interpreted, compiled; batch, interactive - all were represented.

The only significant difference in the results lay across the interactive / batch divide.

What that tells me is that what is crucial in language development is an instant feedback loop. The longer you have to wait for feedback on your code, the longer your development will take, and the buggier it will be. Once you have a zero-length feedback loop, everything else - static typing, implementation technology, environment style, whatever - is just window dressing; in the long run, it doesn't make any real difference.

Put another way - we are all aware of the need for instant feedback in user interfaces, whether CLI or GUI; whilst batch processing still has its place, interactivity - as much interactivity as possible - is what has made computers truly usable. Why, then, are most programming languages and environments - even Haskell, for heaven's sake! - still mired in the tarpit of 1950s-style batch-everything? Why, when we already know it doesn't work for end users, do so many proclaim it as a virtue for development?

gwenhwyfaer on December 24, 2007 7:43 AM

Just a nit: APL is the wrong example. APL was known for achieving a code density of 50-100x the Fortran it replaced, without golfing. (Golfing was common, though.) APL was actually a very high level language in an age when we hadn't really figured out much about languages.

Cobol is the canonical over-verbose language of the '60s, though PL/1, Fortran, or RPG would serve as well.

I am totally on board with the dynamic languages, though. I'm doing my first professional Python project after 25 years of C and C++, and the amount of code I can keep in my head is phenomenally higher in Python.

kbob on December 24, 2007 8:03 AM

I'm just a simple end-user (since 1982). I'm also a professional writer. And I can tell you that computers were not only more fun, but more engaging in the days of WordStar. Back then, we had tools. Now we have ponderous, patronizing mega-apps that are designed to take "difficult" decisions out of our hands. Well, people still use and love Vim. Go figure.

runbei on December 24, 2007 8:16 AM

Contrary to what some commenters believe, it is certainly possible to create and maintain large programs in Python - I am on a team working on a successful commercial project with 250,000+ lines of Python (excluding blanks and comments), and it is still manageable. The equivalent program in Java would probably be 1 million+ lines, and far less manageable.

IMHO the ideal program would have each line of code clearly expressing one concept - if it takes multiple lines to express a concept then the reader has to read through a lot of chaff to find out what the program is doing at that point. If there are multiple concepts crammed onto a single line then the reader has to work to untangle them - this is the difference between compactness and succinctness. Python is one of the best languages I know for getting close to this ideal. Java on the other hand forces you to be verbosely loquacious and repetitively repeat yourself over and over again in a repetitive manner - compare the Java and Python versions of "hello world" (http://www.ferg.org/projects/python_java_side-by-side.html).

Part of the reason Java is so verbose is because it offers the "safety" of explicit static typing. Ironically virtually every substantial Java program in existence also contains some dynamic type checking. Every time you use a cast (which means that before 1.5 every time you used a container class), or every time you use reflection (e.g. every time you use XML to configure your Spring framework) it has to do the type checking at runtime. This is no different from how Python works. If dynamic typing was as fragile as some static typing proponents would claim then these Java programs would also be fragile.

Dave Kirby on December 24, 2007 8:19 AM

Wow. This is all very very wrong. You seem to have confused 'static typing' with 'explicitness'.

Huh? Static typing has very little to do with either boilerplate or big code bases. Look up 'inference', or have a look at what e.g. a language like 'boo' does. Java is very EXPLICIT, amongst other things, about its type system. Its type system isn't even particularly static; for example, all types in java carry an implicit Maybe (null), whereas other more expressive type systems, such as Haskell's or Scala's, don't have that dynamism, and those languages are better for it. Static is very very important for big codebases. It's BETTER than dynamic, much better. Explicitness, especially pointless explicitness, is no good. Java has plenty of that.

The issue is not lines; it's concepts. If you have no better metrics available to you, the metric 'code lines' will have to do. There's certainly a relation. However, usually, you can do better. Example: The notion of where you put your brace (same line or next line) has zip squat to do with code complexity. And yet next-line-brace is capable of inflating your raw LoC by 10% or more. A keyword such as 'extends' is arguably easier to read than a symbol like ':'. Mostly, it makes no difference. And yet one is 7 characters, whereas the other is only 1. Being more explicit about the things you are dependent on (import statements) is generally a good thing. Global namespace languages like e.g. PHP need far fewer import statements, but for obvious reasons, having a global namespace is an absolute disaster for large projects. And yet one (global namespaces) allows elimination of massive amounts of LoC. Don't go there. Do not assume that every LoC eliminated is a win. That's not how it works. I'm not talking about golfing here; I'm talking about the vast space between golfing and oversize code. The area where LoC reduction just doesn't do very much.


An important distinction: there are libraries. If the LoC of libraries is included in the count for projects, most of us have worked with 1 million LoC+, which leaves Yegge's measly 0.5 megaLoC in the dust. Yet, no one seems to have much of a problem with this. Clearly then, it seems possible to abstract away entire bastions of code and take them out of the LoC equation. I won't go into detail on how this is done, and the many many pitfalls that await if you try to pull this off with your own code. I'm merely trying to state that you're dead wrong:

LoC is NOT THE PROBLEM. Code Complexity is.

My second problem with your article is the needless hyperbole.

I recently went toe to toe with Slava Pestov of Factor, writing a json marshaller/unmarshaller, he for Factor, and because I like a challenge, I decided to use java. I ended up with a LoC 15% higher than he did. Not good, but not even close to the magic 50 to 75% reductions claimed - in this case, by yourself. We also finished in about the same time, incidentally.

I'm going to need to see some evidence for quotes that switching programming language can magically reduce Yegge's LoC by 50%. Remember that ANY disciplined rewrite, even from language X to the exact same language X, will eliminate a large amount of code simply because you now have some serious hindsight backing you up. X-to-X also makes it easier to do it in small steps and have lots of instant feedback about it. Refactoring is fun for a reason. It's also a lot easier to do in statically typed languages, incidentally.

The advice you're giving here is just... wrong. If your code base becomes unmaintainable, which is absolutely possible, as no amount of IDE goodness will scale to infinite amounts of complexity, you need to work on making it more maintainable. It's really that simple. One way to accomplish this is to revisit the -structure- and see if you can't eliminate lots of duplication. This might be harder in some languages versus others, but it's possible in most of them (Yes, including java). The blunt stick of 'total rewrite' should be used only if you have no other alternative; it's far, far less efficient compared to just improving your existing codebase. You should also use some tools to draw out a dependency tree (a thing which isn't easy at all in DMP-heavy languages like python or ruby or javascript, I might add, and trivial in something like java - static typing is WIN for large code bases, not loss, like you insinuate). Draw a big circle around something that looks like an island. Complete the abstraction and offload the entire thing into a separate project. The goal is to sever all dependencies that lead out of the island, and have only a few dependencies into it. Code analysis won't tell you the whole story (as I'm now guilty of oversimplifying some, there's also logical dependency to think of), but it's the right idea.

You can spin that whole island off to a separate group and if need be, let it be maintained by a separate team. You can even split this island into even more islands.

And, of course, if there are no islands to be found? I suggest praying. Or the scream, shout, and run, run, run away procedure.

*THAT'S* the stuff you need to be doing if your code is running away from you. Switching programming languages is a massive effort for little gain. Should you -start- with java if you are beginning from scratch? Probably not. Should you rewrite your X-megaLoC project into something else because Yegge and particularly you felt like pissing all over java? No - that's stupid advice. Add abstraction, carve out some libraries, but most of all, reduce dependencies. There's no set formula for any of it. Nobody said programming was easy.

Reinier Zwitserloot on December 24, 2007 8:43 AM

I wholeheartedly agree that it is insanely easy to get into code bloat with Java. It may work very well in college courses to show OO techniques and implement algorithms, but in real life there is just way too much work. In .NET they seem to be acknowledging this while still favoring static typing, and thus we see the absence of checked exceptions, the inclusion of type inference and a bunch of declarative/functional stuff. Which is one reason why I much prefer working on Mono than Java.

Daniel on December 24, 2007 8:53 AM

I read the original long post (worth the time, IMHO) and didn't comment there, because I thought the author was taking the topic a little too personally. Maybe I am too.

I once worked in a large (hundreds of engineers) development organization where there were two types of developers: those that worked on the eight million line legacy code base that actually generated, directly or indirectly, about one billion dollars a year in revenue, or those that worked on smaller projects, the majority of which sucked money out of the company with the hope that they would eventually be profitable. During the decade I was employed there, I worked on both sides of the fence.

How do you get eight million lines of code? One line at a time, baby, one line at a time. No one sets out to write eight million lines of code. There are no simple answers of how to address this issue.

That code base was a victim of its own success. It didn't start out at eight, or even one, million lines of code. But it was a successful product. Customers wanted new features. Lots of new features.

You had to have hundreds of developers working on dozens of teams to develop different new features in the code base, because you can't be competitive without developing the features, and there's simply too much work for a small team to do.

Fall behind in feature development, and customers migrate to another product.

Give up that code base, and the company goes out of business, and thousands of employees all over the world lose their jobs.

Rewrite? So you go to a product manager and say "We want to rewrite the code base. It'll cost millions of dollars." She says "What difference will the customer see?" And you say "Well, if we do everything absolutely perfectly, nothing." And the manager says "So let me see if I understand you: you want to spend millions of dollars so that in the absolute best possible circumstances (that are unlikely to occur) the customer will see nothing?" Those projects never get funded.

The answer is that you rewrite/refactor to reduce maintenance and future feature development costs, thereby reducing what the finance guys call Cost Of Goods Sold or COGS. But the effort to reduce COGS costs millions of dollars over many years to have any observable financial effect, while funding new feature development pays off in months. Short term thinking says screw the COGS reduction, go for the low hanging fruit.

I've worked with developers who said "I choose not to work on a project of that magnitude". That's fine career wise, although you're saying you won't take jobs that make up probably the bulk of software developer employment.

But even so, if everyone did that, there are a lot of products you wouldn't have, because they can only be developed in a timely manner by a large organization (even if they are open source projects). I'm guessing we'd have to do without Linux and GNU, most of Apache, Windows in all its incarnations, MacOS, the entire telephone system, most of the Internet and the web, a huge part of the military/industrial complex, infrastructure stuff like Oracle and SAP that behind the scenes reduce the cost of manufacturing, etc. etc. etc.

"No silver bullet."

Chip Overclock on December 24, 2007 8:55 AM

I am surprised that nobody here has mentioned "library support".
For any serious work, I would any day pick a crappy language (whine, but still live with it) as long as it has good library support. For instance in Java I can find libraries to do 95% of the painful tasks that otherwise (however easy) I would've had to write myself.
Bonus points if the libraries are open source.

Though it would be fun to write some of the libraries yourself, it is not always that you have time on your side. On a crazy project schedule, only reuse can save your a$$. This is the sad reality.

Does "x=(insert your fav language here)" have a connection pool library for HTTP and DB connections (that retries, pings, cleans up dead connections)? Does "x" have a good GUI library that's cross-platform and has good RAD tools - very important for a GUI based project?

btw, I have been programming for the last 15 yrs. all the way from assembly to Java and now to Javascript!
Most of the time I've stuck to languages that've helped me complete my task - fun or work - it does not matter.

Very few times I've had to write stuff up that did not require support from external libraries - in fact, in my professional career I don't remember any! Shell script is powerful because of the tools that unix gives; remove them and hey, life is not all that comfy anymore.

There sure is a difference between using a well tested library vs writing it from scratch and maintaining it. Hey, you can't complain about code bloat if you didn't use the right libraries!

rubix on December 24, 2007 9:00 AM

That's why I always use Prolog - once the problem is defined I'm largely finished. It can't be much shorter (while remaining understandable) than that.

tndalpaul on December 24, 2007 9:13 AM

Mark, your advice to use inheritance based polymorphism is exactly why C++ has the rep that it has.

If you tried that with any codebase I was working on, I'd have to fight the urge to backhand you.

mreiland on December 24, 2007 9:31 AM

No one is mentioning that EC4 (ECMAScript 4) has strong static typing (if you'd like), and IIRC Steve is all about using it. The dynamic nature of the language he was talking about IMO was first class functions and the like. In fact, EC4 comes with some really cool type-based features that are worth checking out (e.g. the new explicit type system and the 'like' operator).

Consider:

type Point { x:int, y:int };
function takesPoint( aPoint like Point ):void
{
    // inside aPoint is guaranteed to have members x and y
}

takesPoint will accept /any/ object that has a member x and a member y that are of type int and it will return nothing. The compiler can do this at compile time and it will also be checked at runtime (causing an exception if the contract is not fulfilled). Think about it -- interfaces without requiring explicit implementation of the interface.
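
For readers who want to try the same idea today, a loosely comparable mechanism is structural typing through typing.Protocol in Python. The sketch below is only an analogy, not the EC4 feature itself: PointLike, Pixel and takes_point are hypothetical names, and the runtime isinstance check only confirms that members x and y exist; it does not verify that they are ints (a static checker such as mypy would do that part).

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class PointLike(Protocol):
    """Anything with members x and y; no explicit 'implements' needed."""
    x: int
    y: int


@dataclass
class Pixel:
    # Pixel never mentions PointLike, yet it satisfies the protocol.
    x: int
    y: int
    color: str = "black"


def takes_point(a_point: PointLike) -> None:
    # Runtime check: verifies the members exist, not their types.
    if not isinstance(a_point, PointLike):
        raise TypeError("expected an object with members x and y")
    # Inside, a_point is known to have members x and y.


takes_point(Pixel(1, 2))        # accepted
try:
    takes_point("not a point")  # rejected at runtime
except TypeError as error:
    print(error)
```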

Read the EC4 spec --- there are some gems in there. 'wrap' is another cool feature in the same vein as 'like'. I can't wait for these features as they will significantly reduce code and improve readability and keep safety.

James on December 24, 2007 9:49 AM

Firstly:

for line in file("FileName.txt"):

for (String line : new LineReader("FileName.txt"))
//process line

Your code is an inextricable monstrosity. I would hate to see the special considerations put in place here.

Can I:

for line in url("http://www.stupid-newbie-programmers.com")

Can I? If not why not? It makes no sense.

Yes, Java could do with a few more core libraries to do a number of things. The simple fact that these new development tools merely have some weak typed string manipulation features that exist in something as torrid as Javascript is no surprise.

Boilerplate code means you are constructing the behavior of something from code, not from syntax. You can reduce and reuse code, exactly like I have shown. What is the difference?

However, you could also do:

for (Url url : new UrlReader("Bookmarks.txt"))
//process url

for (Url url : new XmlUrlReader("Bookmarks.xml"))
//process url

for (Url url : new DeliciousReader("username", "password"))
//process url

for (Image image : new FlickrReader("stupid blog"))
//process image

And then you realize that this isn't some shortcut, hacky looking syntax crap that hides the very nature of the operations from your view.

Go on, handle each line of a file as a Url, do it, show me the code, go on. Show me the damn code. I dare you. Now show me the code to handle the image in that way. You can't because you don't have a syntax shortcut to handle it.
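
For comparison, the reader-object pattern shown above is not tied to one language. A minimal Python sketch of a counterpart to the UrlReader example might look like the following; read_urls is a hypothetical helper, and the bookmark data is made up so that the snippet runs without a file.

```python
import io
from urllib.parse import urlsplit


def read_urls(lines):
    """Yield a parsed URL for every non-blank line of an iterable of strings."""
    for line in lines:
        text = line.strip()
        if text:
            yield urlsplit(text)


# Stand-in for an open "Bookmarks.txt" file.
bookmarks = io.StringIO("http://example.com/a\n\nhttps://example.org/b?q=1\n")

for url in read_urls(bookmarks):
    # Each item is a SplitResult, not a raw string.
    print(url.scheme, url.netloc, url.path)
```

As in the Java version, the loop body works with a URL object, and the work lives in an ordinary library function rather than in special syntax.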

Why is the CAPTCHA word always 'orange' for me?

Jack on December 24, 2007 10:10 AM

Java's problems are legion, but the fact that Java is statically typed is not one of them. The clunkiness which you attribute to static typing is actually because:


1. Java is not typed strongly enough, and not consistently. What's the type of the expression null, for example? Why do we have variables and also objects, which are themselves variables (i.e. they're mutable)? Why are Strings an exception to this?

2. It does not allow higher-order functions or continuations (even Pascal and C have this).

3. It doesn't allow type inference or higher-kinded types.


Haskell, Clean, and ML are statically typed languages with none of the clunkiness of Java. There's even such a language (Scala) that compiles to the JVM. Another (F#) compiles to .net.

Runar on December 24, 2007 10:11 AM

I think that Lines Of Code (LOC) doesn’t mean much. Of course it is a statistic that can be measured but it doesn’t tell you anything about the Quality of the code in question.
If the code is of a good OO design and actually reusable you might have something like:
A Business Rule Engine, a Report Engine, a SmartForm U/I Interface, Print Engine, and so on.
Every project is justification for creating reusable objects that become part of your toolbox for your next project.
OR
You create the world’s largest ball of string in code and run away from it when the project is finished.

Davide on December 24, 2007 10:41 AM

Well, I guess I'm in trouble. I just ran a line count and I got 498,234 lines of code. While I didn't write *every* line (actually about half are coming from my code generators), I am the sole maintainer of the code.

Worse, the code is a mishmash of ASP/VBScript (ugh) and ASPX/C# (for those parts I have been able to update).

Perhaps I should just take the cyanide pill now? Or maybe I can do what I have always done; treat the database as king, keep the business logic isolated and keep migrating as necessary.

As long as the number of *changes* remains within my reach, I don't really mind the huge scale of this project. In fact, the amount of functionality and diverse customer usages is a point of pride for me.

Wesley Shephard on December 24, 2007 10:51 AM

This problem is as old as the hills -- well almost.
I recall coming to a similar conclusion in the late sixties, long before java etc. In those days all work was done by batch processing using magnetic tape as storage. It's quite different now but the code problem was the same. Once written a line of code will haunt its author forever. Our average program size then was 5000 lines of low level code. As much of a maintenance nightmare as programs a hundred times bigger today.
John

john on December 24, 2007 11:06 AM

Yeah yeah, we get it: everything, EVERYTHING should be written in JavaScript because it's the best programming language ever. There shouldn't be any desktop applications other than web browsers because everything should be written in JavaScript and run as a web app.

Quit Vertigo and go work at an Initech and we'll see how cute some of this BS sounds.

N on December 24, 2007 11:10 AM

Sometimes it is helpful to understand WHY a language was written:

C: to have a portable language with near assembly speed
C++: does anyone know?
Java: to run small appliances
C#: to have a Java-like language that does .NET
VB: for ease and fun
Pascal: to teach how to program

etc.

Languages that go backwards in sophistication are written: SQL

We misuse languages. When I used 4GLs (Focus, NOMAD2) to replace what Cobol and PL/1 had been doing, it was _freakish_ how much could be done with so few lines of code, but if you didn't know what you were doing you could abuse the CPU.

There are better languages for certain tasks. The right balance of control vs. brevity is hard.


Steve on December 24, 2007 11:21 AM

Lol - like others, the first thing I did after reading Steve's article was run a LoC counter on one of my rather large projects.

It's a rails project that I maintain part-time (like Steve and his program):

Controllers: 3549 LoC
Models: 2683 LoC (64 models)
Libraries: 2629 LoC
... and a bunch of plugins I didn't write and don't have to think about too much, not to mention the Rails framework itself which abstracts about 80% of the CRUD gruntwork. (to my friends who still handwrite all their SQL: I look at them like Betamaxes)

1) Lest we forget, Mr. Yegge has a full-time job at the big G and this is just his part-time baby. If you've ever had a project like this that you abandon for a few months and then come back to periodically, you know how hard it is to get back into and remember how certain aspects worked. I have this problem on 10k LoC -- I couldn't imagine the painfulness of this issue on a 500k LoC codebase.

2) Several commenters are probably right not to place the blame on static languages entirely. Java has a level of verbosity that is simply soul-crushing when noticed en masse.

3) A commenter noticed how it can be hard to read other people's Ruby code. I would tend to agree that this can be a problem, especially when taking over: A) some ruby rockstar's code, who likes to show off the nooks and crannies of the language, or B) some novice who doesn't know some of the easy-to-understand power tricks of the language.

Great recap, though.

Shanti Braford on December 24, 2007 11:38 AM

I code in both Python and Java and it's not dynamic typing that makes me more productive in Python (and other similar languages) but the syntactic sugar. Actually I wish I could declare the types of my variables in Python so I had proper refactoring support and the compiler would catch all the stupid mistakes I often make.

Observe the best of both worlds in the Boo language for example:

http://boo.codehaus.org/Builtin+Literals

(I wish it was not .NET specific)

nyenyec on December 24, 2007 11:57 AM

I know that any competent programmer can easily learn python or whatever we'd switch to, but unfortunately not having thousands of people in the area that know it is a problem for management.

[I like how using a dynamic language makes the responsibilities of your automated testing larger, instead of implying that because the static type checking passed, everything will work fine...]

Nick on December 24, 2007 12:08 PM

Almost every argument I read for dynamic languages includes a one-line statement that would take 3 to 7 lines in a static language.
The one-liners only emphasize shortcomings in the library.

I could write a library that would make the same line of code work in any static language. It’s just that static language designers often fail to ship a well-designed API with their product. Compare GregorianCalendar to Joda (http://joda-time.sourceforge.net/).

I'm happy that Bruce can recite the one-line file parser from memory. But frankly, I can't. I need a little guidance when I type the dot operator, so I will stick with Java because of the intellisense support that Eclipse gives me. And if I take the time to find a decent file API for Java (or write one), then it can help me type that one line of code.

Michael L Perry on December 24, 2007 12:08 PM

I am also a Python programmer, and it is indeed much easier to write, read, and debug Python code. Having written several network applications in C, C++ and in Python (in that order), I realized that I could have saved a lot of time if someone had told me about Python earlier.

If you're a teacher, make sure you tell your students about this option.

I must emphasize one detail - in either case the #1 enemy is the human factor (i.e. us), as you've written in the last paragraph. My time management routines are far from perfect; I have several projects that have been idle for several months - and it's only my fault.

Alex Railean on December 24, 2007 12:33 PM

I loved the part about the Tetris pieces.

A lot of the junior programmers I have to shepherd are "satisfied" with what comes out of the box (we'll call it ASP.NET 2.0 for the purposes of this conversation). They have a hard time understanding why I frequently suggest writing controls from scratch, or favoring string outputs instead of nested control hierarchies. Page.Load() is their friend, even when it leads to complete spaghetti.

In the general spirit of the Pragmatic Programmers, I view stock controls and extended libraries as "Evil Wizards" who should be trusted only as much as a senior Sith Lord. Learn what you can, then supplant them with something younger and more nimble. Microsoft themselves encourage simple custom logic for ASP.NET output, especially to improve scalability and performance.

As for lines of code, I couldn't care less about code brevity, as long as two concepts are integrated into that code:

1. Write code that does ONLY what you need it to do (YAGNI)
2. Write the same logic once, and ONLY once (DRY)

Naturally, following the above tenets will lead to a small code base, but shortening code lines alone misses the spirit of the issue.

I should note that my job involves creating "glue code" almost exclusively; standing on the shoulders of someone else's product, not maintaining a revenue-generating legacy.

Your results may vary.

Rick Cabral on December 24, 2007 12:44 PM

@tony morris

after re-reading the paper you re-pointed me at: you are right, my statement is misleading. (i must add that i do not agree with everything in this paper)

my programming experience is wide but i mostly used strongly-statically-explicitly typed languages, which i am a big fan of, and those considered extremely verbose: Pascal, Modula-2, Ada (no, not COBOL...).

rien on December 25, 2007 2:21 AM

I have 2 comments:

First, every developer always claims to be maintaining many more lines of code than they actually do. Recently I was consulting on a large project; they said 4M LOC, but it was actually 700K LOC (measured with NDepend). The difference comes from the way you're measuring LOC, and I detailed here how you should do it in the .NET world:
http://codebetter.com/blogs/patricksmacchia/archive/2007/10/03/how-do-you-count-your-number-of-lines-of-code-loc.aspx
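
To make the measurement point concrete, here is a small sketch of how the same file yields different "LOC" figures depending on what is counted. It is a naive physical-line classifier for Python-style '#' comments, not the method NDepend uses; count_lines and the sample text are assumptions made for this illustration, and multi-line strings are not handled.

```python
def count_lines(source):
    """Classify each physical line as blank, comment-only, or code."""
    counts = {"physical": 0, "blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        counts["physical"] += 1
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


SAMPLE = """# compute a total

total = 0
for n in range(3):
    # accumulate
    total += n
"""

print(count_lines(SAMPLE))
```

Here the file has six physical lines but only three code lines, so two people can honestly report figures that differ by a factor of two for the same source.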

Second, the phenomenon of a large code base being hard to maintain is better known as Diseconomy of Scale. This phenomenon explains why it can take a year to add a few LOC to a large project such as Vista (70M LOC). The maintenance curve is simply not linear in code base size; it tends to be square root or even logarithmic.

Patrick Smacchia on December 25, 2007 2:34 AM

(I hate how popular this blog is. I never get into the comments before they've degenerated into useless noise.)

All dynamic languages mean is that you can write a horrible mess in far fewer lines. I had a 16 KSLOC (24K actual file lines) project get unmanageably huge in PHP. Somewhere along the line, the cost of new features jumped from a week or so up to several months. We killed the project before we found out how long, exactly.

Even 1000 lines of Perl can be pretty big for a single person to manage.

It seems to me that programming is a fine balance. Trying to over-compact code is as bad as bloating it needlessly. It's not easy to handle a function with 10 parameters. Keyword arguments help, but only in Python where they're done right. Passing arrays/hashes as in PHP or Ruby is verbose and opaque, and Common Lisp's interaction between key and optional is subtle and evil.
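
A small sketch of the keyword-argument point, in Python 3 syntax. The function `create_user` and its parameters are hypothetical, picked only to show the shape.

```python
def create_user(name, email, *, admin=False, active=True, quota_mb=100):
    """Parameters after the bare * can only be passed by keyword."""
    return {"name": name, "email": email, "admin": admin,
            "active": active, "quota_mb": quota_mb}


# Call sites name the options they set, in any order.
user = create_user("ann", "ann@example.com", quota_mb=500, admin=True)

# Passing an option positionally is rejected instead of silently misread.
try:
    create_user("bob", "bob@example.com", True)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

The call site stays readable even as options accumulate, which is the part that array/hash passing tends to lose.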

In the end, writing less code means writing fewer bugs, but it also raises the complexity:LOC ratio. Less code in and of itself is no silver bullet.

sapphirecat on December 25, 2007 3:02 AM

LOC is not directly related to complexity, but it is an indication. Even blank lines, comments and lines with only braces can be counted. It takes the same time to scroll through a blank line as through a statement line.

In my experience only about 10% of a program code is the core. The rest of the program is the gunk that glues it all together and does input/output. Of the 10% core 90% is dealing with boundary conditions and edge cases leaving 1% that is the heart of the program.

Computer Science textbooks show the tiny 1% part that can be expressed cleanly. When you try to write a real program you find that 99% of it is gunk they didn't warn you about in the textbooks.

Doug on December 25, 2007 5:32 AM

LOC is not directly related to complexity, but it is an indication.

The metric Cyclomatic Complexity is a good indicator of complexity:
http://www.ndepend.com/Metrics.aspx#CC

It takes the same time to scroll through a blank line as through a statement line.

I disagree. We are not talking about scrolling but about understanding and maintaining source code.

In my experience only about 10% of a program code is the core. The rest of the program is the gunk that glues it all together and does input/output.

I disagree. Every single line of code can provoke a bug or a performance hit. Input/output is often what makes the difference between a successful piece of software that is easy to work with and satisfies users (and consequently sells well), and an astute piece of software that satisfies the developer's ego but is unusable and cannot be sold.


Patrick Smacchia on December 25, 2007 5:46 AM

It takes the same time to scroll through a blank line as through a statement line.

Man, need think

and on December 25, 2007 9:34 AM

Languages that use dynamic typing do result in fewer lines, but end up taking vastly more memory and time to execute. No dynamic language can be converted to machine instructions(as in C/C++), a cross-platform assembly(.NET), or even byte-code(Java). This isn't a shortcoming of the languages or compilers; it's the nature of computers (more specifically, processors).

John on December 25, 2007 11:12 AM

If you personally rewrite 500,000 lines of static language code into
190,000 lines of dynamic language code, you are still pretty
screwed. And you'll be out a year of your life, too.

Probably more than a year. 500KSLOC is a *ton* of code. There's no way a single person is converting that amount of code to another language in that little time without tool help. Using a converter program might make it possible, but you'd probably end up with *more* code than you started with using a converter. It would be machine-generated code too. Ick.

If he performs this rewrite in a year of hobby time with no major loss of functionality I'll be really impressed. Like starting-a-new-religion level impressed.

T.E.D. on December 26, 2007 1:06 AM

The metric Cyclomatic Complexity is a good indicator of complexity:
http://www.ndepend.com/Metrics.aspx#CC

Well...perhaps for an individual routine. That's all it is really designed for. For an entire 500KSLOC program you aren't going to get *a* number out of it that means anything.

There are some common constructs, for instance command processors (large case statements) that totally hose most cyclomatic complexity calculators too.

Cyclomatic Complexity can be useful to point out which functions or source files might need extra attention, but it isn't a very useful tool for looking at entire projects at a macro level.
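
As a rough illustration of the per-routine use, here is a Python sketch that approximates cyclomatic complexity for each function in a piece of Python source. It is a simplification, not what NDepend or any particular tool computes: the set of node types counted as branches and the name `complexity_by_function` are assumptions.

```python
import ast

# Node types treated as one extra branch each (a simplification).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def complexity_by_function(source):
    """Return {function name: approximate cyclomatic complexity}."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            score = 1  # one path through straight-line code
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # "a or b or c" adds one extra path per additional operand
                    score += len(child.values) - 1
            result[node.name] = score
    return result


sample = '''
def simple(x):
    return x + 1

def dispatch(cmd):
    if cmd == "start":
        return 1
    elif cmd == "stop":
        return 2
    elif cmd == "pause" or cmd == "wait":
        return 3
    return 0
'''
scores = complexity_by_function(sample)
print(scores)
```

Note how even this tiny command dispatcher scores several times higher than the one-liner, although nobody would call it hard to read. The result is a ranking of functions to look at, not one number for the project.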

T.E.D. on December 26, 2007 1:19 AM

+1 to rubix for library support.

+1 to sapphirecat who said: "All dynamic languages mean is that you can write a horrible mess in far fewer lines." Yes. I worked on some 4GL tools that could make a real steaming pile in very few LOC.

Ah, the real reason for the re-write. Wesley Shepard said, "Worse, the code is a mishmash of ASP/VBScript (ugh) and ASPX/C# (for those parts I have been able to update)."

I look at code I put down a year ago and always think it sucks. I can't imagine what Wesley sees when he looks at his 10 year old code.... (shudder)

OneMist8k on December 26, 2007 1:24 AM

I applaud you for admitting that you use a lot of Yegge's stuff - his posts are indeed rich food for all our minds. Raganwald shamelessly rips him off, and through his awkward rephrasing loses much of the meaning that comes from actually reading the whole Yegge post. Your condensing and commentary are a much better approach.

Duncan on December 26, 2007 1:51 AM

1. Write code that does ONLY what you need it to do (YAGNI)
2. Write the same logic once, and ONLY once (DRY)

Den on December 26, 2007 2:48 AM

John said: "Languages that use dynamic typing do result in fewer lines, but end up taking vastly more memory and time to execute. No dynamic language can be converted to machine instructions(as in C/C++)"

So, in other words, dynamically typed languages can't run on computers? You are aware that an interpreter does exactly that: Compiles a dynamically typed language into machine instructions, that you can then save (like Python does) or cache to memory (like PHP's APC does).
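
A minimal sketch of that compile step in Python, with one caveat: CPython compiles source to bytecode for its own virtual machine rather than to native machine instructions, and that bytecode is what gets cached in .pyc files. The `add` function is just an example.

```python
import dis

source = "def add(a, b):\n    return a + b\n"

# compile() turns source text into a code object holding bytecode.
module_code = compile(source, "<example>", "exec")

namespace = {}
exec(module_code, namespace)      # run the compiled module body, defining add()
add = namespace["add"]

bytecode = add.__code__.co_code   # raw bytecode of add(), as bytes
print(type(bytecode).__name__, len(bytecode))
dis.dis(add)                      # human-readable listing of the instructions
```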

"a cross-platform assembly(.NET)"
Ruby on .NET AKA IronRuby: http://www.ironruby.net/

"or even byte-code(Java)."
Python on Java AKA Jython: http://www.jython.org/

"This isn't a shortcoming of the languages or compilers; it's the nature of computers (more specifically, processors)."

In other words, you have no idea how dynamically typed languages actually work. Dynamically typed languages hide the complexity of memory management of variables from you, silently converting between types on the fly. Just because you can't see it happening doesn't mean that it *isn't* happening.

It's not limited to dynamically typed languages, either. In Java and C# (possibly C and friends as well), if you concatenate a string with a number, the number gets silently converted to a string before the concatenation occurs.

Powerlord on December 26, 2007 3:33 AM

Does it mean that *optimization* is a word to forget? Or to redefine?

Yesterday, low level programming meant a zillion hard to maintain LOC, but fast-as-lightning code execution and minimum memory use.

High level programming was hated because it produced low performance and high waste of memory.

Today there's so much unused cpu power and memory, that HL programming becomes the right solution: Careless use of memory, and waste of clock cycles are now invisible under thick layers of Gb and GHz, so you can pay the price and touch the Rapid App Development sky.

Is the concept of optimization redefined as a way to reach RAD, at any price, and should we forget the old one, oriented to avoid useless waste of resources? Isn't this point of view, the "I don't care if I waste!" one, carrying us (in other levels) to other problems, like for example, global warming?

oscar on December 26, 2007 4:05 AM

for line in url("http://www.stupid-newbie-programmers.com")

While this sounds like a wonderfully succinct way of reading the content of a file or a URL, this completely misses the whole point about why the Java I/O library seems so complicated: because it has to be.

The Java I/O library (though not perfect) makes a huge effort to fix the mess that these kinds of one-liner file reading statements introduce. These problems are largely ignored by most American developers: there is a huge difference between a binary file and a text file, and not all text files are encoded using ASCII.

So, in your example, how are you dealing with file encoding? What if the file happens to be coded in UTF-8 or ISO8859-1? Or in EBCDIC? What if the file happens to be a binary file? How do you deal with different end-of-line markers?

So in short: how do you handle real-life situations? By forcing us poor non-English speakers to forget once again about all the non-Latin characters? I was puzzled when I realized that Python does not offer Unicode support as part of the core language primitives (or was it Ruby? Or both?). Reading some text from a file *is* a difficult exercise. The fact that some language’s libraries manage to ease the reading of some ASCII text file is one thing, but this has little to do with the language itself in the first place, and this completely misses the point about the real-life world.

By the way, how do you decide you're finished with the connection?
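
For reference, here is roughly what those questions look like when answered explicitly, as a Python 3 sketch. It reads a local file rather than a URL, and the file contents are made up for the example. A binary file would be opened with mode "rb" instead, which yields bytes and does no decoding at all.

```python
import os
import tempfile

# Create a small UTF-8 file with Windows-style line endings (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("こんにちは\r\n世界\r\n".encode("utf-8"))

total = 0
# encoding= says how to decode bytes into text; newline=None (the default)
# translates \r\n, \r and \n to "\n". The with-block closes the file on exit.
with open(path, encoding="utf-8", newline=None) as f:
    for line in f:                       # streams line by line
        total += len(line.rstrip("\n"))  # count characters, not bytes

print(total)
```

So the encoding, the end-of-line handling and the point where the resource is released are all still decisions; they are just parameters and a block scope rather than three nested constructors.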

Plouf on December 26, 2007 5:23 AM

@Joshua about UTF-8 on Unix

UTF-8 was designed so that it wouldn't make legacy ASCII code go crazy. It's one thing to say that a function won't mess up UTF-8; it's another to say that it will do sensible operations on it. Getting the length of a UTF-8-encoded string has nothing to do with counting the number of bytes in that string. Blindly truncating a UTF-8 string can give you an invalid string. Any other string operation such as getting the nth character will fail the same way.
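
A short Python 3 sketch of the difference between bytes and characters; the sample string is arbitrary.

```python
text = "na\u00efve caf\u00e9"     # "naïve café": 10 characters, two non-ASCII
data = text.encode("utf-8")

char_count = len(text)            # counts code points
byte_count = len(data)            # counts bytes; each accented letter takes two

# Cutting the byte string in the middle of a two-byte sequence leaves invalid UTF-8.
cut = data[:3]                    # "na" plus the first byte of the accented i
try:
    cut.decode("utf-8")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False

# Slicing the decoded text works on characters instead.
safe = text[:3]

print(char_count, byte_count, truncation_ok, safe)
```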

By the way, the Windows world doesn't work with UTF-8, but UTF-16.

People discovering the Java I/O library are always surprised when they realize that you have both Readers and InputStreams. One is for reading characters, while the other one is for reading bytes. These are completely different beasts. Trying to ignore that point is not progress, but a sign that the underlying I/O model is fundamentally flawed.

Remember that you can open and read a file in C++ using one single line of code as well...

Plouf on December 26, 2007 5:42 AM

About decent I/O library:

How would you translate the following code into your favorite language:

BufferedReader br=new BufferedReader(new InputStreamReader(new FileInputStream("file.txt"),"UTF-8"));
int total=0;
String line;
while((line=br.readLine())!=null)
total+=line.length();
br.close();
System.out.println(total);

Of course, file.txt contains nothing but Japanese characters (after all, Ruby was designed by some Japanese guy). And to add some salt, let's assume the file is so big it won't fit into memory.

Plouf on December 26, 2007 6:13 AM

Plouf: If your Common Lisp implementation has Unicode support,
it's a one-liner using SERIES:

(collect-sum (#Mlength (scan-file "input.txt" #'read-line)))

SERIES automatically rewrites the above expression at compile
time into optimized iterative code that doesn't try to store
the whole file in memory. It's specifically designed to allow
functional-style programming while avoiding the allocation of
temporary collections.

There's a Common Lisp (ABCL) built on the JVM, so presumably one
could leverage Java's libraries in order to read text files with
any Java-supported encoding.

Mark Hoemmen on December 26, 2007 6:50 AM

Re: unicode

Most of the UNIX world has standardized on UTF-8. Most ASCII functions just work* on UTF-8, so it could probably handle it just fine.

*Yes I know, not quite. Almost none of them will munge UTF-8 data though.

Joshua on December 26, 2007 8:15 AM

Dynamic languages provide both brevity of syntax as well as brevity of concept. Essentially, they work harder for you: handling memory, guessing types, determining appropriate action based on context, etc. In the end there is a trade-off: the potential ambiguity of implicitness for the tedium of explicitness.

I don't think, however, that strong, statically typed languages like Java or C# are necessarily more comprehensible simply because they're statically typed. Complex logic is hard in any language, and I think the best strategy for maintainability is (1) strong language comprehension, (2) clear coding, and (3) good documentation.

Here's an entry on my blog that I made a while ago. It cites a specific example related to this:

http://blog.arbingersys.com/2007/12/test1.html

I don't feel like I can simply write off statically typed languages however. There are too many successful, important projects developed with them. That in itself should be a testament to their "proven" value.

This isn't to say that they don't have warts. It's just that I have yet to see a language that will accommodate all of humanity, all of the time, in every circumstance.

JA on December 26, 2007 9:06 AM

Today there's so much unused cpu power and memory, that HL
programming becomes the right solution: Careless use of memory, and
waste of clock cycles are now invisible under thick layers of Gb and
GHz, so you can pay the price and touch the Rapid App Development
sky.

You must not have ever worked on a modern computer or console game, a scientific application, server that accepts thousands of connections or runs over a dozen virtual servers, embedded system, cell-phones, PDAs, ... or Vista ;-), etc...

The truth of the matter is that CPU power and memory are still very, very real concerns and it is attitudes like this that lead to software like Vista where people say "my computer ran faster with cheaper hardware and 1/4 of the memory on XP (or previous version of --- software)". Sure CPU power may not matter for things like word processing which are highly I/O bound, and for some very small apps memory may not matter, but there are many, many, many apps where CPU and memory still DO matter.

Isn't this point of view, the "I don't care if I waste!" one,
carrying us (in other levels) to other problems, like for example,
global warming?

Sure, but how about the example of making your customers upset (Vista, Paint Shop Pro, etc...). Just see this other Coding Horror blog: http://www.codinghorror.com/blog/archives/000973.html .

Kevin on December 26, 2007 9:26 AM

But before you re-write that 500000 LC in a more dynamic language make sure you know it well, don't write 300000 LC of python in the spirit of Java.
http://dirtsimple.org/2004/12/python-is-not-java.html
http://dirtsimple.org/2004/12/java-is-not-python-either.html

george on December 26, 2007 9:50 AM

While I think you make some valid points, Jeff, I disagree with your central thesis that Size = Complexity = Bad Application.

Please see my response at:

http://geekswithblogs.net/starr/archive/2007/12/26/117993.aspx

Jonathan Starr

Jonathan Starr on December 26, 2007 11:52 AM

I agree with the commenters who say that languages cannot solve the biggest problem, the people. I think everyone should be required to maintain a large project first so that when they are developing they have a good sense of where the redundancy and bs code is going to bite them in the arse later. There's not much about it that can be taught or even codified. It's a more intuitive process.

Charles on December 27, 2007 1:04 AM

2 cents well spent.

Strongly typed languages were a reaction to the untyped languages that held hands too well like Basic. I remember learning the hard way why goto was a bad thing, and strong typing was a must. Top down flow was the order of the day when I learned coding, not oop. But the oop stuff is just an extension of the whole paradigm of putting similar things in similar boxes.

Getting rid of strong typing is going to produce some serious hacks. Manipulating strings as integers, and stuff like that is old hack code that has no place in modern software, even if it is fun to write. Nuf said, and if not, then go maintain your own code, and stay out of mine.

Bottom line is strong typing has a place, but you always need a mechanism to defeat it. But when you do so, you should always wrap that code in bomb proof I/O checking and make dang sure that nobody mistakes its purpose by giving it a horribly long and exactly descriptive name.

Just like the convention stuff in Ruby, strong typing came about as a means to solve some problems that were rampant and bugging the heck out of people. For my money the best coding speed I have ever achieved in a full production system of over 100k lines is Delphi. And I have no qualms about recommending it. Ruby on Rails would come close as long as you don't need to do too much customization, but that is a language of another sort. Compiled code still keeps your algorithms a secret, and your IP safer from hackers.

Some Delphi coders from Borland have competed in the annual C++ speed coding event for many years. They have never been officially recognised for their efforts, but they ALWAYS beat ALL the C++ teams hands down. Often by several hours in the all day event. Delphi is remarkable for clarity and directness while maintaining a strongly typed paradigm. Forget even trying to write that fast in Java, you will never beat the slowest C++ teams with a great Java team.

I have often successfully claimed that I can lead a project faster in Delphi with experienced C++ coders new to Delphi, than those same coders can write the same program in C++ with all the tools and libraries they want. In fact one time I was challenged on this and the team tried to re-write a significant library in C++ and failed to do so faster the second time around. In fact they were slightly slower, even with the interface definition well laid out.

When you add threads and sophisticated I/O handling to the package, then Delphi wins hands down. Delphi rocks. Every inch of that compiler is smart. Too bad so few real coders know about it. But then it is the grandfather of C#, being the brainchild of the same architect. Only Delphi never suffered from Microsoft bloatware syndrome.


Michael Rempel on December 27, 2007 1:17 AM

So, in your example, how are you dealing with file encoding? What if the file happens to be coded in UTF-8 or ISO8859-1? Or in EBCDIC? What if the file happens to be a binary file? How do you deal with different end-of-line markers?

Mmm... It's easy, really :-)

import codecs

file = codecs.open('someFile', encoding='utf-8')
for line in file:
    pass  # whatever

Now, do you know how to easily read a file encoded in UTF-32 in Java?

Marek on December 27, 2007 4:34 AM

"""
How would you translate the following code into your favorite language:

BufferedReader br=new BufferedReader(new InputStreamReader(new FileInputStream("file.txt"),"UTF-8"));
int total=0;
String line;
while((line=br.readLine())!=null)
total+=line.length();
br.close();
System.out.println(total);
"""

Easy :-)
In Python:

import codecs
text_file = codecs.open("file.txt", encoding="utf-8")
# strip line endings so the count matches Java's readLine()
print(sum(len(line.rstrip("\r\n")) for line in text_file))

Marek on December 27, 2007 4:38 AM

Everyone keeps pointing out the python example of how to read a file.

What does that have to do with dynamic versus static typing in a language? It just means that Python has nice built-in file handling functions.

Hell, we could theoretically have a statically typed language that does exactly this:

FileLine line;
FileReader reader("file.txt");
for all line in reader do
# stuff

There's nothing preventing that. Python just has nicer built-ins than Java for reading strings... because that's kind of what it was designed to do.

The real problem is that the common statically typed languages are ancient, crufty, and not generally designed around quick coding convenience or string handling the way most scripting languages are. If that's what you're doing and you don't care about speed, sure, go ahead and use python. But there's more to programming than just string parsing.

Mike on December 27, 2007 5:37 AM

Hey Now Jeff,
It makes you think about what type of code his 500k is: http://www.codinghorror.com/blog/archives/001003.html
Coding Horror fan,
Catto

Catto on December 27, 2007 8:44 AM

I find Bruce Eckel's example proves the opposite of what he claims. I always have to look up the short syntax of Python statements such as that, whereas knowing how to open a file is just a trivial File.OpenRead() away in C#. :)

Jess Sightler on December 27, 2007 9:59 AM

@Mike

Actually in Java you could even write a class TextFile implements Iterable<String>, and you could write something like:

for(String line:new TextFile("file.txt"))
//use line
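
The same shape can be sketched in Python 3 with type annotations. `TextFile` here is a hypothetical helper, not a standard library class, and the annotations are hints for tools rather than something enforced at run time.

```python
import os
import tempfile
from typing import Iterator


class TextFile:
    """Iterable over the lines of a text file (hypothetical helper)."""

    def __init__(self, path: str, encoding: str = "utf-8") -> None:
        self.path = path
        self.encoding = encoding

    def __iter__(self) -> Iterator[str]:
        # Open lazily and yield one line at a time; the file is closed
        # when iteration finishes.
        with open(self.path, encoding=self.encoding) as f:
            for line in f:
                yield line.rstrip("\n")


# Usage with a throwaway file (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\n")

lines = [line for line in TextFile(path)]
print(lines)
```

Either way the convenience lives in a small library class, which supports the point that this is about libraries more than about static versus dynamic typing.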

Plouf on December 27, 2007 12:10 PM

Completely offtopic probably, and maybe I am missing the complete point, but reading the lines from a text file in C# is quite easy.

string[] lines = File.ReadAllLines(@"d:\junk.txt");
foreach (string line in lines)
{
// Do your stuff here.
}

Also, just a bit more on topic. If you create projects with a decent layer structure, you easily hit 500k LOC. But even with this number of LoC, projects which are set up correctly are easily maintained.

"Size does not matter".

Also, I never (ever) want to work with untyped languages any more.

Wim Haanstra on December 27, 2007 12:36 PM

The comments to this entry are closed.