Learn to Read the Source, Luke

April 16, 2012

In the calculus of communication, writing coherent paragraphs that your fellow human beings can understand is far more difficult than tapping out a few lines of code that the interpreter or compiler won't barf on.

That's why, when it comes to code, all the documentation probably sucks. And because writing for people is way harder than writing for machines, the documentation will continue to suck for the foreseeable future. There's very little you can do about it.

Except for one thing.

You can learn to read the source, Luke.

The transformative power of "source always included" in JavaScript is a major reason why I coined – and continue to believe in – Atwood's Law. Even if "view source" isn't built in (but it totally should be), you should demand access to the underlying source code for your stack. No matter what the documentation says, the source code is the ultimate truth, the best and most definitive and up-to-date documentation you're likely to find. This will be true forever, so the sooner you come to terms with this, the better off you'll be as a software developer.
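
What "view source" looks like depends on your stack. As one illustration (Python is an assumption here, since the post names no particular language), the standard library's `inspect` module can tell you where a function you depend on is defined and print its source. This is a minimal sketch: `show_source` is a hypothetical helper name, and `json.dumps` is just a convenient example dependency.

```python
import inspect
import json


def show_source(obj, max_lines=10):
    """Print the file and starting line where obj is defined, plus the
    first few lines of its source. Returns the file path, or None."""
    try:
        path = inspect.getsourcefile(obj)
        lines, start = inspect.getsourcelines(obj)
    except (TypeError, OSError) as exc:
        # Built-ins and compiled extension modules have no Python source.
        print(f"No Python source for {obj!r}: {exc}")
        return None
    print(f"{path}:{start}")
    print("".join(lines[:max_lines]))
    return path


if __name__ == "__main__":
    show_source(json.dumps)  # pure-Python stdlib function: prints its source
    show_source(len)         # built-in: prints the "No Python source" message
```

Built-ins and compiled extensions have no Python source to show, which is why the helper catches `TypeError` and `OSError`. For those, the project's own repository is where the trail continues.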

I had a whole entry I was going to write about this, and then I discovered Brandon Bloom's brilliant post on the topic at Hacker News. Read closely, because he explains the virtue of reading source, and in what context you need to read the source, far better than I could:

I started working with Microsoft platforms professionally at age 15 or so. I worked for Microsoft as a software developer doing integration work on Visual Studio. More than ten years after I first wrote a line of Visual Basic, I wish I could never link against a closed library ever again.

Using software is different than building software. When you're using most software for its primary function, it's a well worn path. Others have encountered the problems and enough people have spoken up to prompt the core contributors to correct the issue. But when you're building software, you're doing something new. And there are so many ways to do it, you'll encounter unused bits, rusty corners, and unfinished experimental code paths. You'll encounter edge cases that have been known to be broken, but were worked around.

Sometimes, the documentation isn't complete. Sometimes, it's wrong. The source code never lies. For an experienced developer, reading the source can often be faster… especially if you're already familiar with the package's architecture. I'm in a medium-sized co-working space with several startups. A lot of the other CTOs and engineers come to our team for guidance and advice on occasion. When people report a problem with their stack, the first question I ask them is: "Well, did you read the source code?"

I encourage developers to git clone anything and everything they depend on. Initially, they are all afraid. "That project is too big, I'll never find it!" or "I'm not smart enough to understand it" or "That code is so ugly! I can't stand to look at it". But you don't have to search the whole thing, you just need to follow the trail. And if you can't understand the platform below you, how can you understand your own software? And most of the time, what inexperienced developers consider beautiful is superficial, and what they consider ugly, is battle-hardened production-ready code from master hackers. Now, a year or two later, I've had a couple of developers come up to me and thank me for forcing them to sink or swim in other people's code bases. They are better at their craft and they wonder how they ever got anything done without the source code in the past.

When you run a business, if your software has a bug, your customers don't care if it is your fault or Linus' or some random Rails developer's. They care that your software is bugged. Everyone's software becomes my software because all of their bugs are my bugs. When something goes wrong, you need to seek out what is broken, and you need to fix it. You fix it at the right spot in the stack to minimize risks, maintenance costs, and turnaround time. Sometimes, a quick workaround is best. Other times, you'll need to recompile your compiler. Often, you can ask someone else to fix it upstream, but just as often, you'll need to fix it yourself.

  • Closed-software shops have two choices: beg for generosity, or work around it.
  • Open source shops with weaker developers tend to act the same as closed-software shops.
  • Older shops tend to slowly build the muscles required to maintain their own forks and patches and whatnot.

True hackers have come to terms with a simple fact: If it runs on my machine, it's my software. I'm responsible for it. I must understand it. Building from source is the rule and not an exception. I must control my environment and I must control my dependencies.

Nobody reads other people's code for fun. Hell, I don't even like reading my own code. The idea that you'd settle down in a deep leather chair with your smoking jacket and a snifter of brandy for a fine evening of reading through someone else's code is absurd.

But we need access to the source code. We must read other people's code because we have to understand it to get things done. So don't be afraid to read the source, Luke – and follow it wherever it takes you, no matter how scary looking that code is.

Posted by Jeff Atwood

Definitive and most up to date? Without question.

Best? That's arguable, and it depends on the code base. Learning a language from its source (say, Python or SQL) is a really bad idea. Best practices and KISS are not something you'll learn from that code.

Sqlrob on April 16, 2012 1:32 PM

@sqlrob: The answer to your point is in the middle of the article, and nowhere near as prominent as it should be. "you should demand access to the underlying source code for your stack" Emphasis mine.

Certainly for learning languages there is no substitute for expert guidance, whether it's through books, classes, etc.

But for any libraries that your code depends on, you will at some point wish you had the source code, even if it's a mess. The more critical a particular library or service becomes, the more important being able to see the details of its operation becomes.

William Furr on April 16, 2012 1:48 PM

I would very much like a "Read the source, Luke" t-shirt.

Hayden Muhl on April 16, 2012 2:09 PM

Granted, I've only been in this industry for 9 years, but "maintain their own forks and patches" seems like advice that should be used only as a last resort.

When you do this you really hurt your upgrade path, and it is easy to get stuck on certain versions of software. I still cringe when I think of the shops that have to use Windows XP and IE 6 because of these kinds of dependencies (of course, this is probably due more to ActiveX).

Sam Thomas on April 16, 2012 2:14 PM

I read code for fun. Well, for enjoyment. And, yes, real code too, not just the clever algorithms and snippets that abound on the web.

Daniel Sobral on April 16, 2012 2:17 PM

"fine evening reading through some else's code is absurd". Really? I've done that a lot of times. There is code out there that is worth reading. For me it is a way of learning, and I really enjoy it.

Diegosevilla on April 16, 2012 2:19 PM

This is one that drives me nuts on iOS. I'm forced to class dump and decompile to understand the issue or the undocumented behavior.

In Android, it's easier because the source is available. Unfortunately, it lags far behind the binary release before it becomes open. It's frustrating.

On the .NET stack, Microsoft listened and they include the source stubs in a download that Visual Studio will download. If you are stepping through into a system library, it shows you the actual source code (to level but it's better than nothing).

A Facebook User on April 16, 2012 2:32 PM

Absolutely true. I've been using SlickGrid for data display clientside for a while now. It's an amazingly capable library, but it has nearly no documentation. Fortunately for me, the source code is very well written; I learned more digging through the source than I ever would have found through Google.

Nicholas Flynt on April 16, 2012 2:32 PM

Reading source code is hard. This is mostly because the order that we read it is often not the order that we write it. The order really matters. In addition, the context and the reasons 'why' the code looks like it does is usually not present in the code.

Version control systems do not record this order or context. They take snapshots. A lot of interesting things happen in between the snapshots. This means a good deal of information is left on the cutting room floor.

Further, there is a narrative in our heads while we write code. This narrative has all the context and reasons why we are doing things in it. We almost never write this narrative down anywhere. We remember it for a little while but eventually it escapes into the ether.

I am working on a tool that records almost all of the data associated with a programming session and allows a developer to tell a 'story' about what they did and why. It has version control functionality (branching and merging) but offers a more useful documentation tool than simple commit messages.

People can use these stories to:


  • learn about how the systems they are working on came to be
  • learn to be a better developer (you might be sitting three feet away from a great programmer but never learn anything from them, or you might be the great programmer and they never learn from you)
  • understand what effect every team member has on the code

For more info http://www.storytellersoftware.com

A Facebook User on April 16, 2012 2:32 PM

Fun fact: "Read the source, Luke" was the standard answer on FreeBSD mailing lists some 10 to 15 years ago, whenever someone asked a question about how something worked. That is, whenever that question was asked on the developer mailing lists.

Daniel Sobral on April 16, 2012 2:33 PM

You could've gotten your point across in 5-10 lines, everything after the first 2-3 paragraphs is just you repeating yourself. Also, why don't you follow your own advice and release the source code for Stackoverflow?

And it's funny that you invented your own rule and call it the Atwood rule. It's also a bit sad.

A Facebook User on April 16, 2012 3:26 PM

You also want to read source code to learn from it, even if it's not part of your stack.

It can help you learn a new language, understand the differences between languages, etc.

But most importantly, reading high-quality source code can teach you a lot about programming in a very short time.

Zwetan Kjukov on April 16, 2012 3:39 PM

Re open sourcing Stack Overflow: Jeff actually pushed for this in the early days of Stack Overflow, but he got pushback and they never did it. To be fair, they have open sourced several pieces of it that make sense to use in other projects. And of course they license the content under Creative Commons, which I find quite admirable.

Also, I don't think open sourcing Stack Overflow really helps much as far as the point he is making... Stack Overflow shouldn't really be an element in your stack, unless you are developing a stack app. Thus almost nobody really needs the source to overcome programming obstacles for their application.

Mike Virata-Stone on April 16, 2012 3:58 PM

"Nobody reads other people's code for fun." - Re:@Daniel Sobral

I used to browse SourceForge and Google Code to learn new concepts in programming languages I wasn't entirely familiar with...

Mike Roberts on April 16, 2012 4:53 PM

Nobody reads other people's code for fun? Hmmm...
I have fond memories of lounging on a sofa, reading printouts of Quake's source. Those were very informative and entertaining sessions.

Aleko Petkov on April 16, 2012 5:00 PM

I prefer Drambuie or Grand Marnier, but sitting back with a hardcopy of my latest source and a pen (red, preferably; I have a Lamy fountain pen with dark red ink in a converter) is one of my most productive times. The code typically works, but all the inefficiencies, errors, and junk remain, awaiting markup.

Documenting code is an art, but futile in most cases, as the source will be a moving target. At best, it is an "ought", but regression or unit test sources are better enforcers.

tz on April 16, 2012 5:41 PM

The whole "read the source code, luke" is fine. News at eleven. Same with the whole "comments probably suck".

But the emphasis should always be on correct and informative comments and on correct and informative documentation. Always demand the source code for your stack, but demand even more comments and documentation.

Mario Figueiredo on April 16, 2012 6:57 PM

holy crap! i'm an old dinosaur, been coding for 33 years!

let me put my two cents on helping source code to be SELF documented...

you need to learn just TWO keys:
1) TAB (aka, tab key, chr(9))
2) LF (aka enter key, CR, LF, CR+LF, chr(10), chr(13)... and so)

and don't be shy... be overgenerous! :-)

i mean...
any rookie can use SHIFT-F2 all the time, but there is a very hidden pleasure on pressing PG-DN until you RECOGNIZE the part of code you're looking for

could be just for hackers, indeed, but doing it on today's dual 20+ inch monitors... feels reaaaaaaly cooool :-)

if almost everybody learns to use tab for "horizontal" indentation... why not encourage them to do the same with "vertical" indentation, separating pieces of code into meaningful and recognizable (i swear) chunks?

this relies on a last-century theory about using both your brain hemispheres AT ONCE, the logical one and the "visual/creative" one

be an intellectual-artist!

br

A Facebook User on April 16, 2012 7:16 PM

Seems to me there are three levels.

1) read docs.
2) read source.
3) write it yourself.


Obviously each involves more time that could be spent doing other things; the ideal if you are using a common tool is to have good docs. But it's facile to suggest that one should just read the source when encountering a problem implementing a program, plugin, module, what have you. Am I going to read ffmpeg source when it borks in ./configure? Probably not. Will I use an un-minified jquery plugin to trace the source of an error? Almost always. Sometimes reading source is a really good idea. Other times it means you are looking in the wrong place for your solution. Ask the bear. If the bear won't answer, then read the source.

Axlotl on April 16, 2012 8:12 PM

I've found that I can't always find the answer from diving into the source code, but it often leads me to finding a more targeted question to work with, and it always helps me understand the underlying processes at work better. I feel as though if more questions were framed with references to the source code, they may get more answers as they've shown a good-faith attempt and provide a good context to help others dig into the problem.

Steve on April 16, 2012 9:26 PM

Great article Jeff.... :)

Currently I am working on a code base that has very little documentation, and after reading this article I am very inspired.

Keep posting such nice and creative articles... good luck. :)

Chirag Visavadia on April 16, 2012 9:29 PM

I think that reading the source is perhaps the quickest way to see what is going on so you can get something done. But when it comes to designing software, it is a good idea to have a spec to design to, rather than just the code, because the spec may provide the intention of the code, whereas the code itself is providing the implementation, which is more likely to change (one would hope, anyway).

If there is a difference, it's best to try to submit a bug report or tech support request of some sort, if this is possible, to clarify things.

But, in the end, the code needs to work, so it's best to read both and try to figure out which side is right.

Marty on April 16, 2012 10:02 PM

If you always need to understand the platform below (to understand your application), doesn't it come to understanding machine code and even electrical signals? I actually do support the idea that software developers (all sorts of programmers, actually) would be better off by basically reading the source instead of opening tens of support tickets when documentation isn't helping -- I just found that statement a bit too extreme.

Egeozcan on April 17, 2012 12:56 AM

The source may well be the most up to date and the 'truth', but it is by no means correct.

Although not a big fan of documenting the obvious, I think that specifications should be mandatory. This way you know what was expected of the application, and the source code cannot, and should not, be the definitive source of such information. However, I do agree that, particularly when bug fixing, you should be drilling straight down into the source and not much else if you want a speedy resolution, and all developers should be capable of doing this.

I think documentation pays off when you are providing reusable frameworks and other software, and even then it should be basic guidelines etc.

K Lawrence on April 17, 2012 2:25 AM

Damnit Jeff, I just spent the last 20 minutes looking for an easter egg in your html source code.

Micsco on April 17, 2012 2:40 AM

I agree with K Lawrence. The source code says what the application/library/widget does, not what it is supposed to do. Some kind of documentation is needed.

Whenever I use a method or library created by someone else, inside or outside my company, I don't want to look into the source code unless it's unavoidable. I just want to know how to call it, what it does, or what the flow is.

For self-improvement I will want to read the code. But to get the work done, it shouldn't be necessary.

Jorge Gueorguiev Garcia on April 17, 2012 2:59 AM

While there are some correct observations regarding the availability of source code, I'd like to focus on the start of the article, where I strongly disagree.
In short, it says "Documentation always sucks in many ways and it always will. Don't even waste your time with it, just read the code." And since it is useless, this also implies that you should not bother to write it. That's absolutely WRONG in my opinion.

Complete documentation is important; it's not just a reference to what a function does. It also tells you what classes are available and when you should use them. If it is properly written, it's much better than trying to read the source code, which may use a syntax you're not used to and various implementation details that you can't know. Try reading the Subversion source code.
Just yesterday I was reading a discussion on git where the developer used JavaScript's implicit statement separator (which means you don't put the semicolon) and expression short-circuiting instead of a normal if statement. The first of these even breaks some JavaScript minimizers. Am I supposed to read the code, understand these patterns, and figure out what they're doing and what that var is used for, instead of just reading good documentation? That's crazy.

Good documentation is a must, don't oversee it.

Zmaster . on April 17, 2012 3:08 AM

So it is possible to read someone else's code after all!

Frantisek on April 17, 2012 4:22 AM

Sigh. So, another blow for cowboy/hero coding is struck. "Documentation is haaard!" So... just give up?

No. Documentation is essential. Sorry. If your code is clear, concise, self-documenting, that's great, I totally support that (and, even if it's not, I still want your source code), but I want to point out that that sort of lovely self-documenting code comes about because the coder thought of the person to come after, so that coder actually wrote DOCUMENTATION (it just happens to be executable).

Coding in a furball, piling up hacks, refusing to document and then handing the whole steaming mess to somebody who asks questions is chickenshit.

I have so had it with people using agile and "use the source, Luke" as an excuse for bad behavior. Yes, I know that if developers had the kind of social skills required for writing good documentation (as in: understanding the audience enough to know what questions they'll have, given that they're not an exact clone of you, the developer) they'd be something besides coders, but the answer is not "give up and tell 'em to read the code."

Johnl4 on April 17, 2012 6:00 AM

This is a contentious view because everyone has their own experience of code and documentation.

A few thousand lines of source code doesn't need much documentation.

Tens of millions of lines probably needs some.

A trivial system probably doesn't need much.

A complex system probably does.

A system with lots of external interfaces and dependencies probably needs some.

Where I think all experienced developers agree is that code must be written with support and maintenance in mind.

http://cvmountain.com/2011/09/whats-wrong-with-this-code-really/

Whether or not the system also needs documentation depends on the system and the context in which it will be maintained over its life.


KiwiCoder on April 17, 2012 7:50 AM

This is one of the reasons I don't like abstraction layers. You can't predict exactly what they're going to do, and you [mostly] can't read the source code to find out.

Ole Eichhorn on April 17, 2012 10:41 AM

I think I appreciate the point you're trying to make and I agree that people would do well to learn to read the source.

However, I disagree with your statement "No matter what the documentation says, the source code is the ultimate truth, the best and most definitive and up-to-date documentation you're likely to find."

(Good) documentation is a description of what the software designer/programmer wanted to write, the code is what they ended up writing. If all goes well, these two are the same. But if mistakes were made, odds are only one of the two is wrong and the other offers a clue what was really intended.

Note that this also implies that, sometimes, the code can help fix broken documentation.

Jaap Van der Velde on April 17, 2012 11:32 AM

A couple of ways that source code can lie:

1) Wrong code revision: The source you're looking at is from version 3.00, but you're actually running version 3.14.159. This mostly happens when you used pre-compiled binaries instead of building the application/library yourself, or because you Googled for the relevant source keyword instead of searching through your own local source files.

2) Wrong file(s): You *think* that you're looking at the right code, but it turns out that it's an unused copy or variant that isn't actually what's compiled into the library you're using. This often happens when you're dealing with code that's meant to be compiled for multiple targets, eg. Linux kernels and drivers. Other times it happens because a file was moved or replaced, and the old version was left lingering in the repository.

A Facebook User on April 17, 2012 2:25 PM
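
Both failure modes in the comment above come down to reading a file other than the one that actually runs. One way to guard against that, sketched in Python as an assumption (the comment names no language), is to ask the interpreter which file it would load and which version of the package is installed before you start reading. `where_is` is a hypothetical helper name; `importlib.util.find_spec` and `importlib.metadata.version` are standard library calls (Python 3.8+).

```python
import importlib.metadata
import importlib.util
import sys


def where_is(module_name, dist_name=None):
    """Return (origin, version) for module_name.

    origin is the file Python would actually load ('built-in' for modules
    compiled into the interpreter, None for namespace packages), or None if
    the module cannot be found. version is the installed version of
    dist_name, or None if dist_name is not given or not installed.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None, None
    version = None
    if dist_name is not None:
        try:
            version = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            version = None  # e.g. stdlib modules are not separate distributions
    return spec.origin, version


if __name__ == "__main__":
    print(sys.executable)    # which interpreter is running this code
    print(where_is("json"))  # path to the json package that would be imported
    print(where_is("sys"))   # ('built-in', None)
```

For a third-party package you would pass both names, for example `where_is("yaml", "PyYAML")`, since a package's import name and its distribution name can differ. Open the file at the reported origin rather than a copy found by searching the web.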

This seems to be inspired by your post:

http://blog.ezyang.com/2012/04/use-the-source-dont-read-it/

Maarten Van Schaik on April 17, 2012 2:35 PM

When I read this a couple of hours ago I agreed with it 100%. Then I read about the publication of the original Prince of Persia source code on GitHub and, although the snifter of brandy was a tumbler of Becherovka, sat down for a fine evening of reading through someone else's code!

Link: https://github.com/jmechner/Prince-of-Persia-Apple-II

Andrew Martin on April 17, 2012 3:46 PM

Reading the source code is not the solution to bad documentation, contributing to fixing the documentation is the proper thing to do.

Relying on source code and bypassing official documentation is asking for trouble. The source code may reveal undocumented features that are not guaranteed to be available in future versions of frameworks.

I've written a more expanded response on my blog (http://priscimon.com/blog/2012/04/17/what-the-source-code/).

Eddy Young on April 17, 2012 3:53 PM

Precisely, and that's more of a problem if the developer is not a native English speaker. The comments he or she writes may not convey the actual meaning. The code, on the other hand, is always there.
In a good IDE, even badly formatted code can be viewed in a much more meaningful way.

A Facebook User on April 17, 2012 8:56 PM

Doesn't Stack Overflow run on a closed source stack?

Andrew on April 18, 2012 12:13 AM

Best quote from someone in the trenches:

And most of the time, what inexperienced developers consider beautiful is superficial, and what they consider ugly, is battle-hardened production-ready code from master hackers.

Jeffrey Davis on April 18, 2012 7:45 AM

Ole Eichhorn: "This is one of the reasons I don't like abstraction layers. You can't predict exactly what they're going to do, and you [mostly] can't read the source code to find out."

I have the EXACT SAME issue with projects that use inversion-of-control. Makes it IMPOSSIBLE to figure out the flow of the code unless you're using a debugger!

I also hate the current "minification" trend in web dev, because it creates source that's completely indecipherable. Yes, I can go grab the un-minified version from the original site, but what if a specific website has made their own changes?

Code should not need documentation external to the code itself to be understandable. Fin.

The Assimilator on April 19, 2012 2:10 AM

I only partially agree with your statement. Reading the source is important, but reading the source is time-consuming.

When you have time constraints, having code (especially third-party libraries) whose documentation and behaviour are consistent, and especially where all available functions (i.e. the public API) are well documented, is a real plus and speaks to the quality of the "supplier".
When you have API and code that don't match, a load of undocumented features and you go on by (even obvious) trial and error, that's not good because you always have in the back of your mind the idea that the block/component/library you're using is not production ready and can become a thorn in your side.

Then, of course, reading a WELL WRITTEN and WELL DOCUMENTED source is a way to improve oneself, as much as reading a book in a foreign language you want to master.

I agree with the final statement from Zmaster: "Good documentation is a must, don't oversee it."

Luciano on April 19, 2012 8:52 AM

@Jeff

Didn't you develop StackOverflow on the Microsoft stack? I remember a post from way back justifying the decision.

How are you holding these "Use the source Luke" and using Microsoft stack ideas simultaneously?

Or have you changed your opinion on using the MS stack?

Dehaul on April 19, 2012 9:09 AM

@Dehaul
With tools like reflector, you can look into the .NET framework fairly easily.

@Jeff
Your related posts link doesn't work with middle click. It simply opens the page in the same tab instead of a new one. Do me a favor, read the source and fix it. It's annoying. I'd do it myself, but you'd be angry :)

Samuel Warren on April 23, 2012 10:25 AM

I had a friend who always said: "Moses brought the source code down from the mountain, not the documentation."

Nick Hodges on April 23, 2012 11:10 AM

@Jeff - You're channelling Adele Goldberg. One argument she's well known for is her opposition to teaching people to write code before they learned to read it. Her observation was that nobody would ever teach someone to write prose without having them read some first, and that in turn became a core philosophy of Smalltalk: all the code for the entire system was visible, and if you wanted examples or ideas, they were there for the taking, as well as the ability to see exactly how things were done.

A Facebook User on April 23, 2012 4:31 PM

Shameless plug but with the honest intent to help out those who find code reading difficult.

We are starting a free code reading study group. Initially it will be ruby, but we have plans for js.

We're just getting started and currently discussing our first project to explore. So far 150 people have signed up so it should make for some good discussion and study.

If you are interested please checkout

https://tinyletter.com/codereading

This is non-commercial and totally in the spirit of helping each other out.

Codereading on April 30, 2012 9:01 AM

Shame on me - where were my manners!

I forgot to say thank you for an excellent blog post. It's partly what spurred me to start the study group.

And good luck in your next project after stack!

Codereading on April 30, 2012 9:03 AM

@Jaap Van der Velde: "(Good) documentation is a description of what the software designer/programmer wanted to write"

Not exactly true. Documentation is a description of what the developer wanted to write *when they wrote the documentation*. However, the probability that the documentation will skew from revised intentions approaches 1 as the code becomes more mature. The code is the only truth about what a given routine actually does as opposed to what the documentation claims it does. It is the (current) users that dictate what it ought to be doing.

TBH, what is being discussed here are black boxes. If you drop a component into your solution and it works as expected, then the source of the component isn't necessary. The problem comes when the black box doesn't do what you expect and you have to figure out why. If the documentation helps to get over your problem, then great but at the end of the day, what really will help get past the problem is the source of the black box. That will truly tell you how the component is actually working.

Thomas on May 1, 2012 10:03 PM

Well said, but considering that your main platform is Windows it's also ironic.

László Monda on May 5, 2012 1:29 PM

Good read. Another point is that all the really good product companies rely on you reading code for your training, which is really awesome. After all, well-written code is the best form of documentation there is for a developer!

Ram Bhat on May 18, 2012 9:53 AM

Nice read, just what I was looking for, thanks!

Trevorkennedy on June 11, 2012 5:32 PM

Readability has always been the top item on my code review list. Sometimes people compromise readability for the sake of optimization and performance, which in most cases doesn't yield anything.

java67 on October 7, 2012 11:13 PM
