June 20, 2006
The unofficial Wikipedia blog entry "The Future of Open Source Five Years Ago" makes some fascinating comparisons between the adoption rate of Linux and the adoption rate of Wikipedia.
Server-side Linux is still a powerful force, but what happened to the desktop utopia that was supposed to unseat Windows? And will the same developed-world disenchantment hit Wikipedia as it grows?
While in principle anyone can contribute to an open source project, Linux's barriers to entry are higher than Wikipedia's. Even correcting minor Linux bugs is well beyond my expertise, but my grandmother could edit Wikipedia. All you need is an internet connection and literacy (mid-level literacy, at that; other people will fix your grammar and spelling).
Wikipedia can draw on half a billion potential contributors; only about 100,000 people can code Linux.
It's hard to overstate this difference.
To illustrate this point, the entry includes an amusing graph.
I've resized the graph to a more manageable size. Click through to see it full size. The writing on the bars is hard to make out. On the left, it's:
Number of programmers who can contribute to Linux, including minor bugfixes.
And on the right?
Number of people in the world who speak English well enough to edit Wikipedia and who have access to the internet. (This includes non-native speakers. It omits potential contributors to non-English-language wikipedias only because this bar is already absurdly tall. (Okay, and because wikipedias of separate languages are, to some degree, islands, and don't function as a cohesive whole.))
It's probably not fair to compare programming and writing in this manner, but I am reminded again of Joel Spolsky's classic advice for computer science college students:
The difference between a tolerable programmer and a great programmer is not how many programming languages they know, and it's not whether they prefer Python or Java. It's whether they can communicate their ideas. By persuading other people, they get leverage. By writing clear comments and technical specs, they let other programmers understand their code, which means other programmers can use and work with their code instead of rewriting it. Absent this, their code is worthless. By writing clear technical documentation for end users, they allow people to figure out what their code is supposed to do, which is the only way those users can see the value in their code. There's a lot of wonderful, useful code buried on sourceforge somewhere that nobody uses because it was created by programmers who don't write very well (or don't write at all), and so nobody knows what they've done and their brilliant code languishes.
Programming, like all writing, is just another form of communication. Writing code that the compiler understands is easy. Writing code that other people understand is far more difficult. And that's assuming you're persuasive enough to convince other people that your code, in a world positively overflowing with free code, is worth looking at in the first place.
Good luck. You're gonna need it.
Over the next few years, Wikipedia (and some of its Wikimedia sister sites) will become comparable to Linux in economic and social significance. Maybe Linux will catch up again a few decades later, if schools start teaching as many kids to program as they teach to write.
This quote was originally written in August 2005. I'd argue that Wikipedia was already more significant to the average internet user than Linux at that time. Now there's no question.
Technical programming skills are certainly important. But general writing and communication skills are far, far more important. Even if you're merely a humble programmer.
Posted by Jeff Atwood
Actually, I call this "job security" :))
BTW, I noted 4 spelling mistakes in the text on the long bar.
Even if you're merely a humble programmer.
A _humble_ programmer... Truly that is the rarest of breeds. ;)
From what I've seen the problem is not so often that programmers want the communication left to "professional communicators", like the Technical Communications majors. Rather, it's that we tend to feel it's a non-technical task, and therefore beneath us. Too often the attitude is that "anyone can write specs, (or project proposals, or what-have-you), they pay me to do the _hard_ part".
But anyway, I do agree. As a programmer who actually enjoyed his English classes throughout high school and college (yeah, I know, I'm a freak of nature), I was always dismayed to hear my classmates in college dismiss their English classes as 1) completely uninteresting, and 2) completely unimportant.
Of course at the same time, I was okay with it. Because I knew that at my job interviews I could brag that I had better-than-average writing skills.
Hmm I'd be surprised if there are only 100k programmers that are capable of dealing with the linux codebase, given that  indicates that there are nearly 400k programmers working in the US alone.
OTOH both of these projects IMHO don't need all those ppl working, only enough to ensure that they grow in an adequate manner.
You've got to be a ~good~ programmer to code for the Linux kernel... You have to understand how pointers work in C, deal with concurrency, and live without a safety net. Your average Perl, PHP, Java or VB programmer needs to learn some new skills to become a kernel hacker. I believe the 100,000 number.
As for wikipedia, my experience is that the MediaWiki software is hard to use compared to other wikis. It's scaled for big wikis that have big wiki problems, and it's a bad choice for small wikis that face an uphill battle getting contributions.
In the long term I think you'll see it getting harder to contribute to wikipedia. I've worked on community sites where the problem is keeping wreckers out, and generally it helps to force people to spend some effort "buying in" to the community in order to contribute. If you just post something to slashdot, you'll find your post gets buried, and probably never gets seen by a moderator. If, on the other hand, you meta-moderate regularly, you'll find that moderators start seeing your posts. If you write good stuff you get a positive karma and become a part of the community.
In the long term the person who comes in and changes one thing on wikipedia has to be marginalized -- you only really want to get contributions from people who jump through all the right hoops consistently. If you take any parameter that characterizes visitors (how often they visit, how often they contribute) and get a bell-curve-like distribution, I can tell you that trouble is on the edges of the curve. If you're not looking for trouble, look at just the area 1 or 2 SD from the mean and you'll see less of it.
I noted 4 spelling mistakes in the text on the long bar.
How funny. I looked again, and you're right. I must have automagically fixed those errors when transcribing the text on the bar without even thinking about it.
For some reason, sideways vertical text is really, really hard to read, though.. ;)
Hi, this is ben from Wikipediablog.
Paul: Wikipedia has a good record of changing the parameters of that curve* by drawing people into editing who otherwise would have been annoying semiparticipants (read: adding nonsense to pages) -- mostly WPedia does this by having an army of real people at the ready, who send vandals actual person-to-person messages.
Autofiltration can be helpful, but I think it sometimes undercuts the power of unadulterated social interaction. Almost everything on wikipedia is done by hand, not because nobody's noticed what could potentially be automated, but because automating anything could have unforeseen effects and nobody wants to mess with the formula.
* Or maybe they just change the height of it.
Great timing on this article - I talked a little about this topic on my blog http://softarc.blogspot.com/ just a few days back: "Why documentation matters - intent and abstraction"
In the end it's about adding to the company's bottom line. Communication in the short term, in particular writing good technical specs, good readable code and good docs, saves time and effort in the long run and reduces risk (e.g. of some bolthead changing the "wrong" code).
Factory writes "I'd be surprised if there are only 100k programmers that are capable of dealing with the linux codebase, given that  indicates that there are nearly 400k programmers working in the US alone."
I would not be surprised at the 100K number - not every programmer has the skill set to contribute in the short term. I use Java pretty much all day long, and I write Cocoa code in Objective C at night. I am a fair hand at Perl, and I can use Python or Ruby with a bit of time to work up. There are really very few parts of the Linux corpus that I can work on and produce a quality result. (Without, as I said, some workup time.)
If you want to really contribute to Linux, then C, X11, and a good grasp of make/autoconf become very useful. Sure, I could learn the above; I used to spend a lot of time in C++, and they are not _that_ different. That said, if you need more than a week to work up the needed skills to contribute meaningfully, then you are not really in the pool that can help make changes right now.
I was referring most specifically to _desktop_ linux -- like wikipedia, it's something people would interact with directly.
Now cross it with programmers who can code for the Linux kernel and still are pretty good at English. What number do you get? 5000? :-)
Which means that most programs are created by people who don't necessarily have good English skills, though the folks that do have good English skills can help to propagate the programs much more than the poor souls that really created the programs.
Let's be honest: We aren't all around great; we usually can be great at one thing or another. Some lucky ones can be great at hardcore programming, English, interface designing... But they are kind of rare and probably are not so great anyway.
Linux is a low quality operating system, and this isn't changing anytime soon. I'll stick with real Unix and will take wikipedia over linux any day of the week thank you :)
I'm always reminded of this comment I once read.
"Any idiot can write code that a computer can read: the trick is writing code that a person can read"
After all, in a lot of companies, programmers spend huge amounts of time reading code, and figuring out what to do next. I've always suspected that the upfront effort of enforcing style-guides and commenting-rules would pay off enormously if organisations were willing to put them into force.
I've been coding for my own business needs in a variety of languages for about 15 years now. I'm not a linux guy, I only "do" FreeBSD servers these days, but I occasionally bang on linux as a favor to other people. I have never contributed to any major software other than bug reports and simple workaround fixes, but I have been using comments in my own coding more and more as time goes on even though it is highly unlikely that another individual will ever see it. On several occasions I have haquered the kernel source to remove annoyances or add support for some hardware for which that OS version never had a driver or patch. Every so often I have a reason to dive into some of my old (2 years or more) code to fix a bug or add some functionality. More often than not these days I find myself thinking "WTF was I doing when I wrote that?" because I can't immediately see what purpose a specific block of code had (or it looks primitive because I have better techniques now ;) and I have to jump back and forth all over the place to put it into context. A big waste of time overall. Commenting obviously has value to me even if nobody ever sees a line of my code.
I'm sure wikipedia is very valuable to a lot of people, but I rarely look at anything on it. I research a LOT on the web and more often than not I am looking for knowledge in depth on a specific topic. If I get a google hit on wikipedia it is either pretty generic info which I already know, incorrect info or misinformation, or only has a tertiary reference to my info target. I have nothing against it, but it hasn't been useful to me.
As for the topic question of which is more important and the useless but amusing bar graph, the blogs and wikis would not exist without the programming, kernel or otherwise. The programming is more important because it enables the knowledge to be widely shared. Without a method to communicate information, that information is basically useless to anyone else.
Before anyone (non-programmers?) gets their undies all twisted up, we're talking about computers and networking, not publishing books so please keep it in that context. In this case I think the egg clearly comes before the chicken. If the topic is ONLY linux and wikipedia, the title should reflect that and not be so generic.
"I'd argue that Wikipedia was already more significant to the average internet user than Linux at that time. Now there's no question."
And yet, Wikipedia is running Linux. Without Linux, Wikipedia might not exist. (Would they have started it if they had to shell out tens of thousands of dollars in license fees every year?)
And how do most people get there? From Google ... which runs ... well, you get the picture. (Would Google have a million cheap PCs if they had to pay license fees...)
You can claim that Wikipedia is more *visible* to most people than Linux-the-kernel, but Wikipedia usage is just a subset of Linux usage. We just don't think about it at the time, like how we don't think about internal combustion engines when we drive to visit grandma.
The observer's point of view generally describes the value / significance of the thing in question. The fact that half (more?) of the web sites use one form of linux or another would make me think that Linux is more significant than wikipedia. And therein lies the fallacy.
I would argue that I myself have found one more significant than the other depending on whether I wanted to find out about "Papua New Guinea" or wanted to host my pictures for the latest camping trip.
Great post, as usual, Jeff. I know from my experience in education that many, many students decide to try and join the F.O.S.S. community to contribute code to Linux, but most give up pretty quickly.
There is a massive barrier to entry that consists of a few big problems.
1) There are huge differences between the type of code that is taught in university, and code that runs in programs such as open office, firefox etc.
2) Many universities don't bother teaching things like Source Control, automated building etc. These are fundamental if you want to work on any moderate size project.
3) If you get past 1) and 2) you still have the E.S.Rs of the world to shoot you and your code down, with woeful criticism and sometimes outright ignorance.
At least that sums up my experiences with it. The majority of students who attempt to get involved end up working on their own project (which is still good).
btw: Why does the URL have to be www.codinghorror.com/blog, and not just codinghorror.com/blog?
Why does the URL have to be www.codinghorror.com/blog, and not just codinghorror.com/blog
No reason, I suppose, but I have to pick one or the other and I've already chosen "with prefix". You can destroy a lot of your pagerank with multiple names on the same content.