I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood


22 posts from April 2007

April 30, 2007

An Initiate of the Bayesian Conspiracy

An Intuitive Explanation of Bayesian Reasoning is an extraordinary piece on Bayes' theorem that starts with this simple puzzle:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

This simple puzzle is not all that simple in practice. Only 15% of doctors, when presented with this situation, come up with the correct answer.

Can you come up with the correct answer -- without resorting to Google, the comments to this post, or reading the answer provided in the article?

If so, congratulations. You're a natural initiate of the Bayesian Conspiracy. For the rest of us, Bayes' Theorem is a bit more difficult to grasp:

While there are a few existing online explanations of Bayes' Theorem, my experience with trying to introduce people to Bayesian reasoning is that the existing online explanations are too abstract. Bayesian reasoning is very counterintuitive. People do not employ Bayesian reasoning intuitively, find it very difficult to learn Bayesian reasoning when tutored, and rapidly forget Bayesian methods once the tutoring is over. This holds equally true for novice students and highly trained professionals in a field. Bayesian reasoning is apparently one of those things which, like quantum mechanics or the Wason Selection Test, is inherently difficult for humans to grasp with our built-in mental faculties.

In computer science, it's easy to demonstrate the immense power of Bayes' theorem: it's the basis for almost all spam filters in use today. Bayesian email filtering was first publicized by Paul Graham's A Plan for Spam in mid-2002. Most programmers know about Bayesian filtering now; it's the primary weapon in any modern Spam fighting toolkit.

What you may not know, however, is that there's something even more effective than Bayesian spam filtering. It's eloquently described in William Yerazunis' presentation The Spam Filtering Plateau at 99.9% Accuracy and How to Get Past It (also available in pdf paper form). And it's been implemented as the CRM114 Discriminator for years. That technique is Markovian spam filtering:

How to change a Bayesian spam filter to a Markovian spam filter:
  1. Change the feature generator from single words to spanning multiple words
  2. Change the weighting so that longer features have more weight (i.e., longer features generate local probabilities closer to 0.0 and 1.0)
  3. The 2^2n weighting means that the weights are 1, 4, 16, 64, 256, ... for span lengths of 1, 2, 3, 4, 5 ... words
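The three steps above can be sketched in a few lines. This is a simplified illustration, not CRM114's actual feature generator: it assumes features are contiguous word spans, caps spans at five words because the list quotes five weights, and computes the weight as 4^(n-1) for a span of n words (the quoted 2^2n with n counted from zero). The name `spanFeatures` and the `maxSpan` parameter are hypothetical.

```javascript
// Simplified sketch (assumption: contiguous spans only, at most maxSpan words).
// Each feature is a run of 1..maxSpan consecutive words, weighted 1, 4, 16, 64, 256.
function spanFeatures(text, maxSpan = 5) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const features = [];
  for (let start = 0; start < words.length; start++) {
    for (let len = 1; len <= maxSpan && start + len <= words.length; len++) {
      features.push({
        feature: words.slice(start, start + len).join(" "),
        weight: Math.pow(4, len - 1), // longer spans count for more
      });
    }
  }
  return features;
}

const features = spanFeatures("buy cheap pills now");
console.log(features);
```

A single-word Bayesian filter would see only "buy", "cheap", "pills", and "now"; here the phrase "buy cheap pills" is also a feature, and it carries sixteen times the weight of any single word.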

In other words, where Bayesian filters examine the relationship between individual words, Markovian filters expand the scope to examine the relationship between words and phrases. It's a tweak, but a significant one that amplifies the accuracy of the already uncannily accurate Bayes' theorem.

But the true power of Bayes' theorem extends far beyond merely discriminating between spam and non-spam. As the CRM114 documentation notes, you can use these powerful statistical models to discriminate between... well, just about anything:

Spam is the big target with CRM114, but it's not a specialized Email-only tool. CRM114 has been used to sort web pages, resumes, blog entries, log files, and lots of other things. Accuracy can be as high as 99.9%. In other words, CRM114 learns, and it learns fast.

Now perhaps you can understand why some people are so excited about Bayes' theorem.

Maybe you see Bayes' theorem, and you understand the theorem, and you can use the theorem, but you can't understand why your friends and/or research colleagues seem to think it's the secret of the universe. Maybe your friends are all wearing Bayes' theorem T-shirts, and you're feeling left out. Maybe you're a girl looking for a boyfriend, but the boy you're interested in refuses to date anyone who "isn't Bayesian". What matters is that Bayes is cool, and if you don't know Bayes, you aren't cool.

Why does a mathematical concept generate this strange enthusiasm in its students? What is the so-called Bayesian Revolution now sweeping through the sciences, which claims to subsume even the experimental method itself as a special case? What is the secret that the adherents of Bayes know? What is the light that they have seen?

It's not intuitive for most people, but look a little more closely, and I think you, too, will become an initiate of the Bayesian conspiracy.
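To check an answer to the mammography puzzle at the top of this post, Bayes' theorem can be applied directly to the three numbers it gives. The sketch below is only that arithmetic; the function name `posterior` and its parameter names are hypothetical.

```javascript
// Bayes' theorem:
// P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)
function posterior(prior, truePositiveRate, falsePositiveRate) {
  // Total probability of a positive test, with or without cancer.
  const pPositive =
    truePositiveRate * prior + falsePositiveRate * (1 - prior);
  return (truePositiveRate * prior) / pPositive;
}

// 1% base rate, 80% true positives, 9.6% false positives (from the puzzle).
const pCancerGivenPositive = posterior(0.01, 0.80, 0.096);
const answer = (pCancerGivenPositive * 100).toFixed(1) + "%";
console.log(answer); // prints "7.8%"
```

The result is low because the 9.6% of false positives is drawn from the 99% of women without cancer, a far larger group than the 1% who have it.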

Posted by Jeff Atwood    91 Comments

April 29, 2007

See You At MIX07

I'm heading off to MIX07 today.

MIX07 logo

MIX is by far my favorite Microsoft conference, because it "mixes" in a liberal dose of traditionally non-Microsoft folks for a broader range of perspectives. It's probably the only Microsoft conference I'll be attending this year.

Vertigo is also presenting something special at MIX: our new Family.Show WPF reference app.

Family.Show WPF reference app screenshot

If you're attending MIX this year and you're interested in meeting up, shoot me an email. I'll definitely bring lots of stickers.

I also set up a Coding Horror Twitter stream for MIX related activities, and I'll try to keep it updated throughout the conference, barring any performance meltdowns -- for example, right now Twitter's static asset server appears to be down, so no images or stylesheets appear.

Posted by Jeff Atwood    10 Comments

April 26, 2007

JavaScript and HTML: Forgiveness by Default

I've been troubleshooting a bit of JavaScript lately, so I've enabled script debugging in IE7. Whenever the browser encounters a JavaScript error on a web page, instead of the default, unobtrusive little status bar notification..

default JavaScript status bar error notification

.. I now get one of these glaring, modal error debug notification dialogs:

Javascript debugging error dialog in IE7

I left this setting enabled out of pure forgetfulness. Browsing the web this way, I quickly realized that the web is full of JavaScript errors. You can barely click through three links before encountering a JavaScript error of one kind or another. Often they come in pairs, triplets, sometimes dozens of them. It's nearly impossible to navigate the web with JavaScript error notification enabled.

JavaScript errors are so pervasive, in fact, that it's easy to understand why IE demotes them to nearly invisible status bar elements. If they didn't, nobody would be able to browse the web without getting notified to death. Firefox goes even further: there's no visible UI whatsoever for any JavaScript errors on the current web page. You have to open the Tools | Error Console dialog to see them.

The upshot of this is that JavaScript errors, unless they result in obvious functional problems, tend to go unnoticed. Things that would cause showstopping compiler errors in any other language are at worst minor inconveniences in JavaScript. When errors are ignored by default, what you end up with is an incredibly tolerant, extremely permissive programming ecosystem. If it works, it works, errors be damned.

But this unparalleled flexibility has its price. Just ask Dave Murdock, who found out the hard way how flexible JavaScript can be.

So I dug into the code, which I hadn't written, and I saw JavaScript similar to this in the execution path that was causing Firefox to hang:

var startIndex = 0;

for (i = startIndex; i < endIndex; i++) {
  // do some stuff here
}

This works fine in Internet Explorer 7. What happens in Firefox? i is reinitialized to startIndex after every run of the loop. You have to declare the loop like this for it to work:

var startIndex = 0;

for (var i = startIndex; i < endIndex; i++) {
  // do some stuff here
}

Putting the var before i is the way it ought to be as far as I can tell, but both Internet Explorer and Firefox do the wrong thing by developers here. Both browsers should be sticklers about requiring var in a loop variable declaration and produce a clear JavaScript interpreter error before the code has the chance to run.

It's not just JavaScript. HTML and CSS are incredibly forgiving of errors as well. Ned Batchelder observed bizarrely tolerant behavior when specifying named colors that don't exist. Consider this small snippet of HTML:

<font color='red'>&#x2588; This is RED</font>

As you vary the named color, you don't get the error you might expect. What you do get is weird colors:

Color name    Firefox    IE7        Opera
red           #ff0000    #ff0000    #ff0000
seagreen      #2e8b57    #2e8b57    #2e8b57
sea green     #0e00ee    #0e00ee    #0ea00e
sxbxxsreen    #0000e0    #0000e0    #00b000
sxbxxsree     #00000e    #0b00ee    #00b000
sxbxxsrn      #000000    #0b0000    #00b000
sxbxeen       #000e00    #0bee00    #00b0ee
sreen         #00ee00    #00ee00    #00ee00
ffff00        #ffff00    #ffff00    #ffff00
xf8000        #0f8000    #0f8000    #0f8000

(If you're curious how "sea green" can possibly equate to blue, the answers are in the comments to Ned's post.)
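For the impatient, here is a rough sketch of a fallback that reproduces the IE7 column for the unrecognized names in the table. It is a simplification inferred from those values, not any browser's actual code: it skips the named-color lookup (so "red" and "seagreen" are out of scope) along with other special cases, and the function name `parseLegacyColor` is hypothetical.

```javascript
// Simplified sketch of lenient color parsing (assumption: no named-color lookup).
function parseLegacyColor(input) {
  // Treat every character that is not a hex digit as "0".
  let s = input.replace(/[^0-9a-fA-F]/g, "0").toLowerCase();
  // Pad with zeros until the length is a non-zero multiple of three.
  while (s.length === 0 || s.length % 3 !== 0) s += "0";
  // Split into equal red, green, and blue components.
  const n = s.length / 3;
  let parts = [s.slice(0, n), s.slice(n, 2 * n), s.slice(2 * n)];
  // Keep at most the last eight digits of each component.
  if (n > 8) parts = parts.map(p => p.slice(-8));
  // Drop a shared leading zero while components are longer than two digits.
  while (parts[0].length > 2 && parts.every(p => p[0] === "0")) {
    parts = parts.map(p => p.slice(1));
  }
  // Use the first two digits of each component.
  return "#" + parts.map(p => p.slice(0, 2).padStart(2, "0")).join("");
}

console.log(parseLegacyColor("sea green")); // #0e00ee
console.log(parseLegacyColor("sreen"));     // #00ee00
```

In "sea green" only e, a, e, and e are hex digits, so the string becomes 0ea000ee0, splits into 0ea, 000, and ee0, and is cut down to 0e, 00, and ee: almost no red, no green, and plenty of blue.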

I can't think of any other programming environment that goes to such lengths to avoid presenting error messages, that tries so hard to make broken code work, at least a little. Although there was a push to tighten up HTML into the much more strictly enforced XHTML, it's an utter failure. If you're not convinced, read Mark Pilgrim's thought experiment:

Imagine that you posted a long rant about how [strict XHTML validation] is the way the world should work, that clients should be the gatekeepers of wellformedness, and strictly reject any invalid XML that comes their way. You click 'Publish', you double-check that your page validates, and you merrily close your laptop and get on with your life.

A few hours later, you start getting email from your readers that your site is broken. Some of them are nice enough to include a URL, others simply scream at you incoherently and tell you that you suck. (This part of the thought experiment should not be terribly difficult to imagine either, for anyone who has ever dealt with end-user bug reports.) You test the page, and lo and behold, they are correct: the page that you so happily and validly authored is now not well-formed, and it is not showing up at all in any browser. You try validating the page with a third-party validator service, only to discover that it gives you an error message you've never seen before and that you don't understand.

Unfortunately, the Draconians won: when rendering as strict XHTML, any error in your page results in a page that not only doesn't render, but also presents a nasty error message to users.

XHTML strict rendering error

They may not have realized it at the time, but the Draconians inadvertently destroyed the future of XHTML with this single, irrevocable decision.

The lesson here, it seems to me, is that forgiveness by default is absolutely required for the kind of large-scale, worldwide adoption that the web enjoys.

The permissive, flexible tolerance designed into HTML and JavaScript is alien to programmers who grew up being regularly flagellated by their compiler for the tiniest of mistakes. Some of us were punished so much that we actually started to like it. We point and laugh at all the awful HTML and JavaScript on the web that barely functions. We scratch our heads and wonder why the browser can't give us the punishment we so richly deserve for our terrible, terrible mistakes.

Even though programmers have learned to like draconian strictness, forgiveness by default is what works. It's here to stay. We should learn to love our beautiful soup instead.

Posted by Jeff Atwood    77 Comments

April 25, 2007

Coding Horror on .NET Rocks

It was my great honor to participate in this week's episode of .NET Rocks!

.NET Rocks! is a long-running internet radio talk show for software developers that goes all the way back to 2002. I've listened to their shows off and on for years. They've interviewed some very notable software developers along the way, including Steve McConnell, and many other people far more interesting than myself. One of the earliest interviews (#11, to be precise) was with our CEO, Scott Stanfield.

My interview is 64 minutes long, and explores some common themes I've covered here in my blog. It's available in the following formats:

For some reason, I had trouble opening these links by directly clicking, so you may want to right click and do a "save as". More download options are available on the interview page.

Thanks to Carl Franklin and Richard Campbell for a great interview!

Posted by Jeff Atwood    41 Comments

April 24, 2007

How Not To Write a Technical Book

If I told you to choose between two technical books, one by renowned Windows author Charles Petzold, and another by some guy you've probably never heard of, which one would you pick?

That's what I thought too. Until I sat down to read both of them. Take a look for yourself:

Charles Petzold's Applications = Code + Markup:

Petzold WPF book, sample page 1   Petzold WPF book, sample page 2

Adam Nathan's Windows Presentation Foundation Unleashed:

Nathan WPF book, sample page 1   Nathan WPF book, sample page 2

Beyond the obvious benefit of full color printing, which adds another dimension to any text, it's not even close. The Nathan book is the clear winner:

  • It's full of diagrams, screenshots, and illustrations showing the meaning of the code.
  • The text is frequently broken up by helpful color-coded sidebars such as "digging deeper", "FAQ", and "warning".
  • The code/markup snippets are smaller and easier to digest; they don't dominate page upon page of the text.
  • Liberal use of bullets, tables, subheadings, and other textual elements provides excellent scannability.
  • The book has a sense of humor without being obnoxious or cloying.
  • Did I mention it's in color?

The Nathan book is brilliant. It reads like a blog and competes toe-to-toe with anything you'd find on the web. Petzold's book, in contrast, is a greyscale sea of endless text and interminable code. There are so few diagrams in the book that you get a little thrill every time you encounter one. It also artificially segregates code and markup: the first half is all C# code; it's not until the second half that you see any XAML markup whatsoever, even though XAML is one of the most important new features of WPF, and the one developers will be least familiar with.

I suppose this sort of old-school treatment is typical Petzold. What do you expect from a guy who thinks Visual Studio rots the minds of software developers? The difference in approach is immediately obvious to anyone who opens both books. One looks compelling, fun, and inviting; the other looks like a painful, textbook slog that's the equivalent of writing code in Notepad. Petzold's an excellent writer, but writing alone can't make up for the massive layout deficiencies of his book.

It's too bad, because I loved Petzold's earlier book Code, which was a love letter to the personal computer filled with wonderful illustrations. As much as I respect Petzold, you should avoid his WPF book. Get the Nathan book instead-- you'll love it. Publishers, take note: I'd sure be buying a heck of a lot more technical books if more of them were like this one.

Posted by Jeff Atwood    100 Comments

April 23, 2007

Where Are All the Open Source Billionaires?

Hugh MacLeod asks, if open source is so great, where are all the open source billionaires?

If Open Source software is free, then why bother spending money on Microsoft Partner stuff? I already know what Microsoft's detractors will say: "There's no reason whatsoever. $40 billion per year is totally wasted."

This, however, is not a very satisfying answer, simply because it doesn't quite ring true. Otherwise there'd be a lot more famous Open Source billionaires out there, being written up in Forbes Magazine or wherever. And Bill Gates would've been ousted years ago.

I can immediately think of one reason there aren't any open-source billionaires:

Linux Distro timeline, 1991-2007

Most competition for open source software comes from other open source software. It's far more cutthroat than the commercial software market could ever be.

Rajesh Setty responded to Hugh's question with a few additional reasons why it's difficult for open source businesses to make money:

If open source is license free, the costs have to be low to work with open source. If cost is one of the reasons for a customer to embrace open source, he or she will pay less than what they would have paid for comparable enterprise software to do the same job. An open source company would therefore have to work twice as hard as a comparable enterprise software company to make the same or a smaller amount of money. This means that they have to have a lot more resources than the competing enterprise software company. How can you have a smaller pie but feed a lot more people and still keep everyone happy?

But I think MacLeod is asking the wrong question, so Setty's answers, although well reasoned, are irrelevant. There probably won't ever be any open source billionaires. Just ask JBoss founder Marc Fleury:

To do [open source software] seriously, professionally, in a sustainable fashion you need to make a living. What is clearly compromised is the "instant billionaire" club. I remember the first time I saw Torvalds on a panel and someone asked "why isn't there an open source billionaire", and I immediately thought "because you are distributing FREE SOFTWARE, dummy." And there still isn't an open source billionaire today. There are very few billionaires period. Your average MSFT developer certainly isn't one.

I for one don't believe there will ever be an open source billionaires club. There are and will be many multi-millionaires though. If we execute on our plan without screwing up, we will create a large batch of OS millionaires. We care about the developers and people who create real value in companies getting rewarded.

The lack of open source software billionaires is by design. It's part of the intent of open source software -- to balance the scales by devaluing the obscene profit margins that exist in the commercial software business. Duplicating software is about as close to legally printing money as a company can get; profit margins regularly exceed 80 percent.

To ask where the open source billionaires are is to demonstrate a profound misunderstanding of how open source software works. If you wanted to become obscenely rich by starting an open source software company, I'm sorry, but you picked the wrong industry. You'll make a living, perhaps even a lucrative one. But you won't become Bill Gates rich, or Paul Allen rich, by siphoning away the exorbitant profit margins commercial software vendors have enjoyed for so many years.

But there is a silver lining.

There are real millionaires-- even billionaires-- who built companies on open source software. Just ask Larry Page and Sergey Brin. Or the YouTube founders. The real money isn't in the software. It's in the service you build with that software.

Posted by Jeff Atwood    113 Comments

April 20, 2007

Welcome to Dot-Com Bubble 2.0

The dot-com bubble was a watershed event for software developers. You simply couldn't work in the field without having something miraculous or catastrophic happen to you. Or both at once.

The "dot-com bubble" was a speculative bubble covering roughly 1995–2001 during which stock markets in Western nations saw their value increase rapidly from growth in the new Internet sector and related fields. The period was marked by the founding (and in many cases, spectacular failure) of a group of new Internet-based companies commonly referred to as dot-coms. A combination of rapidly increasing stock prices, individual speculation in stocks, and widely available venture capital created an exuberant environment in which many of these businesses dismissed standard business models, focusing on increasing market share at the expense of the bottom line. The bursting of the dot-com bubble marked the beginning of a relatively mild yet rather lengthy early 2000s recession in the developed world.

Like many others, I saw warning signs all over the place in late 2000:

  • Skyrocketing salaries resulted in a rash of neophytes entering the software development field with giant dollar signs in their eyes.
  • Internet companies with irrational, unsustainable business strategies, built to cash in and hiring at a frenetic pace.
  • You were never more than two degrees of separation away from a tale of some programmer who became an overnight millionaire.

Despite all the warning signs, it never occurred to me that I was working in a bubble. Until it popped. I don't want to make that mistake again. The three years after the bubble burst were dark, dark times for software developers. Everyone had to scramble to find a place to weather the worst of the storm. And the backlash was severe: rampant offshoring, devaluation of the IT industry as a whole, and diminished salaries and opportunities for everyone.

NASDAQ graph, 1997-2007

Seven years later, we're now clearly in the throes of another dot-com bubble. You might argue that the new bubble has been in effect since mid-2006, but the signs are absolutely unmistakable now. The job market for software developers is every bit as hyper-competitive as it was in 1999. The idea that you can found a company on the internet-- and make money-- is taken seriously now. There's a new one every week.

We've had seven long years to think about what the dot-com bubble meant, and where things went wrong. Here's what I think the original bubble got wrong, and what's different in today's bubble:

  1. Most people have an always-on broadband connection to the internet. Broadband penetration was a mere 5 percent in 2000; as of early 2007 it's now over 50 percent. So many dot-com business models were predicated on the mass market of dialup users, conveniently forgetting how brutally painful it was to use the internet on a modem.
  2. The emergence of viable ad networks. Few dot-com companies had revenue models that made any sense. Now there are dozens of potential advertising networks that you can plop on a web page to guarantee income proportionate to the pageviews. This advertising-supported model pioneered on the web is even trickling over into desktop applications.
  3. Moore's law and open source. An internet startup can now scale to thousands of concurrent users on a few cheap, commodity server boxes, running proven open-source solutions like Linux and MySQL. All of this was possible in 2000, but the "whitebox" software and hardware was unproven, and tended to be far behind the expensive, proprietary solutions. Now it's assumed, mature, a known quantity-- and the cost for that hardware and software is precipitously close to $0.

But the original bubble wasn't all greed and stupidity-- I recommend reading through Paul Graham's What the Bubble Got Right for the upside.

This new bubble does appear to be a bit more sane than the last one, at least initially. The greasy odor of get-rich-quick isn't quite as overpowering as it was in 1999. So far, people seem more interested in building sustainable, useful businesses than rapid market capitalizations.

Bubbles are exciting times. Fortunes are made and lost; careers built and destroyed. It's great while it lasts. So here's my question to you: what will you do differently in this bubble?

Posted by Jeff Atwood    85 Comments

April 19, 2007

Apparently Bloggers Aren't Journalists

I ran across this blog entry while researching Microsoft's new Silverlight Flash competitor. It makes some disturbing complaints about the limitations of Silverlight, in bold all-caps to boot:

This is where I threw my hands up in disgust. What in the holy name of Scooby-Doo are those people thinking?!?! After poring through the [Silverlight] API, I thought "I must be mistaken. Surely this is a mistake." But then I asked a colleague and he confirmed it for me. Let me skip a couple lines and highlight this so you all can see it clearly.

WPF/E (Silverlight) HAS NO SUPPORT FOR BINDING TO MODELS, BINDING TO DATA, OR EVEN CONNECTING TO NETWORK RESOURCES TO OBTAIN DATA.

So, I will summarize Microsoft's efforts to date around Silverlight. They have created a declarative programming model that uses XAML as an instantiation language for rich 2D (not 3D) content and animations, as well as extended JavaScript to support this model. Using this model, you can create embedded mini-apps that have access to rich animations, graphics, audio, and video objects. However, these mini applications cannot communicate with the outside world, they cannot consume web services, and they cannot bind UI elements to data. In addition, this model doesn't even have support for things that should be considered a stock part of any library such as buttons, checkboxes, list boxes, list views, grids, etc.

Those are serious problems indeed. I found this blog entry because it's referenced by another blog entry on the limitations of Silverlight:

But what are the capabilities of Silverlight itself? I came across this blog entry of someone who has downloaded the SDK, read the documentation, and looked at the code. Microsoft seems to be waiting for the Orcas release cycle before adding data binding, controls, and .Net runtime support to Silverlight - and Orcas could be delayed until 2008.

But before I clicked through to that blog entry, I started by reading this blog post on the limitations of Silverlight:

Although I just found this post about it which points out that [Silverlight] has a lot of pretty major shortcomings.

The idea that Microsoft's new Flash-alike can't even download data via HTTP seemed impossibly wrong to me. Couldn't be. Can't be. Like any large company, Microsoft certainly makes their share of dumb mistakes. But an epic mistake like that stretches the bounds of credibility even for Microsoft.

In short, I didn't believe it. So I downloaded the Silverlight SDK to take a look for myself. Guess what I found in the Silverlight SDK documentation, not five minutes after downloading it?

The Downloader object is a special-purpose WPF/E object that provides the ability to download content, such as XAML content, JavaScript content, or media assets, such as images. By using the Downloader object you do not have to provide all application content when the WPF/E control is instantiated. Rather, you can download content on demand in response to application needs. The Downloader object provides functionality for initiating the data transfer, monitoring the progress of the data transfer, and retrieving the downloaded content.

The properties and methods of the Downloader object are modeled after the XMLHttpRequest (XHR) set of APIs. XMLHttpRequest provides JavaScript and other web browser scripting languages the ability to transfer and manipulate XML data to and from a web server using HTTP.

I'm not out to defend Silverlight here.

It's clear that blogger A posted completely erroneous information; I'm not sure how he could have missed the obviously named and prominently featured Downloader object in the SDK. It really calls into question whether or not he actually used the SDK at all. But let's assume, for the moment, that he did, and it was a simple oversight on his part. The strident tone of his post makes me think otherwise, but let's give him the benefit of the doubt.

The real problem is that this erroneous information was echoed by blogger B, and then echoed again by blogger C. At no point did anyone stop to actually verify the claims of blogger A, even in the most rudimentary, basic of ways. All they had to do was download the SDK and look for themselves to confirm that his complaints were true. I'm talking five minutes, maximum.

But they didn't.

Instead, they blindly parroted blogger A, assumed that all of his claims were valid, and perpetuated his mistake across the internet.

Let's compare that behavior with the Society of Professional Journalists Code of Ethics, which includes the following guidelines:

  • Test the accuracy of information from all sources and exercise care to avoid inadvertent error. Deliberate distortion is never permissible.
  • Diligently seek out subjects of news stories to give them the opportunity to respond to allegations of wrongdoing.
  • Identify sources whenever feasible. The public is entitled to as much information as possible on sources' reliability.

I realize that it's unrealistic to hold every blogger on planet Earth to the same standards as professionally trained journalists. Bloggers, after all, aren't professionals.

But I do believe blog readers have a right to expect that amateur bloggers will:

  1. Do their homework before writing.
  2. Do some basic investigation of other bloggers' claims before linking to their posts or quoting them.

None of these bloggers did any of the above. Don't let their mistakes delude you into thinking this is typical or acceptable behavior. It isn't. We may not be professional journalists-- but we are still accountable for the words we write. It pains me that I even have to say this in 2007, but don't assume everything you read on the internet is true. Check the facts yourself. Putting in that extra bit of effort won't transform you into a journalist, but I can guarantee it'll make you a better blogger.

Posted by Jeff Atwood    65 Comments

April 18, 2007

Sins of Software Security

I picked up a free copy of 19 Deadly Sins of Software Security at a conference last year. I didn't expect the book to be good because it was a free giveaway item from one of the vendor booths. But I paged through it on the flight home, and I was pleasantly surprised. It's actually quite good.

19 Deadly Sins of Software Security

Software security isn't exactly my favorite topic, so holding my interest is no mean feat. It helps that the book is mercifully brief and to the point, and filled with practical examples and citations. It's an excellent cross-platform, language-agnostic checksheet of common software security risks.

Here's a brief summary of each of the 19 sins, along with a count of the number of vulnerabilities I found in the Common Vulnerabilities and Exposures database for each one.

Sin | Affected Languages | Description | Exploit count
Buffer Overflows | C, C++ | A buffer overrun occurs when a program allows input to write beyond the end of the allocated buffer. Results in anything from a crash to the attacker gaining complete control of the operating system. Many famous exploits are based on buffer overflows, such as the Morris worm. | 3,326
Format String Problems | C, C++ | The standard format string libraries in C/C++ include some potentially dangerous commands (particularly %n). If you allow untrusted user input to pass through a format string, this can result in anything from arbitrary code execution to spoofing user output. | 411
Integer Overflows | C, C++, others | Failure to range check on integer types. This can cause integer overflow crashes and logic errors. In C/C++, integer overflows can be turned into a buffer overrun and arbitrary code execution, but all languages are prone to denial of service and logic errors. | 288
SQL Injection | All | Forming SQL statements with untrusted user input means users can "inject" their own commands into your SQL statements. This puts your data at risk, and can even lead to complete server and network compromise. | 2,225
Command Injection | All | Occurs when untrusted user input is passed to a compiler or interpreter, or worse, a command line shell. Potential risk depends on the context. | 193
Failing to Handle Errors | Most | A broad category of problems related to a program's error handling strategy; anything that leads to the program crashing, aborting, or restarting is potentially a denial of service issue and therefore can be a security problem, particularly on servers. | 80
Cross-Site Scripting (XSS) | Any web-facing | A web application takes some input from the user, fails to validate it, and echoes that input directly back to the web page. Because this code is running in the context of your website, it can do anything your website could do, including retrieving cookies, modifying the HTML DOM, and so forth. | 2,996
Failing to Protect Network Traffic | All | Most programmers underestimate the risk of transmitting data over the network, even if that data is not private. Attackers can eavesdrop, replay, spoof, tamper with, or otherwise hijack any unprotected data sent over the wire. | 26
Use of Magic URLs and Hidden Form Fields | Any web-facing | Passing sensitive or secure information via the URL querystring or hidden HTML form fields, sometimes with lousy or ineffectual "encryption" schemes. Attackers can use these fields to hijack or manipulate a browser session. | 33
Improper use of SSL and TLS | All | Using most SSL and TLS APIs requires writing a lot of error-prone code. If programmers aren't careful, they will have an illusion of security in place of the actual security promised by SSL. Attackers can use certificates from lax authorities, subtly invalid certificates, or stolen/revoked certificates, and it's up to the developer to write the code to check for that. | 123
Use of Weak Password-Based Systems | All | Anywhere you are using passwords, you need to seriously consider the risks inherent to all password-based systems. Risks like phishing, social engineering, eavesdropping, keyloggers, brute force attacks, and so on. And then you have to worry about how users choose passwords, and where to store them securely on the server. Passwords are a necessary evil, but tread carefully. | 1,235
Failing to Store and Protect Data Securely | All | Information spends more time stored on disk than in transit. Consider filesystem permissions and encryption for any data you're storing. And try to avoid hardcoding "secrets" into your code or configuration files. | 56
Information Leakage | All | The classic trade-off between giving the user helpful information, and preventing attackers from learning about the internal details of your system. Was the password invalid, or the username? | 26
Improper File Access | All | 1) There is often a window of vulnerability between time of check and time of use (TOCTOU) in the filesystem, so an attacker can slip changes in, particularly if the files are accessed over the network. 2) The "it isn't really a file" problem: you may think you have a file, but attackers may substitute a link to another file, or a device name, or a pipe. 3) Allowing users control over the complete filename and path of files used by the program; this can lead to directory traversal attacks. | 5, 58
Trusting Network Name Resolution | All | It's simple to override and subvert DNS on a server or workstation with a local HOSTS file. How do you really know you're talking to the real "secureserver.com" when you make an HTTP request? | 20
Race Conditions | All | A race condition is when two different execution contexts are able to change a resource and interfere with each other. If attackers can force a race condition, they can execute a denial of service attack. Unfortunately, writing properly concurrent code is incredibly difficult. | 139
Unauthenticated Key Exchange | All | Exchanging a private key without properly authenticating the entity/machine/service that you're exchanging the key with. To have a secure session, both parties need to agree on the identity of the opposing party. You'd be shocked how often this doesn't happen. | 1
Cryptographically Strong Random Numbers | All | Imagine you're playing poker online. The computer shuffles and deals the cards. You get your cards, and then another program tells you what's in everybody else's hands. Random numbers are similarly fundamental to cryptography; they're used to generate things like keys and session identifiers. An attacker who can predict numbers-- even with only a slight probability of success-- can often leverage this information to breach the security of a system. | 5
Poor Usability | All | Security is always extra complexity and pain for the user. It's up to us software developers to go out of our way to make it as painless as it can reasonably be. Security only works if the secure way also happens to be the easy way. | All
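
To make the SQL Injection entry concrete, here's a minimal sketch in Python using the standard library's sqlite3 module. The users table, the sample rows, and the function names find_user_unsafe and find_user_safe are all invented for illustration; nothing here comes from the book. It only shows the general idea: splicing untrusted input into the SQL text lets the input rewrite the query, while passing it as a bound parameter keeps it as plain data.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database with two made-up users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the untrusted value becomes part of the SQL statement itself.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the value is bound as a parameter and is never parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
malicious = "nobody' OR '1'='1"

# The injected OR clause makes the WHERE condition true for every row.
unsafe_rows = find_user_unsafe(conn, malicious)

# The same input, treated as a literal name, matches no user.
safe_rows = find_user_safe(conn, malicious)
```

With the concatenated query, the made-up input above matches every row in the table; with the parameterized query it matches none, because no user is literally named that.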

It's true that C and C++ have a heavy cross to bear. But only 3 of the 19 sins can be completely lumped on the plate of K&R. The other 16 apply almost everywhere, to any developer writing code on any platform. It's a sobering thought.
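
Directory traversal, the third item under Improper File Access, is a decent example of a sin that follows you into any language. Here's a rough sketch in Python; the helper name resolve_inside and the /srv/app/uploads path are hypothetical, and this only covers the path check itself. It does nothing about the time of check versus time of use window described in the same entry.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Return the resolved path only if it stays inside base_dir."""
    base = Path(base_dir).resolve()
    # Joining first and resolving afterwards collapses any ".." segments.
    # An absolute user_path replaces base entirely, so it is rejected below.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError("path escapes the base directory: %r" % (user_path,))
    return candidate


# Hypothetical usage: a plain filename is accepted, a traversal attempt is not.
ok_path = resolve_inside("/srv/app/uploads", "report.txt")

try:
    resolve_inside("/srv/app/uploads", "../../etc/passwd")
    traversal_blocked = False
except ValueError:
    traversal_blocked = True
```

The design choice is to normalize first and compare second: checking the raw string for ".." is easy to get wrong, while comparing resolved paths asks the one question that matters, namely whether the final location is still under the directory you meant to expose.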

The usability sin is the one that's most interesting to me. Usability is tough under the best of conditions-- and security is the worst of conditions, at least from the user's perspective. It's quite a challenge. There are a few great links in the book on the topic of security usability:

You can certainly find other books that go much deeper on particular aspects of software security. But if you're looking for an excellent primer on the entire gamut of security problems that could potentially afflict your project, 19 Deadly Sins of Software Security is an excellent starting point.

Posted by Jeff Atwood    28 Comments

April 17, 2007

When In Doubt, Make It Public

Marc Hedlund offered some unique advice to web entrepreneurs last month:

One of my favorite business model suggestions for [web] entrepreneurs is to find an old UNIX command that hasn't yet been implemented on the web, and fix that.

To illustrate, Marc provides a list of UNIX commands with their corresponding web implementations:

UNIX command    Web implementation
talk, finger    ICQ
LISTSERV        DejaNews
ls              Yahoo! directory
find, grep      Google
rn              Bloglines
pine            Google Mail
mount           Amazon S3
bash            Yahoo! Pipes
wall            Twitter

Jason Kottke noted that most successful "new" business models on the web aren't new at all-- they're simply taking what was once private and making it public and permanent:

Blogger = public email messages. (1999) Instead of "Dear Bob, Check out this movie." it's "Dear People I May or May Not Know Who Are Interested in Film Noir, check out this movie. If you like it, maybe we can be friends."

Flickr = public photo sharing. (2004) Flickr co-founder Caterina Fake said in a recent interview: "When we started the company, there were dozens of other photosharing companies such as Shutterfly, but on those sites there was no such thing as a public photograph -- it didn't even exist as a concept -- so the idea of something 'public' changed the whole idea of Flickr."

YouTube = public home videos. (2005) Bob Saget was onto something.

Twitter = public IM. (2006) I don't think it's any coincidence that one of the people responsible for Blogger is also responsible for Twitter.

But you don't have to found a new Web 2.0 company to benefit from the power of public information. Even brick and mortar companies are finally realizing that the age-old principle of "secret by default" may not be the best policy today:

Companies used to assume that details about their internal workings were valuable precisely because they were secret. If you were cagey about your plans, you had the upper hand; if you kept your next big idea to yourself, people couldn't steal it. Now, billion-dollar ideas come to CEOs who give them away; corporations that publicize their failings grow stronger. Power comes not from your Rolodex but from how many bloggers link to you - and everyone trembles before search engine rankings.

Power, it seems, comes from public information. Secrets are only a source of powerlessness. Just ask Brad Abrams, who poses this rhetorical question:

If no one knows you did X, did you really get all the benefits for doing X?

I think Brad is being a bit too cautious here. I'll go one step further. Until you've..

  • Written a blog entry about X
  • Posted Flickr photos of X
  • Uploaded a video of X to YouTube
  • Typed a Twitter message about X

.. did X really happen at all?

This is not to say we should fill the world with noise on every mundane aspect of our existence. But who decides what is mundane? Who decides what is interesting? Everything's interesting to someone, even if that someone is only you and a few other people in the world.

It's my firm belief that the inclusionists are winning. We live in a world of infinitely searchable micro-content, and every contribution, however small, enriches all of us. But more selfishly, if you're interested in deriving maximum benefit from your work, there's no substitute for making it public and findable. Obscurity sucks. But obscurity by choice is irrational. When in doubt, make it public.

Posted by Jeff Atwood    55 Comments
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.