I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood


28 posts from October 2005

October 31, 2005

Search: If It Isn't Incremental, It's Excremental

After I discovered the CTRL+I incremental search function in Visual Studio, I never used the standard find dialog again. Incremental search is so good that it makes traditional search dialogs completely obsolete. If you think that's hyperbole, consider that Chris Sells calls incremental search "pure sex".

This particular find dialog is from Notepad, but it's basically the same find dialog that appears in every Windows application:

Standard search in Microsoft Notepad

The delimited search dialog has a lot of problems:

  • It's a dialog. A dialog right smack dab in the middle of the text, potentially obscuring what you're searching for. In some apps it's even modal!
  • It provides very little feedback. There's no indication whether your search term matches anything until you type the complete search term and press return or click Find.
  • It's an all-or-nothing operation. Once you initiate a search, you're committed until the search completes. If you just mistyped a search term in a 5 megabyte text file, have fun waiting for that to complete.
  • It forces you to think about directionality. If your cursor happens to be near the bottom of the file, you may not find any matches even though some exist at the top of the file. And you get nagged with yet another "no matches found" dialog.

Jef Raskin, in his book The Humane Interface, has some choice words for the delimited search dialog:

This traditional method is rather punishing to the user, although most computer aficionados have become so accustomed to it that they no longer feel the pain.

Now compare that Notepad search dialog with the incremental search in Firefox:

Incremental search in Firefox

The advantages of incremental search are numerous:

  • There aren't any dialogs in your way. The search interface is blissfully dialog-less. There's nothing getting in the way of you and your search results. You can search, find, and resume working with minimal interruption. This is arguably handled even better in Visual Studio, where the interactive search indicator is a cursor change after you press CTRL+I.
  • It wastes less of your time. The search begins as soon as the first character is typed. You know immediately when you've got a good enough match and you can stop typing.
  • Mistakes are clearly evident. If you mistype something, you'll know that immediately, too. Press backspace to correct the typo and you're back to the previous match.
  • It's interactive. Immediate search feedback allows you to adjust your search strategy in real time. The net result is far better searches than you'd ever get from a traditional OK-then-try-again dialog box cycle.
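To make the mechanics concrete, here's a minimal sketch of incremental search in Python. It is not how Visual Studio or Firefox actually implement the feature; the class name IncrementalSearch and its methods are hypothetical, and the case-insensitive matching and wrap-around behavior are my own assumptions. It searches on every keystroke, and backspace returns to the previous match.

```python
class IncrementalSearch:
    """Toy incremental search over a string (hypothetical, for illustration)."""

    def __init__(self, text, start=0):
        self.text = text
        self.start = start   # cursor position when the search began
        self.query = ""
        self.history = []    # match recorded before each keystroke
        self.match = None    # index of the current match, or None

    def _find(self):
        if not self.query:
            return None
        haystack = self.text.lower()
        needle = self.query.lower()
        pos = haystack.find(needle, self.start)
        if pos == -1:
            # Wrap around to the top so the cursor position never hides a match.
            pos = haystack.find(needle)
        return pos if pos != -1 else None

    def type_char(self, ch):
        """Search again as soon as a character is typed."""
        self.history.append(self.match)
        self.query += ch
        self.match = self._find()
        return self.match

    def backspace(self):
        """Undo the last keystroke and return to the previous match."""
        if self.query:
            self.query = self.query[:-1]
            self.match = self.history.pop()
        return self.match


search = IncrementalSearch("The quick brown fox jumped over the lazy dog.")
for ch in "juz":                 # "z" is a deliberate typo
    print(repr(search.query + ch), search.type_char(ch))
print("after backspace:", search.backspace())
```

The typo shows up as soon as the match disappears, and a single backspace restores the last good match without re-running anything by hand.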

After you've worked with incremental search for a few hours, you'll probably wonder why incremental search isn't included as a standard feature in every single Windows application. As Jef Raskin notes:

From the point of view of interface engineering, the advantages of incremental searches are so numerous and the advantages of delimited search so few that I can see almost no occasions when a delimited search would be preferred.

Jef also adds an amusing footnote to that sentence: If it isn't incremental, it's excremental. Amen, brother. I can barely stand to use editors without an incremental search mode any more. I'm so glad that (after a little prodding on my part) the latest betas of EditPad Pro 6 include this feature.

Posted by Jeff Atwood    24 Comments

October 30, 2005

Improving the Clipboard

In this era of 3 GHz processors, 1 GB of memory, and 500 GB hard drives, why is the Windows clipboard only capable of holding a single item? Sure, you have fancy multi-level undo and redo in applications like Microsoft Word and Visual Studio. Did you know that the humble Windows textbox supports a surprisingly deep undo/redo queue via the CTRL+Z (undo) and CTRL+Y (redo) keys?

But not the clipboard. It holds exactly one item. Copy another item to the clipboard and your previous clipboard item is irrevocably lost.

The clipboard is a model of simplicity. And that's admirable. But I think it's too simple. Adding a basic FIFO queue of clipboard items wouldn't affect typical usage -- but it would provide much richer functionality for intermediate and advanced users. Here's one such clipboard utility that I use, clcl. This lightweight utility launches when I press the ALT+C key, and presents a straightforward menu of recent clipboard items:

screenshot of CLCL

Of course, CTRL+C and CTRL+V still work as you would expect. I can't even tell you how many times I've been editing code in Visual Studio and accidentally overwritten the code I copied to the clipboard. Now I don't have to worry; I can just press ALT+C and then use the arrow keys or the number to select the clipboard item I want to paste. The clipboard is a heck of a lot more useful to me when I don't have to constantly worry about losing the one item on it. Clcl even persists the clipboard items to disk so they survive a reboot.
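Here's a rough sketch in Python of the kind of bounded clipboard history I have in mind. This is not how clcl works internally; ClipboardHistory and its method names are hypothetical, and the ten-item limit is an arbitrary assumption. The newest item stays at the front, so ordinary paste behaves exactly as before, and the oldest item is the first to fall off the end.

```python
from collections import deque


class ClipboardHistory:
    """Bounded most-recent-first history of copied items (hypothetical sketch)."""

    def __init__(self, max_items=10):
        # A full deque silently drops its oldest entry when a new one is pushed.
        self._items = deque(maxlen=max_items)

    def copy(self, item):
        # Re-copying an existing item moves it to the front instead of duplicating it.
        if item in self._items:
            self._items.remove(item)
        self._items.appendleft(item)

    def paste(self, index=0):
        """Index 0 is the ordinary paste; higher indexes reach older items."""
        return self._items[index]

    def recent(self):
        return list(self._items)


history = ClipboardHistory(max_items=3)
for snippet in ["first", "second", "third", "fourth"]:
    history.copy(snippet)
print(history.recent())   # newest first; "first" has been evicted
print(history.paste(1))   # the item copied just before the latest one
```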

I liked clcl's simplicity, but there are dozens of similar clipboard utilities. To me, that's a sign that better clipboard functionality really should be built into the operating system. Unfortunately, I can't find any reference to clipboard improvements in Vista. It'd be a darn shame if we're stuck with the archaic single item clipboard for another five years.

Posted by Jeff Atwood    13 Comments

October 28, 2005

Avoiding Booleans

Brad Abrams recently posted another great excerpt from the unfortunately named .NET Framework Standard Library Annotated Reference Volume 2:

Avoid creating methods with Boolean parameters. Boolean parameters make calls harder to read and harder to write.

Indeed. What is the difference between..

Authorization("foo", true)
Authorization("foo", false)

Who knows? I've certainly made this mistake before. The SLAR recommends ditching the boolean in favor of an enumeration:

Authorization("foo", AuthorizationCompletion.Pending)
Authorization("foo", AuthorizationCompletion.Finished)

Voilà. Self-documenting code. If you're not careful, boolean parameters become magic numbers.
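The same idea carries over to other languages. Here's a small sketch in Python; the authorization function is a hypothetical stand-in for the call above, and what it returns is invented purely for illustration.

```python
from enum import Enum


class AuthorizationCompletion(Enum):
    PENDING = "pending"
    FINISHED = "finished"


def authorization(name, completion):
    """Hypothetical stand-in for the Authorization call in the post."""
    if not isinstance(completion, AuthorizationCompletion):
        # A bare True/False is rejected, so the call site has to say what it means.
        raise TypeError("completion must be an AuthorizationCompletion")
    return f"{name}: {completion.name.lower()}"


print(authorization("foo", AuthorizationCompletion.PENDING))
print(authorization("foo", AuthorizationCompletion.FINISHED))
```

The enumeration also leaves room for a third state later, which a boolean never will.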

Avoiding boolean parameters isn't a new idea, of course; similar advice is dispensed by C++ guru Herb Sutter in this 2002 C/C++ Users Journal article. What you may not realize, however, is that it's also a good idea to avoid booleans in your user interface. Jef Raskin explains in his book, The Humane Interface:

Check boxes can leave the user guessing what the alternative is. For example, if a check box labeled "Save to archive on closing" is checked, the data will be saved to an archive when the window is closed, but the label gives little clue as to what will happen if the box is not checked. Will the data be saved somewhere else, not saved at all, or will another option appear when you close the window? Often, the best solution is to use a set of radio buttons; they are not modal, and the user can clearly see not only the current state but also the alternative(s). Whether checkboxes or radio buttons are used, it is important to label with adjectives which describe the state of the affected object. If verbs are used as labels, the user does not know whether the action has taken place or is yet to take place.

For one-of-many choices, radio buttons are already the standard, and there is rarely any reason to use other mechanisms. Whenever possible, use radio buttons instead of checkboxes. Checkboxes work reliably only when the value of the state controlled by the check is immediately visible or in short-term memory.

As a developer, my go-to boolean UI element is the checkbox. If it can be true or false, it's a checkbox, right? Like so:

A typical boolean checkbox

But what does the verb "Lock" mean? This checkbox violates the Don't Make Me Think rule. Now watch what happens when we change to adjectives and radio buttons:

Boolean checkbox converted to radiobutton enumeration

This is conceptually identical to the code sample; we simply switched from a boolean to an enumeration. It's amazing how obvious the benefits are in retrospect, but it sure wasn't obvious to me. Until today.

Posted by Jeff Atwood    11 Comments

October 27, 2005

Copying Visual Studio Code Snippets to the Clipboard as HTML

As I mentioned in Formatting HTML code snippets with Ten Ton Wrecking Balls, copying code to your clipboard in Visual Studio is often an exercise in futility if you want anything more than plain vanilla text. VS copies code to the clipboard with bizarro-world RTF formatting instead of the sane, simple HTML markup you might expect. This is true even of the brand spanking new VS.NET 2005.

I previously developed a macro that converted highlighted code to simple HTML on the clipboard using two different methods. I've since removed the Word interop method entirely because it's clunky. And I have improved the RTF-to-HTML conversion method substantially. Take this code, for example:

Some sample C# code to copy to the clipboard

Let's highlight the code, execute the FormatToHtml.Modern macro, and then paste the contents of the clipboard into something like FreeTextBox:

namespace TotallyUnnecessaryNamespace
{
    /// <summary>
    /// I heart GUIDs
    /// </summary>
    public class MyClass
    {
        public void test()
        {
            string s = "test";
            int i = 1234;
        }
    }
}

That's extra clean, well-formatted <span> colored HTML wrapped in a simple <div>. It preserves the color scheme and indentation from your IDE exactly*, although it does substitute a standard monospace IDE font. View source on this post to see the raw markup.

Now compare this with the craptacular results you'll get when you do a traditional copy and paste! This is how VS.NET 2005's CTRL+C copy functionality should behave. You could even map the CTRL+C shortcut to the macro if you like.

My favorite new feature, however, is that the macro now dynamically removes excessive indenting from copied code. That makes it a lot cleaner when copying code snippets from the TotallyUnnecessaryNamespace namespace. As Cartman would say, super sweet. And it works in Visual Studio 2002, 2003, and 2005. Try it yourself!
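The indent-stripping idea is simple enough to sketch in a few lines of Python. This is not the macro's actual code; remove_common_indent is a hypothetical helper, and expanding tabs to four spaces is my assumption. (Python's standard textwrap.dedent does much the same job.)

```python
def remove_common_indent(code):
    """Strip the leading whitespace shared by every non-blank line."""
    lines = code.expandtabs(4).splitlines()
    # Blank lines are ignored when measuring, so they can't shrink the margin to zero.
    indents = [len(line) - len(line.lstrip(" ")) for line in lines if line.strip()]
    margin = min(indents, default=0)
    return "\n".join(line[margin:] if line.strip() else "" for line in lines)


snippet = (
    "        public void test()\n"
    "        {\n"
    "            string s = \"test\";\n"
    "        }"
)
print(remove_common_indent(snippet))
```

Relative indentation inside the snippet survives; only the shared left margin goes away.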

Download the FormatToHtml macro (5kb) Updated 4/2006

Here's how to get started with this macro:

  1. go to Tools - Macros - IDE
  2. create a new Module named "FormatToHtml" under "MyMacros"
  3. paste the downloaded code into the module
  4. add references to System.Drawing and System.Web via the Add Reference menu
  5. save and close the macro IDE window
  6. go to Tools - Macros - Macro Explorer
  7. Four new macros will be under "FormatToHtml"; double-click to run the macro, then paste away..

* Background colors are lost, but that's because the RTF markup VS.NET places in the clipboard doesn't contain the background colors, either. Total bummer.

Posted by Jeff Atwood    42 Comments

October 26, 2005

Google search VS.NET macro

Here's a handy little Visual Studio .NET macro which searches for the currently highlighted term in Google. The search is launched as a new tab within the IDE when you press Alt+F1.

I know what you're thinking: you've seen this macro before. Yeah, but this one goes to eleven. It actually works with any highlighted text in the IDE -- including highlighted text from the Output window:

google_search_macro_screenshot.png

Here's the macro code (updated 11/26/2007*):

' Entry point: search Google for whatever is selected in the active window.
Public Sub SearchGoogleForSelectedText()
    Dim s As String = ActiveWindowSelection().Trim()
    If s.Length > 0 Then
        DTE.ItemOperations.Navigate("http://www.google.com/search?q=" & _
            Web.HttpUtility.UrlEncode(s))
    End If
End Sub

' Picks the right selection source for the active window.
Private Function ActiveWindowSelection() As String
    If DTE.ActiveWindow.ObjectKind = EnvDTE.Constants.vsWindowKindOutput Then
        Return OutputWindowSelection()
    End If
    ' This GUID is the window kind checked for the HTML editor (.aspx source view).
    If DTE.ActiveWindow.ObjectKind = "{57312C73-6202-49E9-B1E1-40EA1A6DC1F6}" Then
        Return HTMLEditorSelection()
    End If
    Return SelectionText(DTE.ActiveWindow.Selection)
End Function

Private Function HTMLEditorSelection() As String
    Dim hw As HTMLWindow = ActiveDocument.ActiveWindow.Object
    Dim tw As TextWindow = hw.CurrentTabObject
    Return SelectionText(tw.Selection)
End Function

Private Function OutputWindowSelection() As String
    Dim w As Window = DTE.Windows.Item(EnvDTE.Constants.vsWindowKindOutput)
    Dim ow As OutputWindow = w.Object
    Dim owp As OutputWindowPane = ow.OutputWindowPanes.Item(ow.ActivePane.Name)
    Return SelectionText(owp.TextDocument.Selection)
End Function

' Returns the selected text; with nothing selected, falls back to the word
' at the cursor. Results of two characters or fewer are discarded.
Private Function SelectionText(ByVal sel As EnvDTE.TextSelection) As String
    If sel Is Nothing Then
        Return ""
    End If
    If sel.Text.Length = 0 Then
        SelectWord(sel)
    End If
    If sel.Text.Length <= 2 Then
        Return ""
    End If
    Return sel.Text
End Function

' Selects the word at the cursor: find the start of the word, move there,
' then extend the selection one word to the right.
Private Sub SelectWord(ByVal sel As EnvDTE.TextSelection)
    Dim leftPos As Integer
    Dim line As Integer
    Dim pt As EnvDTE.EditPoint = sel.ActivePoint.CreateEditPoint()

    sel.WordLeft(True, 1)
    line = sel.TextRanges.Item(1).StartPoint.Line
    leftPos = sel.TextRanges.Item(1).StartPoint.LineCharOffset
    pt.MoveToLineAndOffset(line, leftPos)
    sel.MoveToPoint(pt)
    sel.WordRight(True, 1)
End Sub

I tested the macro in VS.NET 2003 and VS.NET 2005 and it works great with no modifications in either environment. Here's how to install it:

  1. go to Tools - Macros - IDE
  2. create a new Module with a name of your choice under "MyMacros". Or use an existing module.
  3. paste the above code into the module
  4. add a reference to the System.Web assembly (for HttpUtility) to the module
  5. close the macro IDE window
  6. go to Tools - Options - Environment - Keyboard
  7. type "google" in the Show Commands Containing textbox. The SearchGoogleForSelectedText macro should show up
  8. click in the Press Shortcut Keys textbox, then press ALT+F1
  9. click the Assign button
  10. click OK

It's really quite handy; ALT+F1 is a totally natural chord and a logical superset of F1.
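If you want the same trick outside the IDE, the essential step is URL-encoding the selected text before tacking it onto the query string. Here's a minimal sketch in Python; google_search_url is a hypothetical helper, not part of the macro, and it only builds the URL rather than opening a browser.

```python
from urllib.parse import quote_plus


def google_search_url(selection):
    """Build a search URL from selected text, or return None if nothing is selected."""
    term = selection.strip()
    if not term:
        return None
    # quote_plus escapes reserved characters and turns spaces into "+".
    return "http://www.google.com/search?q=" + quote_plus(term)


print(google_search_url("  NullReferenceException: Object reference  "))
```

Without the encoding step, anything containing spaces, ampersands, or colons (which describes most compiler errors) would produce a mangled query.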

* Courtesy Bojan Bjelic, the macro now works in .aspx source (html) view.

Posted by Jeff Atwood    39 Comments

October 25, 2005

The Cognitive Style of Visual Studio

Charles Petzold is widely known as the guy who put the h in hWnd. He's the author of the seminal 1988 book Programming Windows, now in its fifth edition. And he can prove it, too. He has an honest-to-God Windows tattoo on his arm:

Charles Petzold and his Windows tattoo

This is explained in his FAQ:

Q. Is that a real tattoo?

A. I think of it more as a scar I got after doing Windows programming for ten years (beginning in 1985).

When Charles Petzold talks, with my apologies to E.F. Hutton, people listen. Charles recently spoke at the NYC .NET Developer's Group and asked, Does Visual Studio Rot the Mind?

It's a great essay. The central idea is that your skillset should not be dictated by the tools you use. I've covered similar ground in Programming for Luddites, so I don't necessarily disagree. But I also wonder if Petzold has fallen into the trap Dan Appleman outlines in RAD is not productivity:

The reason that so much bad VB6 code was written was not because VB6 was RAD, but because it was easy. In fact, VB6 made writing software so easy that anyone could be a programmer, and so everyone was. Doctors, Lawyers, Bankers, Hobbyists, Kids -- everyone was writing VB6 code with little or no training.

Now, I don't know about you, but I still have copies of a few of the programs I wrote when I was just starting out, before I'd actually gone to school to learn a thing or two about software development. There was some BASIC, some Pascal, and looking at it now, it's all pretty ugly.

So let's get real. Bad programmers write bad code. Good programmers write good code. RAD lets bad programmers write bad code faster. RAD does NOT cause good programmers to suddenly start writing bad code. RAD tools can make a good programmer more productive, because they speed up the coding process without compromising the level of quality that a good programmer is going to achieve.

Petzold's essay meanders a bit, but ultimately cuts a little deeper than "R.A.D. is B.A.D.":

Life without Visual Studio is unimaginable, and yet, no less than PowerPoint, Visual Studio causes us to do our jobs in various predefined ways, and I, for one, would be much happier if Visual Studio did much less than what it does. Certain features in Visual Studio are supposed to make us more productive, and yet for me, they seem to denigrate and degrade the programming experience.

Petzold argues that the cognitive model that Visual Studio forces on the developer is fundamentally flawed. This is essentially the same argument presented in Edward Tufte's 2003 essay, The Cognitive Style of PowerPoint.* Petzold goes on to illustrate with IntelliSense, which he has a love/hate relationship with:

But the implication here is staggering. To get IntelliSense to work right, not only must you code in a bottom-up structure, but within each method or property, you must also write your code linearly from beginning to end -- just as if you were using that old DOS line editor, EDLIN. You must define all variables before you use them. No more skipping around in your code. It's not that IntelliSense is teaching us to program like a machine; it's just that IntelliSense would be much happier if we did.

And then there's the issue of code generation:

Even if Visual Studio generated immaculate code, there would still be a problem. As Visual Studio is generating code, it is also erecting walls between that code and the programmer. Visual Studio is implying that this is the only way you can write a modern Windows or web program because there are certain aspects of modern programming that only it knows about. And Visual Studio adds to this impression by including boilerplate code that contains stuff that has never really been adequately discussed in the tutorials or documentation that Microsoft provides.

It becomes imperative to me, as a teacher of Windows Forms programming and Avalon programming, to deliberately go in the opposite direction. I feel I need to demystify what Visual Studio is doing and demonstrate how you can develop these applications by writing your own code, and even, if you want, compiling this code on the command line totally outside of Visual Studio.

In my Windows Forms books, I tell the reader not to choose Windows Application when starting a new Windows Forms project, but to choose the Empty Project option instead. The Empty Project doesn't create anything except a project file. All references and all code has to be explicitly added.

Am I performing a service by showing programmers how to write code in a way that is diametrically opposed to the features built into the tool that they're using? I don't know. Maybe this is wrong, but I can't see any alternative.

In other words, a developer weaned on the Visual Studio .NET IDE is powerless outside that environment. Working in the Visual Studio IDE becomes synonymous with the very act of programming. And that's the thing Petzold is most afraid of:

[to solve a New Scientist math puzzle] I decided to use plain old ANSI C, and to edit the source code in Notepad -- which has no IntelliSense and no sense of any other kind -- and to compile on the command line using both the Microsoft C compiler and the Gnu C compiler.

What's appealing about this project is that I don't have to look anything up. I've been coding in C for 20 years. It was my favorite language before C# came along. This stuff is just pure algorithmic coding with simple text output. It's all content. Even after this preliminary process, there's still coding to do, but there's no APIs, there's no classes, there's no properties, there's no forms, there's no controls, there's no event handlers, and there's definitely no Visual Studio.

It's just me and the code, and for awhile, I feel like a real programmer again.

Using Notepad to code may be an instructive exercise in minimalism for students, but no professional programmer can afford to build software this way. If anything, I think the future lies in even tighter coupling of the language and the IDE. I can even envision a day where it isn't possible to compile a program outside the IDE-- and that's probably heresy to Petzold.

But it's also the future.

* Tufte's essay is also available in parody PowerPoint form.

Posted by Jeff Atwood    33 Comments

October 24, 2005

The Cost of Leaving Your PC On

Between my server and my Windows Media Center home theater PC, I have at least two PCs on all the time at home. Have you ever wondered how much it's costing you to leave a computer on 24 hours a day, 7 days a week?

The first thing you need to know is how much power your computer draws. The best way is to measure the actual power consumption. You'll need a $30 device like the Kill-a-Watt to do this accurately. Once you get one, you'll inevitably go through a phase where you run around your home, measuring the power draw of everything you can plug into a wall socket. For example, I learned this weekend that our 42" plasma television draws between 90 watts (totally black screen) and 270 watts (totally white screen). Based on a little ad-hoc channel surfing with an eye on the Kill-a-Watt's LCD display, the average appears to be around 150 watts for a typical television show or movie.

But I digress. Once you've measured the power draw in watts (or guesstimated the power draw), you'll need to convert that to kilowatt-hours. Here's the kilowatt-hour calculation for my server, which draws ~160 watts:

160 watts * (8,760 hours per year) / 1000 = 1401.6 kilowatt-hours

The other thing you'll need to know is how much you're paying for power in your area. Power here in California is rather expensive and calculated using a byzantine rate structure. According to this recent Mercury News article, the household average for our area is 14.28 cents per kilowatt-hour.

1401.6 kilowatt-hours * 14.28 cents / 100 = $200.15

So leaving my server on is costing me $200 / year, or $16.68 per month. My home theater PC is a bit more frugal at 65 watts. Using the same formulas, that costs me about $81 / year, or roughly $6.78 per month.
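If you'd rather not do the arithmetic by hand, the two formulas above fit in a few lines of Python. The function names are my own, and the 14.28 cent default is just the rate quoted above; plug in your own wattage and rate. Monthly figures here are the unrounded yearly cost divided by 12.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def annual_kwh(watts):
    """Kilowatt-hours used by a device drawing `watts` around the clock."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost_dollars(watts, cents_per_kwh=14.28):
    """Yearly cost in dollars at the given rate in cents per kilowatt-hour."""
    return annual_kwh(watts) * cents_per_kwh / 100


for label, watts in [("server", 160), ("home theater PC", 65)]:
    yearly = annual_cost_dollars(watts)
    print(f"{label}: {annual_kwh(watts):.1f} kWh, "
          f"${yearly:.2f}/year, ${yearly / 12:.2f}/month")
```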

So, how can you reduce the power draw of the PCs you leave on 24/7?

  • Configure the hard drives to sleep on inactivity. You can do this via Control Panel, Power, and it's particularly helpful if you have multiple drives in a machine. My server has four hard drives, and they're typically asleep at any given time. That saves a solid 4-5 watts per drive.
  • Upgrade to a more efficient power supply. A certain percentage of the input power to your PC is lost as waste during the conversion from wall power to something the PC can use. At typical power loads (~90w), the average power supply efficiency is a disappointing 65%. But the good news is that there's been a lot of recent vendor activity around more efficient power supplies. The Fortron Zen fanless power supply, for example, offers an astonishing 83% efficiency at 90w load! Since wall draw is the load divided by the efficiency, upgrading your power supply could theoretically drop you from about 138w @ 65% efficiency to about 108w @ 83% efficiency. That's a savings of roughly $38 per year in this 90w case, and the larger the power usage, the bigger the savings.
  • Don't use a high-end video card. I'm not sure this is widely understood now, but after the CPU, the video card is by far the biggest power consumer in a typical PC. It's not uncommon for the typical "mid-range" video card to suck down 20+ watts at idle -- and far more under actual use or gameplay! The worrying number, though, is the idle one. Pay close attention to the video card you use in an "always-on" machine.
  • Configure the monitor to sleep on inactivity. This one's kind of a no-brainer, but worth mentioning. A CRT eats about 80 watts, and an LCD of equivalent size less than half that.
  • Disconnect peripherals you don't use. Have a server with a CD-ROM you rarely use? Disconnect the power to it. A sound card you don't use? Pull it out. Redundant fans? Disconnect them. That's only a savings of a few watts, but it all adds up.
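To put a number on the power supply upgrade, here's a small sketch in Python. It assumes efficiency means power delivered to the components divided by power drawn from the wall, so wall draw is the load divided by the efficiency. The function names are hypothetical and the default rate is the 14.28 cents per kilowatt-hour quoted earlier.

```python
def wall_draw_watts(load_watts, efficiency):
    """Watts pulled from the wall to deliver load_watts to the components."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be between 0 and 1")
    return load_watts / efficiency


def annual_savings_dollars(load_watts, old_eff, new_eff, cents_per_kwh=14.28):
    """Yearly savings for a machine that runs 24/7 at a constant load."""
    saved_watts = wall_draw_watts(load_watts, old_eff) - wall_draw_watts(load_watts, new_eff)
    return saved_watts * 8760 / 1000 * cents_per_kwh / 100


old = wall_draw_watts(90, 0.65)
new = wall_draw_watts(90, 0.83)
print(f"wall draw: {old:.0f}w -> {new:.0f}w")
print(f"savings: ${annual_savings_dollars(90, 0.65, 0.83):.2f} per year")
```

The same functions work for the per-drive and per-peripheral savings in the list: convert the watts saved into kilowatt-hours per year, then multiply by your rate.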

If you're building a new PC, it's also smart to avoid Intel's Pentium 4 series, as they use substantially more power than their AMD equivalents. Intel's Pentium-M, on the other hand, delivers the best bang for the watt on the market. Although it was originally designed for laptops, it can be retrofitted into desktops.

Posted by Jeff Atwood    130 Comments

October 23, 2005

Excluding Matches With Regular Expressions

Here's an interesting regex problem:

I seem to have stumbled upon a puzzle that evidently is not new, but for which no (simple) solution has yet been found. I am trying to find a way to exclude an entire word from a regular expression search. The regular expression should find and return everything EXCEPT the text string in the search expression.

For example, if the word fox was what I wanted to exclude, and the searched text was:

The quick brown fox jumped over the lazy dog.

... and I used a regular expression of [^"fox"] (which I know is incorrect) (why this doesn't work I don't understand; it would make life SO much easier), then the returned search results would be:

The quick brown jumped over the lazy dog.

Regular expressions are great at matching. It's easy to formulate a regex using what you want to match. Stating a regex in terms of what you don't want to match is a bit harder.

One easy way to exclude text from a match is negative lookbehind:

\w+\b(?<!\bfox)

But not all regex flavors support negative lookbehind. And those that do typically have severe restrictions on the lookbehind, eg, it must be a simple fixed-length expression. To avoid incompatibility, we can restate our solution using negative lookahead:

(?!fox\b)\b\w+

You can test this regex in the cool online JavaScript Regex evaluator. Unfortunately, JavaScript doesn't support negative lookbehind, so if you want to test that one, I recommend RegexBuddy. It's not free, but it's the best regex tool out there by far-- and it keeps getting better with every incremental release.
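Both patterns can also be tried in Python, whose re module accepts fixed-width lookbehind. This is just a quick check against the sample sentence, not a general solution; the variable names are mine.

```python
import re

text = "The quick brown fox jumped over the lazy dog."

# Negative lookbehind: match a word, then reject it if the word just matched is "fox".
lookbehind = re.compile(r"\w+\b(?<!\bfox)")

# Negative lookahead: refuse to start a match where the whole word "fox" begins.
lookahead = re.compile(r"(?!fox\b)\b\w+")

print(lookbehind.findall(text))
print(lookahead.findall(text))
print(" ".join(lookahead.findall(text)))
```

Both return every word except fox. Note that the word boundaries matter: "foxes" and "firefox" would still match, because only the whole word fox is excluded.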

Posted by Jeff Atwood    11 Comments

October 22, 2005

It looks like you're writing a for loop!

Even the best programmers make shitty software, with bugs. But some programmers are naturally proficient at creating this special kind of software, as illustrated by a Croatian developer known as Stinky:

The anecdote that best reveals how little Stinky knew about programming started when he asked Bojan to help him solve the following problem:
"I have a function that returns Boolean value. Well, I would like to call that function and store the opposite value in some variable. I could code it like this: If function = true Then variable = false Else variable = true. But I have a feeling that it can be even simpler than that. Can you tell me how?"
After Bojan recovered from the shock of realizing that Stinky didn't know the basics of logical algebra, he replied that it was enough to put variable = Not function. Stinky went to check it out and after few minutes he cheerfully shouted: "It works!". Bartol was a witness to the whole scene. A few moments later he said to Bojan, "You see, my friend, to hell with education, your degree in computer science and the tons of books you read. You don't need any of that to be a champion developer like Stinky. Just learn the copy-paste method, remember all the properties of Janus Gridex and ActiveBar control, and the world is yours."

Sure, Microsoft's Office Assistant, Clippy, gets a lot of flak-- but wouldn't it be nice if Clippy could assist Stinky with his code?

Clippy in Visual Studio .NET

Or, if the IDE could detect this kind of code as it's being typed and offer some helpful advice* to Stinky?

The 'shitty code' exception

VS.NET 2005 also offers just-in-time IntelliSense for exceptions, which can be customized by editing the underlying XML. The IntelliSense for a NullReferenceException isn't very helpful for a developer like Stinky; here's one way K. Scott Allen improved it:

custom_vsnet_error.png

* Despite extensive use of my google-fu, I couldn't find the original source of this image. If anyone knows who originally created it, let me know so I can attribute it properly.

Posted by Jeff Atwood    7 Comments

October 21, 2005

The Nigerian Spammer Anthem

A recent Los Angeles Times article reveals that the 419 scam spammers have their very own anthem: a song titled I Go Chop Your Dollars by Nigerian recording artist Osofia:

"419 is just a game, you are the losers, we are the winners.
White people are greedy, I can say they are greedy
White men, I will eat your dollars, will take your money and disappear.
419 is just a game, we are the masters, you are the losers."

We may joke about the 419 scams.. after all, who in their right mind actually falls for this stuff? But like all spammers, they do it because it works:

[Samuel] sent 500 e-mails a day and usually received about seven replies. Shepherd would then take over. "When you get a reply, it's 70% sure that you'll get the money," Samuel said.

Spam only became a problem for me about a year and a half ago, but clearly it's here to stay. I've used POPFile for about a year to cut down on my email spam**. Some people swear by challenge-response human verification systems such as SpamArrest, but as Scott Mitchell notes, this system has some issues:

While the challenge/response system was effective in reducing my spam intake from about 100 messages a day to around 1 or 2 messages a day, the approach, in my estimation, was not ideal. One big disadvantage was that fewer people took the time to respond to the challenge email than I had anticipated, for two reasons:
  1. Some people don't want to take the time to follow instructions for a challenge email. Maybe their message wasn't that important after all, maybe they're busy, or maybe they just don't like being told what to do. These people's messages, I reckoned, weren't that vital. If you can't take two seconds to respond to the challenge, then just how important is that email you're sending me?
  2. What worried me most, and led me to suspend my C/R anti-spam system, is that I noticed some people weren't responding to the challenge email because they never received it! This unfortunate circumstance could happen if their own spam blocking solution halted my challenge email. A couple folks informed me that Outlook 2003 categorized my challenge emails as spam. Others using a similar challenge/response anti-spam system would never get my challenge as my challenge would generate a challenge on their side.

The "I challenge your challenge!" scenario is particularly amusing. And on top of the two issues Scott highlights, there are other social problems with challenge/response spam blocking.

Although I've had great success with POPFile, which uses Bayesian filtering techniques, I had no idea that there's an even better technique: Markovian filtering. That's what the CRM114 Discriminator* uses. There's an outstanding slide deck (pdf) that explains how it all works. In a nutshell, Markovian filtering weights phrases and words, whereas Bayesian filtering only looks at individual words. How much better is it? I'll let the CRM114 author, Bill Yerazunis, pitch it:

For the month of April 2005, I received over 10,000 emails. About 60% were spam. I had ZERO classification errors. ZERO.

As of Feb 1 through March 1, 2004, 8738 messages (4240 spam, 4498 nonspam), and my total error rate was ONE. That translates to better than 99.984% accuracy, which is over ten times more accurate than human accuracy

I measured my own accuracy to be around 99.84%, by classifying the same set of about 3000 messages twice over a period of about a week, reading each message from the top until I feel "confident" of the message status, (one message per screen unless I want more than one screen to decide on a message.) and doing the classification in small batches with plenty of breaks and other office tasks to avoid fatigue. Then I diff()ed the two passes to generate a result. Assuming I never duplicate the same mistake, I, as an unassisted human, under nearly optimal conditions, am 99.84% accurate.

Most Bayesian techniques top out at around 98% accuracy with a little training, but Markovian can achieve a rarefied 99.5% accuracy. The most notable Windows port of CRM114 is SpamRIP.
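To get a feel for why phrases help, here's a loose illustration in Python of the difference in what each approach gets to look at. It is emphatically not CRM114's actual algorithm; both function names are hypothetical, the three-word window is an arbitrary assumption, and real filters go on to weight these features statistically. It only shows word features versus phrase features.

```python
def word_features(text):
    """Individual words only: word order is thrown away."""
    return set(text.lower().split())


def phrase_features(text, window=3):
    """Words plus every in-order run of up to `window` consecutive words."""
    words = text.lower().split()
    features = set()
    for i in range(len(words)):
        for n in range(1, window + 1):
            if i + n <= len(words):
                features.add(" ".join(words[i:i + n]))
    return features


a = "free money now"
b = "now money free"
print(word_features(a) == word_features(b))        # same bag of words
print(sorted(phrase_features(a) - phrase_features(b)))
```

A words-only filter can't tell those two messages apart, while the phrase features differ, and that extra context is what the filter gets to learn from.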

* A reference to the movie Dr. Strangelove. In the movie, the "CRM114 Discriminator" is a fictional accessory for a radio receiver that's "designed not to receive at all", that is, unless the message is properly authenticated.

** I have since switched to K9 because it's simpler and faster-- and does the same Bayesian filtering.

Posted by Jeff Atwood    6 Comments
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.