October 31, 2007
The F5 Key Is Not a Build Process
Hacknot's If They Come, How Will They Build It? is a harrowing series of 29 emails sent over a two week period.
To: Mike Cooper
From: Ed Johnson
Mike,
I finally got CVS access today from Arnold. So I've checked out the AccountView module OK, but it won't compile. The Eclipse project has dependencies on about five other projects. I tried checking those dependent projects out as well, but a few of them won't build at all? How are you managing to develop this thing when the dependent projects don't build?
Ed
From: Mike Cooper To: Ed Johnson
Oh yeah - I forgot to tell you about the dependent projects. I always forget about them. I'm not so surprised some of them don't build for you. I've got versions on my machine that build OK but I haven't checked them in for a while. Gimme about 15 minutes and I'll check them in, then you should be right to go.
M.
It's a cautionary tale about a serious software project pathology: the pain of getting a new developer up and running on an existing software project. It's startlingly common.
This points us to one of the most important health metrics on a software development project. How long does it take for you to get a new team member working productively on your project? If the answer is more than one day, you have a problem. Specifically, you don't have a proper build process in place.
I've talked before about the importance of a build server as the heartbeat for your project. A sane software development project has automatic daily builds, performed on a neutral build server. If your team is in the habit of producing that kind of daily build, it's difficult to accumulate the deep technical debt enumerated in all those emails. If the build server can do it, so can your newly hired coworkers.
But based on the development practices I've often seen on site with customers, I think setting up a build server might be an unrealistic goal, at least initially. It might not get done. We should shoot for a more modest goal to start with.
Here's how most clients I work with build a project:
- Open the IDE
- Load the solution
- Get latest
- Press F5 (or CTRL+SHIFT+B)
If your "build process" is the F5 key, you have a problem. If you think this sounds ridiculous-- who would possibly use their IDE as a substitute for a proper build process? -- then I humbly suggest that you haven't worked much in the mainstream corporate development world. The very idea of a build script outside the IDE is alien to most of these teams.
Get your build process out of the IDE and into a build script. That's the first step on the road to build enlightenment.
The value of a build script is manifold. Once you have a build script together, you've created a form of living documentation: here's how you build this crazy thing. And naturally this artifact is checked into source control, right alongside the files necessary to build it (and even the database necessary to run it, too). From there, you can begin to think about having that script run on a neutral build server to avoid the "Works On My Machine" syndrome. You can also consider all the nifty ways you could enhance the script with stuff like BATs, BVTs, and Functional Tests. Your build server can become the heartbeat of your project. There's no upper limit on how clever you can be, and how many different build scripts you can come up with. Build scripts can be incredibly powerful-- but you'll never know until you start using them.
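To make "get it out of the IDE" concrete, here's a minimal sketch of a build script. It's an illustration, not a recommendation of any particular tool: the step names and the commands are placeholders I made up (each one just prints a message), and you'd swap in whatever your project really needs to fetch, compile, and test.

```python
import subprocess
import sys

# Hypothetical build steps. Each command is a stand-in that only prints a message;
# replace them with your real source control, compiler, and test runner commands.
BUILD_STEPS = [
    ("get latest", [sys.executable, "-c", "print('updating working copy')"]),
    ("compile", [sys.executable, "-c", "print('compiling')"]),
    ("unit tests", [sys.executable, "-c", "print('running tests')"]),
]


def run_build(steps):
    """Run each step in order; return the name of the first failing step, or None."""
    for name, command in steps:
        print(f"== {name} ==")
        result = subprocess.run(command)
        if result.returncode != 0:
            return name  # stop immediately: later steps depend on earlier ones
    return None


if __name__ == "__main__":
    failed_step = run_build(BUILD_STEPS)
    # A real script would hand this exit code to sys.exit() so a build server can see it.
    exit_code = 0 if failed_step is None else 1
    print("build OK" if failed_step is None else f"build FAILED at: {failed_step}")
```

The point isn't the language. It's that the list of steps lives in a file under source control, where a new hire or a build server can run it without anyone's IDE.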
The F5 key is not a build process. It's a quick and dirty substitute. If that's how you build your software, I regret that I have to be the one to tell you this, but your project is not based on solid software engineering practices.
So, if you don't have a build script on your project, what are you waiting for?
October 30, 2007
Embracing Languages Inside Languages
Martin Fowler loosely defines a fluent interface thusly: "The more the use of the API has that language like flow, the more fluent it is." If you detect a whiff of skepticism here, you're right: I've never seen this work. Computer languages aren't human languages.
Let's look at a concrete example from Joshua Flanagan. Here's how we define a regular expression in the standard way:
<div\s*class="game"\s*id="(?<gameID>\d+)-game"(?<content>.*?) <!--gameStatus\s*=\s*(?<gameState>\d+)-->
Here's how we'd define that same regular expression in Joshua's fluent interface.
Pattern findGamesPattern = Pattern.With.Literal(@"<div")
.WhiteSpace.Repeat.ZeroOrMore
.Literal(@"class=""game""").WhiteSpace.Repeat.ZeroOrMore.Literal(@"id=""")
.NamedGroup("gameId", Pattern.With.Digit.Repeat.OneOrMore)
.Literal(@"-game""")
.NamedGroup("content", Pattern.With.Anything.Repeat.Lazy.ZeroOrMore)
.Literal(@"<!--gameStatus")
.WhiteSpace.Repeat.ZeroOrMore.Literal("=").WhiteSpace.Repeat.ZeroOrMore
.NamedGroup("gameState", Pattern.With.Digit.Repeat.OneOrMore)
.Literal("-->");
So we're replacing a nice, succinct one line regular expression with ten lines of objects, methods, and named enumerations. This is progress?
I'll grant you that I am probably unusually familiar with regular expressions, even by developer standards. There's a reason they have a reputation for being dense and inscrutable. I've definitely seen some incredibly bad regular expressions in my day. But in my professional opinion, that regex was a well written one. I had no problem reading it. Adding a ton of hyper-dense object wrappers to that regex makes it harder for me to understand what it does.
The new syntax Joshua invented is great, but it's specific to his implementation. Although it may seem like a good idea to use these kinds of training wheels to "learn" regular expressions, I'd argue that you aren't learning them at all. And that's a shame, because regular expression syntax is a mini-language of its own. Once you learn it, you can use it anywhere; it works (almost) the same in every environment.
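As a rough illustration of that "(almost)", here's the same pattern used from Python. This is a sketch under a few assumptions: the sample HTML is invented, I've treated the space before `<!--gameStatus` in the one-line version as a line wrap rather than part of the pattern, and the only syntax change is that Python spells named groups `(?P<name>...)`.

```python
import re

# Same regular expression as above, split across two string literals for width.
# Assumption: the space before "<!--gameStatus" in the one-line form is a line wrap.
# Python writes named groups as (?P<name>...) where .NET writes (?<name>...).
GAME_PATTERN = re.compile(
    r'<div\s*class="game"\s*id="(?P<gameID>\d+)-game"(?P<content>.*?)'
    r'<!--gameStatus\s*=\s*(?P<gameState>\d+)-->'
)


def find_games(html):
    """Return one dict of named groups for each game block found in html."""
    return [match.groupdict() for match in GAME_PATTERN.finditer(html)]


# Invented sample markup, just to exercise the pattern.
sample = '<div class="game" id="42-game"><b>Final</b> <!--gameStatus = 3--></div>'
print(find_games(sample))
```

Everything else, from `\s*` to the lazy `.*?`, carries over unchanged.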
The SubSonic project attempts to do something similar for SQL. Consider this SQL query:
SELECT * FROM Customers WHERE Country = 'USA' ORDER BY CompanyName
Here's how we would express that same SQL query in SubSonic's fluent interface:
CustomerCollection c = new CustomerCollection();
c.Where(Customer.Columns.Country, "USA");
c.OrderByAsc(Customer.Columns.CompanyName);
c.Load();
I've mentioned before that I'm no fan of object-oriented rendering when a simple string will suffice. That's exactly the reaction I had here; why in the world would I want to use four lines of code instead of one? This seems like a particularly egregious example. The SQL is harder to write and more difficult to understand when it's wrapped in all that proprietary SubSonic object noise. Furthermore, if you don't learn the underlying SQL-- and how databases work-- you're in serious trouble as a software developer.
But I can see the rationale behind these types of database code generation tools:
- They "solve" the object-relational mapping problem for you (and if you believe that, I have a bridge you might be interested in)
- you get intellisense
- your database is strongly typed
- the compiler now "understands" the database, or at least the generated classes that represent the database.
I definitely sympathize with the desire to produce less code, and that's the whole point of database code generation tools. Personally, I would argue that most of these benefits could be realized with smarter IDEs that actually understood native SQL strings (or regular expressions), rather than relying on a slew of generated code and complicated, proprietary object syntax.
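For comparison, keeping the query as a native SQL string doesn't have to be clumsy. Here's a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database you really use; the table contents are invented for the example, and the one concession to the host language is a `?` placeholder so the value is bound rather than pasted into the string.

```python
import sqlite3


def customers_in(conn, country):
    """Run the plain SQL query, binding the country as a parameter."""
    sql = "SELECT * FROM Customers WHERE Country = ? ORDER BY CompanyName"
    return conn.execute(sql, (country,)).fetchall()


# Tiny in-memory table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CompanyName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Rattlesnake Canyon Grocery", "USA"),
        ("Alfreds Futterkiste", "Germany"),
        ("Great Lakes Food Market", "USA"),
    ],
)
print(customers_in(conn, "USA"))
```

The SQL is still readable as SQL, which is the whole argument here; what's missing is an editor that understands the string.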
But let's take a step back and think about what's really happening here. In both cases, we are embedding one language inside another. SQL is a language. Regular expressions are a language. Wrapping those languages inside a bunch of mega-verbose fluent interface ObjectJunk-- just so we can pretend we're writing code in our primary language-- is a total cop-out. Fluent interface object wrappers feel like a nasty hack to me.
Why can't we embrace the language-inside-a-language paradigm, rather than running and hiding from it? These domain specific languages exist because they are optimized for processing strings and data efficiently. Avoiding them is counterproductive.
Perhaps the ultimate solution is to redefine the underlying language to incorporate the features of another language.
Consider how Perl integrates the regular expression language:
while (my $line = <IN>) {
    while ( $line =~ /(Romeo|Juliet|Mercutio|Tybalt|Friar \w+)/g ) {
        my $character = $1;
        ++$counts{ $character };
    }
}
Here's how C# 3.0, with LINQ, integrates the SQL language:
var c =
    from Customer in Customers
    where Customer.Country == "USA"
    orderby Customer.CompanyName
    select Customer;
Note the conspicuous lack of ObjectJunk. No explosion at the parens and periods factory. No MassivelyLongTextEnumerations to deal with. There's nothing but code that looks like exactly what it does. And that's a beautiful thing.
Embrace the idea of languages inside languages. In The Land of Strings, we speak regular expressions. In The Land of Data, we speak SQL. Oh sure, you can pretend those languages don't exist, and hide out in the Kingdom of Nouns-- but you're only cheating yourself out of a deeper understanding of how things really work in those other places. Fluent interface object wrappers may seem like a helpful convenience, but they're actually an ugly hack, and a terrible substitute for true language integration.
October 28, 2007
Your Desktop Is Not a Destination
I'm of two minds on the desktop.
If you're really using your computer, your desktop should almost never be visible. Your screen should be covered with information, with whatever data you're working on. I can't imagine why you'd willingly stare at a static background image-- or even a background image covered with a sea of icons. Unless you consider your computer a really expensive digital picture frame, I suppose.
The desktop background, as I see it, is completely superfluous. My desktop "background" right now is plain black. And that doesn't bother me in the least, because none of it is visible. I have browser windows and programs-- the things I'm actually doing -- covering all three monitors. When I'm using a computer, I make it my goal to never see the desktop background. Every time the desktop background is visible, that means I'm making poor use of my monitor pixels. Whenever the desktop background peeks through, I treat it like a reprimand.
I won't lie to you. I don't always achieve my goal. The desktop is sometimes visible when I'm working. But I do try my darndest to cover all my monitors with something useful, and a static desktop background just isn't useful.
That said, it is fun to have a unique desktop background. Even if you rarely see it. In the above official screenshots from Apple and Canonical, the desktop background images were picked quite intentionally. I've done this myself; when I put together those pictures of the monitor arms, I specifically chose an interesting desktop background to show it off.
Sometimes you just want to show off, even if it's only for yourself. When I graduated to my first triple monitor configuration, back in 2004, I used this 3200 x 1200 image of the entire first level of Super Mario Bros. as my desktop background.
But I felt very, very dirty afterwards. I worry that if we spend too much time obsessing over our desktop backgrounds, we'll start treating our computers like fashion accessories instead of tools. We should be filling our screens with information, not distracting ourselves with pretty frippery.
However, if we do it responsibly, if we keep reminding ourselves that our desktop is not a destination, it's OK to obsess over our desktop backgrounds a little bit. The desktop is like an aesthetically pleasing airport we must occasionally pass through before arriving at our real destinations: a web browser, a word processor, an IDE, a graphics editor, etcetera. You know, the places we really want to go. A good-looking airport gives every traveller a positive feeling about where they're going, so feel free to spruce it up. Just don't go so far that you become one of those weird people who hangs out in airports.
In my original research, I ran across a lot of sites with great wallpaper resources. There's a heavy emphasis on extra-wide wallpapers here, as I run triple monitor configurations at home and at work. If you, too, rock a multi-mon setup under Windows, you'll need a utility to get different background images on each monitor, or to span a single image across all your monitors. I use Ultramon which does this and much more; Display Fusion does less, but it works for this, and it's free.
Personally, I don't care for photographs on my desktop. I prefer abstract backgrounds. This must be an unusual preference, because most desktop background websites are completely dominated by photographs. Still, I found a few sites with good abstract backgrounds, even though I had to sift through a lot of photographs to get to them.
- InterfaceLIFT
- Flickr wallpapers pool, wallpaper exchange pool, most interesting last 7 days
- Mandolux
- Digital Blasphemy
- eBoy
- Veer
- Panoramic Photography by Brad Templeton
- Library of Congress Panoramic Photographs
- Game Wallpapers new school, dual-screen
- Desktop Gaming old school
- Deviant Art multi-display
- SquidFingers repeating patterns
- Citrus Moon repeating patterns
- k10k pixel patterns
- Damask wallpaper patterns
- Dual, Triple, Quad monitor backgrounds
- Vlad Studios
- Pixeldecor repeating patterns
For abstract backgrounds, I had the best luck with Flickr and InterfaceLIFT.
If you spend the next hour searching for the perfect desktop background, don't blame me. I tried to warn you. I'm hoping you don't see that special desktop background of yours too often.
October 26, 2007
How To Achieve Ultimate Blog Success In One Easy Step
Always Be Jabbing. Always Be Shipping. Always Be Firing. It's the same advice, stated in different ways for different audiences.
My theory is that lead generation derives from Google rank and that the best way to increase Google rank is to be like a professional fighter: neither jabs nor haymakers are enough. You must be always jabbing and you must regularly throw haymakers. Blog continuously to keep your hit-rate and link-traffic high and write longer pieces, containing the high-value words associated with your niche, occasionally.
When people ask me for advice on blogging, I always respond with yet another form of the same advice: pick a schedule you can live with, and stick to it. Until you do that, none of the other advice I could give you will matter. I don't care if you suck at writing. I don't care if nobody reads your blog. I don't care if you have nothing interesting to say. If you can demonstrate a willingness to write, and a desire to keep continually improving your writing, you will eventually be successful.
But success takes time-- a lot of time. I'd say a year at minimum. That's the element that weeds out so many impatient people. I wrote this blog for a year in utter obscurity, but I kept at it because I enjoyed it. I made a commitment to myself, under the banner of personal development, and I planned to meet that goal. My schedule was six posts per week, and I kept jabbing, kept shipping, kept firing. Not every post was that great, but I invested a reasonable effort in each one. Every time I wrote, I got a little better at writing. Every time I wrote, I learned a little more about the topic, how to research topics effectively, where the best sources of information were. Every time I wrote, I was slightly more plugged in to the rich software development community all around me. Every time I wrote, I'd get a morsel of feedback or comments that I kept rolling up into future posts. Every time I wrote, I tried to write something just the tiniest bit better than I did last time.
The changes, to me, were almost imperceptible. But from a very modest start-- a 2004 new year's resolution for professional development -- I'd say writing this blog is now, without a doubt, the most important thing I've ever done in my entire career.
I won't say I got my job here at Vertigo back in 2005 because of this blog, but it was definitely a factor. I was interviewed on .NET Rocks, and I've been interviewed online not once but twice. I've been invited to speak at conferences. I am approached for book deals every few months. I exchange email regularly with Steve McConnell, one of my programming idols as a young adult, and he once asked me for advice on blogging. Joel Spolsky actually recognized me and invited conversation when I attended the Emeryville leg of his world tour. Charles Petzold sent me, completely unprompted, a signed copy of his latest book. People offer to send me incredibly cool free swag on a regular basis.
As near as I can tell, between RSS stats and log stats, around 100,000 people read this blog every day. Ad revenues that I've only reluctantly taken are significant enough now that I've actually entertained the idea, in my weaker moments, of becoming a full-time blogger. That is how crazy it's gotten. I would never have predicted this outcome in a million years, and writing it all down like this actually freaks me out a little bit.
I mention these things not because I'm a big fat showoff (or at least that's not the only reason), but because I achieved all this without being particularly talented. It was done one small post at a time, with no real planning or strategy whatsoever, beyond the simple incremental suck less every year kind. I am continually amazed and completely humbled by the success of this blog. All it took was a basic commitment to keep jabbing, keep shipping, keep firing.
If anything, what I've learned is this: if I can achieve this kind of success with my blog, so can you. So if you're wondering why the first thing I ask you when I meet you is "do you have a blog?" or "why don't you post to your blog more regularly?", or "could you turn that into a blog post?", now you know why. It's not just because I'm that annoying blog guy; it's because I'd like to wish the kind of amazing success I've had on everyone I meet.
I'm just trying to share my easy one step plan to achieve Ultimate Blog Success: find a posting schedule you can live with, and stick to it for a year. Probably several years. Okay, so maybe that one step is really not quite so easy as I made it out to be. But everyone has to start somewhere, and the sooner the better.
So when was the last time you wrote a blog post?
October 25, 2007
I'd Consider That Harmful, Too
One of the seminal papers in computer science is Edsger Dijkstra's 1968 paper GOTO Considered Harmful.
For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce. More recently I discovered why the use of the go to statement has such disastrous effects, and I became convinced that the go to statement should be abolished from all "higher level" programming languages (i.e. everything except, perhaps, plain machine code).
The abuse of GOTO is, thankfully, a long forgotten memory in today's modern programming languages. Of course, it's only a minor hazard compared to the COMEFROM statement, but I'm glad to have both of those largely behind us.
GOTO isn't all bad, though. It still has some relevance to today's code. Along with many other programmers, I always recommend using guard clauses to avoid arrow code, and I also recommend exiting early from a loop as soon as you find the value you're looking for. What is an early Return, or an early Exit For other than a tightly scoped GOTO?
foreach my $try (@options) {
    next unless exists $hash{$try};
    do_something($try);
    goto SUCCESS;
}
log_failure();
SUCCESS: ...
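The same shape shows up in languages that have no goto at all. Here's a small sketch in Python; the function and argument names are mine, chosen only for illustration. One guard clause handles the degenerate case up front, and the early return inside the loop does the job of the jump to SUCCESS.

```python
def first_available(options, table):
    """Return the first option that is a key in table, or None if there isn't one."""
    if table is None:  # guard clause: handle the missing-table case before any nesting
        return None
    for candidate in options:
        if candidate in table:
            return candidate  # early exit: in effect a tightly scoped jump out of the loop
    return None  # fell off the end of the loop, the "log_failure" path


print(first_available(["x", "b", "c"], {"b": 1, "c": 2}))  # b
```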
The publication of such an influential paper in this particular format led to an almost immediate snowclone effect, as documented on Wikipedia:
Frank Rubin published a criticism of Dijkstra's letter in the March 1987 CACM where it appeared as 'GOTO Considered Harmful' Considered Harmful. The May 1987 CACM printed further replies, both for and against, as '"GOTO Considered Harmful" Considered Harmful' Considered Harmful?. Dijkstra's own response to this controversy was titled "On a somewhat disappointing correspondence".
That's easily one of the funniest things I've ever read in Wikipedia. Who says computer scientists don't have a sense of humor? But I digress. Most software developers are probably familiar, at least in passing, with Dijkstra's GOTO Considered Harmful. But here's what they might not know about it:
- The paper was originally titled "A Case Against the Goto Statement"; the editor of the CACM at the time, Niklaus Wirth, changed the title to the more inflammatory version we know today.
- In order to speed up its publication, the paper was converted into a "Letter to the Editor".
In other words, Wirth poked and prodded the content until it became incendiary, to maximize its impact. The phrase "considered harmful" was used quite intentionally, as documented on the always excellent Language Log:
However, "X considered harmful" was already a well-established journalistic cliche in 1968 -- which is why Wirth chose it. The illustration below shows the headline of a letter to the New York Times published August 12, 1949: "Rent Control Controversy / Enacting Now of Hasty Legislation Considered Harmful".
I'm sure it's not the earliest example of this phrase used in a headline or title, either -- I chose it only as a convenient illustration of usage a couple of decades before the date of Dijkstra's paper.
Note that this example is also in the title of a slightly cranky letter to the editor - it's probably not an accident that the first example that came to hand of "considered harmful" in a pre-Dijkstra title was of this type.
So when you emulate the "considered harmful" style predicated on the work of these famous computer scientists in 1968, keep that history in mind. You're emulating a slightly cranky letter to the editor. It's frighteningly common-- there are now 28,800 web pages with the exact phrase "considered harmful" in the title.
This leads, perhaps inevitably, to Eric Meyer's "Considered Harmful" Essays Considered Harmful. He points out that choosing this style of dialogue is ultimately counterproductive:
There are three primary ways in which "Considered Harmful" essays cause harm.
- The writing of a "considered harmful" essay often serves to inflame whatever debate is in progress, and thus makes it that much harder for a solution to be found through any means. Those who support the view that the essay attacks are more likely to dig in and defend their views by any means necessary, and are less receptive to reasoned debate. By pushing the opposing views further apart, it becomes more likely that the essay will cause a permanent break between opposing views rather than contribute to a resolution of the debate.
- "Considered harmful" essays are most harmful to their own causes. The publication of a "considered harmful" essay has a strong tendency to alienate neutral parties, thus weakening support for the point of view the essay puts forth. A sufficiently dogmatic "considered harmful" essay can end a debate in favor of the viewpoint the essay considers harmful.
- They've become boring cliches. Nobody really wants to read "considered harmful" essays any more, because we've seen them a thousand times before and didn't really learn anything from them, since we were too busy being annoyed to really listen to the arguments presented.
If you have a point to make, by all means, write a great persuasive essay. If you want to maximize the effectiveness of your criticisms, however, you'll leave "considered harmful" out of your writing. The "considered harmful" technique may have worked for Wirth and Dijkstra, but unless you're planning to become a world famous computer scientist like those guys, I'd suggest leaving it back in 1968 where it belongs.
October 24, 2007
Hardware Assisted Brute Force Attacks: Still For Dummies
Evidently hardware assisted brute force password cracking has arrived:
A technique for cracking computer passwords using inexpensive off-the-shelf computer graphics hardware is causing a stir in the computer security community. Elcomsoft, a software company based in Moscow, Russia, has filed a US patent for the technique. It takes advantage of the "massively parallel processing" capabilities of a graphics processing unit (GPU) - the processor normally used to produce realistic graphics for video games.
Using an $800 graphics card from nVidia called the GeForce 8800 Ultra, Elcomsoft increased the speed of its password cracking by a factor of 25, according to the company's CEO, Vladimir Katalov. The toughest passwords, including those used to log in to a Windows Vista computer, would normally take months of continuous computer processing time to crack using a computer's central processing unit (CPU). By harnessing a $150 GPU - less powerful than the nVidia 8800 card - Elcomsoft says they can be cracked in just three to five days. Less complex passwords can be retrieved in minutes, rather than hours or days.
GPUs, with their massive built-in parallelism, were built to do things like this. I'm encouraged that we're finally able to harness all that video silicon to do useful things beyond rendering Doom at 60 frames per second with anti-aliasing and anisotropic filtering.
There's a bit more detail on the Elcomsoft approach in their one-page PDF. They provide actual numbers there.
Using the "brute force" technique of recovering passwords, it was possible, though time-consuming, to recover passwords from popular applications. For example, the logon password for Windows Vista might be an eight-character string composed of uppercase and lowercase alphabetic characters. There would be about 55 trillion (52 to the eighth power) possible passwords. Windows Vista uses NTLM hashing by default, so using a modern dual-core PC you could test up to 10,000,000 passwords per second, and perform a complete analysis in about two months. With ElcomSoft's new technology, the process would take only three to five days, depending upon the CPU and GPU. Preliminary tests using Elcomsoft Distributed Password Recovery show that the [brute force password cracking] speed has increased by a factor of twenty, simply by hooking up with a $150 video card's onboard GPU. ElcomSoft expects to find similar results as this new technology is incorporated into their password recovery products for Microsoft Office, PGP, and dozens of other popular applications.
It's fun, and it makes for a shocking "Password Cracking Supercomputers On Every Desktop Make Passwords Irrelevant" headline, but password cracking supercomputers on every desktop doesn't mean the end of password-protected civilization as we know it. Let's do the math.
How many passwords can we attempt per second?
| Hardware | Passwords per second |
| --- | --- |
| Dual Core CPU | 10,000,000 |
| GPU | 200,000,000 |
How many password combinations do we have to try?
52^8 = 53,459,728,531,456
That's a lot of potential passwords. Let's stop playing Quake Wars for a few days and get cracking:
53,459,728,531,456 / 10,000,000 pps / 60 / 60 / 24 = 61.9 days
53,459,728,531,456 / 200,000,000 pps / 60 / 60 / 24 = 3.1 days
As promised by Elcomsoft, that works out to a little over three days at the GPU crack rate, and two months at the CPU crack rate. Oooh. Scary. Worried yet? If so, you shouldn't be. Watch what happens when I add four additional characters to the password:
52^12 / 200,000,000 pps / 60 / 60 / 24 = 22,620,197 days
For those of you keeping score at home, with a 12 character password this hardware assisted brute-force attack would take 61,973 years. Even if we increased the brute force attack rate by a factor of a thousand, it would still take 62 years.
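If you'd rather not take my arithmetic on faith, the calculation is easy to reproduce. Here's a minimal sketch; the function name is mine, the two rates are the figures quoted above, and it assumes the attacker has to try the entire keyspace for one fixed password length.

```python
SECONDS_PER_DAY = 60 * 60 * 24

CPU_RATE = 10_000_000    # passwords per second, dual core CPU figure from above
GPU_RATE = 200_000_000   # passwords per second, GPU figure from above


def days_to_exhaust(charset_size, length, guesses_per_second):
    """Days needed to try every password of exactly `length` characters."""
    return charset_size ** length / guesses_per_second / SECONDS_PER_DAY


print(round(days_to_exhaust(52, 8, CPU_RATE), 1))   # 61.9
print(round(days_to_exhaust(52, 8, GPU_RATE), 1))   # 3.1
print(round(days_to_exhaust(52, 12, GPU_RATE)))     # 22620197
```

Divide that last figure by 365 and you get the 61,973 years. On average an attacker would find the password after searching half the keyspace, which halves these numbers without changing the conclusion.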
Elcomsoft's idea of an 8 character password is awfully convenient, too. Only lowercase and uppercase letters, a total of 52 possible choices per character. Who has passwords without at least one number? Even MySpace users are smarter than that. If you include a number in your 8 character password, or a non-alphanumeric character like "%", attack times increase substantially. Not enough to mitigate the potential attack completely, mind you, but you'd definitely put a serious dent in any brute forcing effort by switching out a character or two.
62^8 / 200,000,000 pps / 60 / 60 / 24 = 13 days
72^8 / 200,000,000 pps / 60 / 60 / 24 = 42 days
Personally, I think it's easier to go with a pass phrase than a bunch of random, difficult to remember gibberish characters as a password. Even if your pass phrase is in all lower-case-- a mere 26 possible characters -- that exponent is incredibly potent.
26^10 / 200,000,000 pps / 60 / 60 / 24 = 8 days
26^12 / 200,000,000 pps / 60 / 60 / 24 / 365 = 15 years
26^14 / 200,000,000 pps / 60 / 60 / 24 / 365 = 10,228 years
By the time you get to a mere 14 characters-- even if they're all lowercase letters-- you can pretty much forget about anyone brute forcing your password. Ever.
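You can also turn the question around: given a guess rate, how long does a password need to be before exhausting it takes longer than you care about? Here's a rough sketch, using the same 200,000,000 guesses per second and 365-day years as the figures above; the function name and the 1,000 year target are my own choices for illustration.

```python
def min_length_to_resist(charset_size, guesses_per_second, years):
    """Smallest length whose full keyspace takes at least `years` to exhaust."""
    guesses_available = guesses_per_second * 60 * 60 * 24 * 365 * years
    length = 1
    # Integer arithmetic throughout, so there is no floating point rounding at the boundary.
    while charset_size ** length < guesses_available:
        length += 1
    return length


print(min_length_to_resist(26, 200_000_000, 1000))  # 14
print(min_length_to_resist(52, 200_000_000, 1000))  # 11
```

Doubling the alphabet from 26 to 52 characters buys you three characters of length. Length is the cheaper knob to turn.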
So what have we learned?
Brute force attacks, even fancy hardware-assisted brute force attacks, are still for dummies. If this is the best your attackers can do, they're too stupid to be dangerous. Brute forcing is almost always a waste of time, when vastly more effective social vectors and superior technical approaches are readily available.
Hardware-assisted brute force attacks will never be a credible threat. But short, simple passwords are still dangerous. If your password is only 8 alphabet characters, and if it's exposed in a way that allows brute force hardware assisted attack, you could be in trouble. All you need to do to sleep soundly at night (well, at least as far as brute force attacks are concerned) is choose a slightly longer password. It's much safer to think of your security in terms of passphrases instead of passwords. And unlike "secure" 8 character passwords, passphrases are easy to remember, too. Have you considered helping me evangelize passphrases?
October 23, 2007
Virtual Machine Server Hosting
My employer, Vertigo Software, graciously hosted this blog for the last year. But as blog traffic has grown, it has put a noticeable and increasing strain on our bandwidth. Even on an average day, blog traffic consumes a solid 30 percent of our internet connection-- and much more if something happens to be popular. And that's after factoring in all the bandwidth-reducing tricks I could think of.
While I greatly appreciate my employer's generosity, I don't like causing all my coworkers' internet connections to slow to a crawl. So when my friend and co-author Phil Haack mentioned that we could share a dedicated server through a contact of his, I jumped at the chance.
I'm a big believer in virtualization, so I wanted a beefy physical server that could handle running at least four virtual servers. And I wanted it to run a 64-bit host operating system, as 64-bit offers huge performance benefits for servers. Nobody in their right mind should build up a 32-bit server today.
The contact he was referring to works at CrystalTech. And boy, did CrystalTech ever hook us up:
- Windows Server 2003 R2 x64
- Quad-core Xeon X3210 @ 2.13 GHz
- 4 GB RAM
- 300 GB RAID-5 array
Not too shabby. It is, of course, an obscene amount of power for our relatively modest needs. Have I mentioned how much I like my new friends at CrystalTech? Or what great deals they have on hosting?
But in all seriousness, it's effectively a new sponsor for this blog, so welcome aboard.
I was already hosting this server as a VM, so here's what I did to switch over to completely new hardware:
- shut down my VM
- compacted and compressed it
- transferred it to the new server
- booted it up again
All I had to do was change the IP address in the VM and I was up and running as if nothing had changed. That's the easiest server migration I've ever experienced, all thanks to virtualization.
Phil and I are both Windows ecosystem developers, so we went with what we knew. But virtualization provides total flexibility. I could spin up a new Linux server at a moment's notice if I decided to switch this blog over to the LAMP stack. Or I could play with the latest release candidate of Windows Server 2008. And they can all run in parallel, assuming we have enough memory. That's what I love most about virtualization-- the freedom.
Although Phil and I share admin access to the host machine, we have our own private playgrounds in our virtual servers. We're completely isolated from each other's peculiarities and weirdnesses: nothing we do (well, almost nothing) can affect the other person's virtual machine. Reboot? No problem. Install some stupid software I can't stand? Go for it. Format the drive and start over? Don't care. It's your machine. Do whatever.
The only downside to virtual machine server hosting is that it can be difficult to share IPs between virtual machines. CrystalTech has provided us with a block of 6 public IP addresses, so fortunately we don't have to worry about this. One IP is occupied by the host, but that still leaves five IPs for virtual machines of our creation. That's plenty.
But let's say we only had two public IP addresses-- or we wanted to run lots and lots of virtual machines with a small pool of public IP addresses. What then? How could codinghorror.com and haacked.com share the same IP address (and port 80), when they're on two different virtual machines? They clearly can't occupy the same IP.
| codinghorror.com | 10.0.0.1:80 |
| haacked.com | 10.0.0.1:80 |
On a single physical server, the answer is easy-- virtual hosting, or host header routing. But that requires our websites to live side by side on the same server. Phil and I don't share our wives, so why would we share a server? No offense intended to either of our wives-- or our respective servers-- but sharing is an unacceptable solution. I like you, Phil... but not that much.
If you want two different machines (physical or virtual) to share an IP, it takes some clever trickery. In the Windows ecosystem, that clever trickery often comes in the form of Microsoft's ISA Server. (I'm not sure what the open source equivalent is, but I'm confident it's out there.)
ISA Server acts as our public interface to the world, talking through a public IP address. All DNS entries, and thus HTTP traffic, would be directed to that single public IP address. As our gatekeeper, ISA Server is in a unique position to do lots of cool stuff for us, like firewalling, caching, and so on. But we only care about one particular feature right now: the ability to share an IP address between multiple machines. This is known as a "web rule" in ISA parlance. With appropriate web rules in effect for both of our sites, ISA Server will shuttle the HTTP requests back and forth to the correct private IP addresses based on the host headers. It basically extends the host header routing concepts we saw in Apache and IIS outside the confines of a particular machine.
| ISA Server | 10.0.0.1:80 |
| codinghorror.com | 192.168.0.1:80 |
| haacked.com | 192.168.0.2:80 |
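The core of host header routing is small enough to sketch. This is not how ISA Server is implemented or configured; it's a rough illustration of the idea, with a hypothetical `WEB_RULES` table that reuses the example addresses above and two helper functions I made up for the purpose.

```python
from typing import Optional, Tuple

# Hypothetical routing table: public host name -> private backend address.
WEB_RULES = {
    "codinghorror.com": ("192.168.0.1", 80),
    "haacked.com": ("192.168.0.2", 80),
}


def parse_host(raw_request: str) -> Optional[str]:
    """Pull the Host header value out of a raw HTTP/1.1 request."""
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if line == "":
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional :port suffix and normalize case.
            # (IPv6 literals are ignored in this sketch.)
            return value.strip().split(":")[0].lower()
    return None


def route(raw_request: str) -> Optional[Tuple[str, int]]:
    """Return the private (ip, port) for the request, or None if no rule matches."""
    host = parse_host(raw_request)
    return WEB_RULES.get(host)


request = "GET / HTTP/1.1\r\nHost: haacked.com\r\n\r\n"
print(route(request))
```

A real gatekeeper would then open a connection to that private address and relay the request and response, but the decision itself is just a dictionary lookup on the Host header.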
That's one way you can host fifty websites, all running on fifty different machines, with a single public IP address. It's a very clever trick indeed. Unfortunately, ISA Server isn't the simplest of products to configure and administer. I'm glad we have enough public IPs that we don't have to worry about sharing them between multiple machines. But it's definitely something you should be aware of, as virtual servers become increasingly commonplace and the pool of available IP addresses continues to dwindle.
October 21, 2007
Let's Play Planning Poker!
One of the most challenging aspects of any software project is estimation-- determining how long the work will take. It's so difficult, some call it a black art. That's why I highly recommend McConnell's book, Software Estimation: Demystifying the Black Art; it's the definitive work on the topic. Anyone running a software project should own a copy. If you think you don't need this book, take the estimation challenge: how good an estimator are you?
How'd you do? If you're like the rest of us, you suck. At estimating, I mean.
Given the uncertainty and variability around planning, it's completely appropriate that there's a game making the rounds in agile development circles called Planning Poker.
There are even cards for it, which makes it feel a lot more poker-ish in practice. And like poker, the stakes in software development are real money-- although we're usually playing with someone else's money. If you have a distributed team, card games may seem like a cruel joke. But there's a nifty web-based implementation of Planning Poker, too.
Planning Poker is a form of the estimation technique known as Wideband Delphi. Wideband Delphi is a variant of the Delphi method, which came out of the RAND Corporation; the wideband variant is generally credited to Barry Boehm and John Farquhar in the 1970s. I assume by Delphi they're referring to the oracle at Delphi. If anything says "we have no clue how long this will take", it's naming your estimation process after ancient, gas-huffing priestesses who offered advice in the form of cryptic riddles. It doesn't exactly inspire confidence, but that's probably a good expectation to set, given the risks of estimation.
Planning Poker isn't quite as high concept as Wideband Delphi, but the process is functionally identical:
1. Form a group of no more than 10 estimators and a moderator. The product owner can participate, but cannot be an estimator.
2. Each estimator gets a deck of cards: 0, 1, 2, 3, 5, 8, 13, 20, 40, and 100.
3. The moderator reads the description of the user story or theme. The product owner answers brief questions from the estimators.
4. Every estimator selects an estimate card and places it face down on the table. After all estimates are in, the cards are flipped over.
5. If the estimates vary widely, the owners of the high and low estimates discuss the reasons why their estimates are so different. All estimators should participate in the discussion.
6. Repeat from step 4 until the estimates converge.
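The reveal-and-discuss part of that loop can be sketched in a few lines. The rules above don't define "vary widely", so the `max_gap` threshold here (all cards within one adjacent deck position) is purely my assumption, as are the helper names and the estimators in the example.

```python
from typing import Dict, List, Tuple

DECK = [0, 1, 2, 3, 5, 8, 13, 20, 40, 100]


def converged(estimates: Dict[str, int], max_gap: int = 1) -> bool:
    """True when all cards lie within `max_gap` adjacent deck positions.

    The threshold is an assumption; teams decide for themselves what
    counts as "close enough".
    """
    positions = [DECK.index(card) for card in estimates.values()]
    return max(positions) - min(positions) <= max_gap


def outliers(estimates: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Names holding the lowest and highest cards, who explain their reasoning."""
    low = min(estimates.values())
    high = max(estimates.values())
    lows = [name for name, card in estimates.items() if card == low]
    highs = [name for name, card in estimates.items() if card == high]
    return lows, highs


# One hypothetical round: the cards are flipped, then checked for agreement.
round_one = {"Ann": 3, "Bo": 5, "Cy": 20}
if not converged(round_one):
    lows, highs = outliers(round_one)
    print("Discuss:", lows, "vs", highs)
```

The code only decides who talks next. The value of the exercise is in the discussion that follows.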
There's nothing magical here; it's the power of group dialog and multiple estimate averaging, delivered in an approachable, fun format.
Planning Poker is a good option, particularly if your current estimation process resembles throwing darts at a printout of a Microsoft Project Gantt chart. But the best estimates you can possibly produce are those based on historical data. Steve McConnell has a whole chapter on this, and here's his point:
If you haven't previously been exposed to the power of historical data, you can be excused for not currently having any data to use for your estimates. But now that you know how valuable historical data is, you don't have any excuse not to collect it. Be sure that when you reread this chapter next year, you're not still saying "I wish I had some historical data!"
In other words, if you don't have historical data to base your estimates on, begin collecting it as soon as possible. There are tools out there that can help you do this. Consider the latest version of FogBugz; its marquee feature is evidence-based scheduling. Armed with the right historical evidence, you can:
- Predict when your software will ship. Here you can see we have a 74% chance of shipping by December 17th.
- Determine which developers are on the critical path. Some developers are better at estimating than others; you can shift critical tasks to developers with a proven track record of meeting their estimates.
- See how accurate an estimator you really are. How close are your estimates landing to the actual time the task took?
- See your predicted ship dates change over time. We're seeing the 5%, 50%, and 95% estimates on the same graph here. Notice how they converge as development gets further along; this is evidence that the project will eventually complete, and you won't be stuck in some kind of Duke Nukem Forever limbo.
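Where do percentages like those come from? One plausible approach is a Monte Carlo simulation over your own track record. The sketch below is not FogBugz's actual algorithm; it's a minimal illustration that assumes each remaining task will overrun by a ratio sampled at random from your past actual-versus-estimate ratios. The function names and the sample numbers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def simulate_totals(
    history: Sequence[Tuple[float, float]],
    remaining: Sequence[float],
    runs: int = 10_000,
    seed: int = 0,
) -> List[float]:
    """Simulate total hours for the remaining tasks.

    history:   (estimated_hours, actual_hours) pairs from finished tasks.
    remaining: estimated hours for tasks not yet done.
    Returns the simulated totals, sorted ascending.
    """
    rng = random.Random(seed)  # seeded so results are repeatable
    ratios = [actual / estimate for estimate, actual in history]
    totals = [
        sum(estimate * rng.choice(ratios) for estimate in remaining)
        for _ in range(runs)
    ]
    totals.sort()
    return totals


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank style percentile of an already-sorted sequence."""
    index = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[index]


# Hypothetical track record and backlog, in hours.
history = [(4, 4), (8, 12), (2, 3), (5, 5)]
backlog = [3, 5, 8]
totals = simulate_totals(history, backlog)
for pct in (5, 50, 95):
    print(f"{pct}% of simulated runs finish within {percentile(totals, pct):.1f} hours")
```

The more history you feed it, the more the spread between the 5% and 95% figures reflects how you really estimate, rather than how you wish you did.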
Witness, my friends, the power of historical data on a software project.
The dirty little secret of evidence-based scheduling is that collecting this kind of historical data isn't trivial. Garbage in, garbage out. It takes discipline and concerted effort to enter the effort times-- even greatly simplified versions-- and to keep them up to date as you're working on tasks. FogBugz does its darnedest to make this simple, but your team has to buy into the time tracking philosophy for it to work.
You don't have to use FogBugz. But however you do it, I urge you to begin capturing historical estimation data, if you're not already. It's a tremendous credit to Joel Spolsky that he made this crucial feature the centerpiece of the new FogBugz. I'm not aware of any other software lifecycle tools that go to such great lengths to help you produce good estimates.
Planning Poker is a reasonable starting point. But the fact that two industry icons, Joel Spolsky and Steve McConnell, are both hammering home the same point isn't a coincidence. Historical estimate data is fundamental to the science of software engineering. Over time, try to reduce your reliance on outright gambling, and begin basing your estimates on real data. Without some kind of institutional estimation memory-- without appreciating the power of historical data-- you're likely to keep repeating the same estimation errors over and over.
October 18, 2007
Are Features The Enemy?
Mark Minasi is mad as hell, and he's not going to take it any more. In his online book The Software Conspiracy, he examines in great detail the paradox I struggled with yesterday-- new features are used to sell software, but they're also the primary reason that software spoils over time.
If a computer magazine publishes a roundup of word processors, the central piece of that article will be the "feature matrix," a table showing what word processing programs have which features. With just a glance, the reader can quickly see which word processors have the richest sets of features, and which have the least features. You can see an imaginary example in the following table:
MyWord 2.1 BugWord 2.0 SmartWords 3.0
Can boldface text X X
Runs on the Atari 520 X
Automatically indents first line of a paragraph X
Includes game for practicing touch typing X X
Lets you design your own characters X X
Generates document tables of contents X
Can do rotating 3D bullet points in color X X
Can do bulleted lists X
Supports Cyrillic symbol set X
Includes Malaysian translator X X
It looks like BugWord 2.0 is the clear value -- there are lots more check boxes in its column. However, a closer look reveals that it lacks some very basic and useful word processing features, which MyWord 2.1 has. But the easy-to-interpret visual nature of a feature matrix seems to mean that the magazine's message is: Features are good, and the more the better. As Internet Week senior executive editor Wayne Rash, a veteran of the computer press, says, "Look at something like PC Magazine, you'll see this huge comparison chart. Every conceivable feature any product could ever do shows up, and if a package has that particular feature, then there's a little black dot next to that product. What companies want is to have all the little black dots filled in because it makes their software look better."
Mark maintains that software companies give bugs in their existing software a low priority, while developing new features for the next version is considered critically important. As a result, quality suffers. He trots out this Bill Gates quote as a prime example:
There are no significant bugs in our released software that any significant number of users want fixed... The reason we come up with new versions is not to fix bugs. It's absolutely not. It's the stupidest reason to buy a new version I ever heard... And so, in no sense, is stability a reason to move to a new version. It's never a reason.
It's hard to argue with the logic. Customers will pay for new features. But customers will never pay companies to fix bugs in their software. Unscrupulous software companies can exploit this by fixing bugs in the next version, which just so happens to be jam packed full of exciting new features that will induce customers to upgrade.
Unlike Mark, I'm not so worried about bugs. All software has bugs, and if you accrue enough of them, your users will eventually revolt. Yes, the financial incentives for fixing bugs are weak, but the market seems to work properly when faced with buggy software.
A much deeper concern, for me, is the subtle, creeping feature-itis that destroys my favorite software. It's the worst kind of affliction-- a degenerative disease that sets in over time. As I've regrettably discovered in many, many years of using software, adding more features rarely results in better software. The commercial software market, insofar as it forces vendors to engage in bullet point product feature one-upmanship, could be actively harming the very users it is trying to satisfy.
And the worst part, the absolute worst part, is that customers are complicit in the disease, too. Customers ask for those new features. And customers will use the dreaded "feature matrix" as a basis for comparing what applications they'll buy. Little do they know that they're slowly killing the very software that they love.
Today, as I was starting up WinAmp, I was blasted by this upgrade dialog.
Do I care about any of these new features? No, not really. Album art sounds interesting, but the rest are completely useless to me. I don't have to upgrade, of course, and there's nothing forcing me to upgrade. Yet. My concern here isn't for myself, however. It's for WinAmp. For every new all-singing, all-dancing feature, WinAmp becomes progressively slower, even larger, and more complicated. Add enough well-intentioned "features", and eventually WinAmp will destroy itself.
Sometimes, I wonder if the current commercial software model is doomed. The never-ending feature treadmill it puts us on almost always results in extinction. Either the application eventually becomes so bloated and ineffective that smaller, nimbler competitors replace it, or the application slowly implodes under its own weight. In either case, nothing is truly fixed; the cycle starts anew. Something always has to give in the current model. Precious few commercial software packages are still around after 10 years, and most of the ones that are feel like dinosaurs.
Perhaps we should stop blindly measuring software as a bundle of features, as some kind of endless, digital all-you-can-eat buffet. Instead, we could measure software by results-- how productive or effective it makes us at whatever task we're doing. Of course, measuring productivity and results is hard, whereas counting bullets on a giant feature matrix is brainlessly easy. Maybe that's exactly the kind of cop-out that got us where we are today.
October 17, 2007
Why Does Software Spoil?
In the software industry, the release of newer, better versions is part of the natural order. It's a relentless march towards perfection that started with the first personal computers, and continues today. We expect software to get larger and more sophisticated over time, to track with the hardware improvements that Moore's law has provided us for so many years. Rapid evolution is a good thing, and it's one reason the computer industry is so exciting to work in. If you don't like the way things are today, just wait five years; everything will be different.
Letts' Law: All programs evolve until they can send email.
Zawinski's Law: Every program attempts to expand until it can read mail.
Furrygoat's Law: Every program attempts to expand until it can read RSS feeds.
I love the prospect of upgrading my favorite software. Done right, it's like watching a caterpillar shed its skin and become a beautiful butterfly. Or at least a decent-looking moth.
But for some software packages, something goes terribly, horribly wrong during the process of natural upgrade evolution. Instead of becoming better applications over time, they become worse. They end up more bloated, slower, more complex, more painful to use.
They spoil.
I know this first hand because I'm a long-time Paint Shop Pro user. As a programmer who doesn't need the kitchen sink of graphics editor features, I found it an ideal match for my modest programmer needs. I didn't upgrade to every new version, but when I did, for every new feature I could actually use and benefit from, there were dozens of other features included that I didn't care about. These new features cluttered up the user interface and often interfered with what I wanted to do. My computers kept getting faster, and yet PSP kept taking longer and longer to start up with each new version.
| Version | Year | Size |
| --- | --- | --- |
| 2.0 | 1994? | 0.4 MB |
| 3.11 | 1995 | 1.8 MB |
| 4.12 | 1997 | 2.4 MB |
| 5.0 | 1998 | 6.7 MB |
| 6.0 | 1999 | ? |
| 7.0 | 2000 | 32 MB |
| 8.0 | 2003 | 54 MB |
| 9.0 | 2004 | 108 MB |
| 10.0 | 2005 | 104 MB |
| 11.00 | 2006 | 211 MB |
| 12.00 | 2007 | 326 MB |
If this spoilage goes on long enough, eventually you begin to loathe and fear the upgrade process. And that strikes me as profoundly sad, because it rips the heart out of the essential enjoyment of software engineering. We write software. If we inevitably end up making software worse, then why are we bothering? What are we doing wrong?
I'm not against progress by any means. But it sure seems to me that certain software packages have truly lost their way. In their never-ending quest to add feature bullets, they've somehow forgotten their users and their core values. In trying to be everything to everyone, they progressively destroy that tiny core of uniqueness that they started with. I'm singling out Paint Shop Pro here, but this same software spoilage principle applies to many other applications. PC World compiled an annotated list of 13 software applications they liked better before they were "improved":
- AIM
- ICQ
- Windows Live Messenger
- Windows Media Player
- iTunes
- QuickTime
- iMovie
- Paint Shop Pro
- ACDSee
- Adobe Acrobat Reader
- Eudora
They helpfully provide links to oldversion.com, oldapps.com, and old-versions.net, where you can go back in time and obtain those classic, unspoiled versions.
My favorite version is Winamp 2.95. That's before they started bulking up the client and adding completely unnecessary things. I just want something that plays my MP3s. I don't need it to burn CDs for me or download new music or cook my breakfast or massage my feet.
There are also some emerging lightweight alternatives to choose from in each category. Instead of Adobe's 20 MB Acrobat Reader, you could opt for the 2 MB Foxit PDF Reader. Instead of suffering through another 300+ MB Paint Shop Pro upgrade, chock full of features I'll never use, I could opt for the open source Paint.NET.
It's depressing to me that there are very few apps I can stick with for more than five years before they become an untenable, unbearable mess. I can think of so many that I've liked and since discarded: Nero Burning ROM, WinAmp, ACDSee, Microsoft Money, WinZip, and many others.
I suppose features sell software. For many companies, putting users on the version upgrade treadmill is their business model; it's how they generate revenue. But if this fiscally rewarding feature creep goes on long enough, spoilage inevitably sets in. So I wonder: Is all software destined to spoil over time? Is it possible for software packages with long histories to avoid the trap of becoming bloated and irrelevant? What are your favorite bits of software that haven't spoiled over the years-- and what is their secret?
