November 30, 2006
The Project Postmortem
You may think you've completed a software project, but you aren't truly finished until you've conducted a project postmortem.
Mike Gunderloy calls the postmortem an essential tool for the savvy developer:
The difference between average programmers and excellent developers is not a matter of knowing the latest language or buzzword-laden technique. Rather, it can boil down to something as simple as not making the same mistakes over and over again. Fortunately, there's a powerful tool that any developer can use to help learn from the past: the project postmortem.
There's no shortage of checklists out there offering guidance on conducting your project postmortem. My advice is a bit more sanguine: I don't think it matters how you conduct the postmortem, as long as you do it. Most shops are far too busy rushing ahead to the next project to spend any time thinking about how they could improve and refine their software development process. And then they wonder why their new project suffers from all the same problems as their previous project.
Steve Pavlina offers a game developer's perspective on postmortems:
The goal of a postmortem is to draw meaningful conclusions to help you learn from your past successes and failures. Despite its grim-sounding name, a postmortem can be an extremely productive method of improving your development practices.
Game development is some of the most difficult software development on the planet. It's a veritable pressure cooker, which also makes it a gold mine of project postmortem knowledge. I've mentioned my fascination with the Gamasutra postmortems before, but I didn't realize that all the Gamasutra postmortems had been consolidated into a book: Postmortems from Game Developer: Insights from the Developers of Unreal Tournament, Black and White, Age of Empires, and Other Top-Selling Games. Ordered. Also, if you're too lazy for all that pesky reading, Noel Llopis condensed all the commonalities from the Game Developer magazine postmortems.
The articles in Geoff Keighley's Behind the Games series, while not quite postmortems, are in the same vein. The early entries in the series are amazing pieces of investigative reporting on some of the most notorious software development projects in the game industry. Here are a few of my favorites:
- Haunted Glory: The Rise and Fall of Trilobyte
- The Final Hours of Half-Life
- Knee Deep in a Dream: The Story of Daikatana
- The Final Hours of Black & White
Most of the marquee games highlighted here suffered massive schedule slips and development delays. It's a testament to the difficulty of writing A-list games. I can't wait to read The Final Hours of Duke Nukem Forever; the game has been in development for almost ten years now. Its vaporware status is legendary-- here's a list of notable world events that have occurred since DNF began development. "When it's done", indeed.
Don't make the mistake of omitting the project postmortem from your project. If you don't conduct project postmortems, then how can you possibly know what you're doing right-- and more importantly, how to avoid making the same exact mistakes on your next project?
November 27, 2006
This Is What Happens When You Let Developers Create UI
Deep down inside every software developer, there's a budding graphic designer waiting to get out. And if you let that happen, you're in trouble. Or at least your users will be, anyway.
Joseph Cooney calls this The Dialog:
A developer needed a screen for something, one or two text boxes and not much more, so they created "the dialog", maybe just to "try something out" and always with the intention of removing it before the product ships. They discovered they needed a few more parameters, so a couple more controls were added in a fairly haphazard fashion. "The dialog" exposes "the feature", something cool or quite useful. Admittedly "the feature" is more tailored towards power users, but it's still pretty cool. The developer thinks of new parameters that would make "the feature" even more powerful and so adds them to the dialog. Maybe a few other developers or power users see "the dialog" and also like "the feature". But why doesn't it expose this parameter? New controls are added. Pretty soon the technical team are so used to seeing "the dialog" the way it is that they become blind to its strange appearance. Ship time approaches and the product goes through more thorough testing, and "the dialog" is discovered, but it is too late to be heavily re-worked. Instead it is given a cursory spruce-up.
If you let your developers create your UI, hilarity ensues, as in this classic OK/Cancel strip. But when The FileMatrix is unleashed upon unsuspecting users, it's more like a horror movie. I still get chills. And like a bad horror movie franchise, the FileMatrix is still alive and kicking, folks.
Friends don't let friends produce Developer UI.
Part of being a good software developer is knowing your limits. Either copy something that's already well designed, or have the good sense to stick to coding and leave the graphic design to the experts.
November 24, 2006
Discussions: Flat or Threaded?
Clay Shirky's classic articles on social software should be required reading for all software developers working on web applications. As near as I can tell, that's pretty much every developer these days.
But I somehow missed Joel Spolsky's related 2003 article on social software, Building Communities With Software.* It's an excellent, albeit somewhat long-winded, explanation of the way Joel runs his community forums. Although I recently accused Joel of jumping the shark, his scathing criticism of Usenet, Slashdot, and IRC is right on the money. All three are deeply flawed social software models, incapable of sustaining civilized discussion.
Joel advocates policies on his discussion boards that seem unworkable, even borderline anarchic:
- No registration
- No user moderation
- No email notifications for new posts
- No posted rules
- No support for quoting or reply shortcuts
- No unread post shortcuts
- Arbitrary deletion of off-topic posts
Reads like a recipe for disaster, doesn't it? But with one minor exception**, I'm in complete agreement. When it comes to writing social software, Joel's curmudgeonly advice may very well be the right approach. Read the rest of Joel's post to understand why.
In particular, I share Joel's intense dislike of threaded conversations:
Q. OK, but can't you at least have branching? If someone gets off on a tangent, that should be its own branch which you can follow or go back to the main branch.
A. Branching is very logical to a programmer's mind but it doesn't correspond to the way conversations take place in the real world. Branched discussions are disjointed to follow and distracting. [..] Branching makes discussions get off track, and reading a thread that is branched is discombobulating and unnatural. Better to force people to start a new topic if they want to get off topic.
Two of the oldest and most popular discussion boards on the web, phpBB and vBulletin, avoid threaded views. The phpBB developers won't add threading. vBulletin offers threaded views, but they are off by default-- and often disabled completely by administrators.
Personally, I have yet to find any threaded discussion format I like. Aside from the philosophical objections Joel raises, threaded discussions are painful to use. You're forced to click through to see the responses, and once you do, there's far too much pogo-ing up and down the hierarchy of the threaded discussions. It's all so.. unnecessary.
Flat discussion views have their limitations, too. But they're minor compared to the trainwreck that is threaded discussions. Until we can come up with a new discussion model that doesn't add a slew of new problems, let's take Joel's advice and stick with simple, flat discussion views.
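The two views differ in how they walk the same data. Here's a minimal C# sketch, assuming a hypothetical `Post` type where each reply records the `Id` of the post it answers (`ParentId`, null for the opening post). A threaded view is a depth-first walk of that reply tree; a flat view ignores the tree and simply orders every post by time. The class and method names are illustrative, not taken from phpBB, vBulletin, or Joel's forum software.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical post record: ParentId is null for the opening post.
public class Post
{
    public int Id;
    public int? ParentId;
    public DateTime Posted;
}

public static class DiscussionViews
{
    // Flat view: every post in chronological order, no hierarchy.
    public static List<Post> Flat(IEnumerable<Post> posts)
    {
        return posts.OrderBy(p => p.Posted).ToList();
    }

    // Threaded view: depth-first walk of the reply tree, returning each
    // post paired with its nesting depth (what a UI would indent by).
    public static List<KeyValuePair<Post, int>> Threaded(IEnumerable<Post> posts)
    {
        var all = posts.ToList();
        var result = new List<KeyValuePair<Post, int>>();
        foreach (var root in all.Where(p => p.ParentId == null).OrderBy(p => p.Posted))
        {
            Walk(root, 0, all, result);
        }
        return result;
    }

    // Emits a post, then each of its replies (oldest first) recursively.
    // The child lookup is a linear scan, which is fine for a sketch.
    static void Walk(Post p, int depth, List<Post> all, List<KeyValuePair<Post, int>> result)
    {
        result.Add(new KeyValuePair<Post, int>(p, depth));
        foreach (var child in all.Where(c => c.ParentId == p.Id).OrderBy(c => c.Posted))
        {
            Walk(child, depth + 1, all, result);
        }
    }
}
```

With four posts where #2 and #3 reply to #1 and #4 (written last) replies to #2, the flat view reads 1, 2, 3, 4, while the threaded view reads 1, 2, 4, 3. The newest post lands in the middle of the page, and the reader has to reconstruct the timeline on their own.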
* Thanks to Phil for pointing this article out to me.
** Quoted snippets are helpful if used in moderation. Unlike Joel, I don't have total recall of the last five posts I just read; judicious use of a few contextual quotes helps me keep the rest of the conversation in my brain.
November 23, 2006
CPU vs. GPU
Intel's latest quad-core CPU, the Core 2 Extreme QX6700, consists of 582 million transistors. That's a lot. But it pales in comparison to the 680 million transistors of nVidia's latest video card, the 8800 GTX. Here's a small chart of transistor counts for recent CPUs and GPUs:
| Chip | Type | Transistors (millions) |
|---|---|---|
| AMD Athlon 64 X2 | CPU | 154 |
| Intel Core 2 Duo | CPU | 291 |
| Intel Pentium D 900 | CPU | 376 |
| ATI X1950 XTX | GPU | 384 |
| Intel Core 2 Quad | CPU | 582 |
| NVIDIA G8800 GTX | GPU | 680 |
ATI won't release a new video card until next year. But their current X1950 XTX isn't exactly chopped liver: 384 million transistors is more than any current dual-core CPU.
Of course, comparing GPUs to CPUs isn't an apples-to-apples comparison. The clock rates are lower, the architectures are radically different, and the problems they're trying to solve are almost completely unrelated. But GPUs now exceed the complexity of modern CPUs in terms of absolute transistor count. And like CPUs, they're becoming programmable-- it's possible to harness all that graphics power to do something other than graphics.
There's a nice overview on AnandTech which provides some background on this architectural sea change in video cards:
So far, the only types of programs that have effectively tapped GPU power-- other than the obvious applications and games requiring 3D rendering-- have also been video related: video decoders, encoders, video effect processors, and so forth. But there are many non-video tasks that are floating-point intensive, and these programs have been unable to harness the power of the GPU.
Meanwhile, the academic world has designed and utilized custom-built floating-point research hardware for years. These devices are known as stream processors. Stream processors are extremely powerful floating-point processors able to process whole blocks of data at once, whereas CPUs carry out only a handful of numerical operations at a time. We've seen CPUs implement some stream processing with instruction sets like SSE and 3DNow!, but these efforts pale in comparison to what custom hardware has been able to do.
3D rendering is also a streaming task. Modern GPUs have evolved into stream processors, sharing much in common with the customized hardware of researchers. GPU designers have cut corners where they don't need certain functionality for 3D rendering, but they have ultimately developed extremely fast and flexible stream processors. Modern GPUs are just as fast as custom hardware, but due to economies of scale are many, many times cheaper than custom hardware.
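The stream-processing model is easier to see in code. What follows is a minimal C# sketch and only a CPU-side analogy: it is not Brook, CUDA, or any real GPU API. A small, side-effect-free "kernel" is applied independently to every element of a block of data, so the elements can be computed in any order, or all at once. Here `Parallel.For` (from .NET 4's Task Parallel Library) spreads that work across CPU cores; a GPU applies the same pattern across many arithmetic units at once. The `Saxpy` kernel (a·x + y) and the class name are illustrative choices.

```csharp
using System;
using System.Threading.Tasks;

public static class StreamSketch
{
    // The "kernel": a pure function of one element's inputs. It reads
    // nothing else and writes nothing else, which is what lets every
    // element be computed independently of the others.
    static float Saxpy(float a, float x, float y)
    {
        return a * x + y;
    }

    // Scalar style: one element at a time, strictly in order.
    public static float[] RunSequential(float a, float[] x, float[] y)
    {
        var result = new float[x.Length];
        for (int i = 0; i < x.Length; i++)
            result[i] = Saxpy(a, x[i], y[i]);
        return result;
    }

    // Stream style: the same kernel mapped over the whole block.
    // Iterations may run concurrently and in any order.
    public static float[] RunStream(float a, float[] x, float[] y)
    {
        var result = new float[x.Length];
        Parallel.For(0, x.Length, i => { result[i] = Saxpy(a, x[i], y[i]); });
        return result;
    }
}
```

Both methods return identical results; the difference is only that the second one promises the hardware nothing about ordering, which is exactly the freedom stream processors exploit.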
Dedicated, task-specific hardware is orders of magnitude faster than what you can achieve with a general purpose CPU. If you need proof of this, just look at the chess benchmarks. IBM's Deep Blue was capable of evaluating 200 million chess positions per second in 1997. Nearly ten years later, the fastest quad-core desktop system can only evaluate 8 million chess positions per second. Ten-year-old custom hardware is still 25 times faster than the best general purpose CPUs. Amazing.
The most high profile application for all this GPU power at the moment is Stanford's Folding@Home. There's no shortage of exciting PR on this topic:
The processing power of just 5,000 ATI processors is also enough to rival that of the existing 200,000 computers currently involved in the Folding@home project; and it is estimated that if a mere 10,000 computers were to each use an ATI processor to conduct folding research, that the Folding@home program would effectively perform faster than the fastest supercomputer in existence today, surpassing the 1 petaFLOP level.
Stanford recently introduced a high performance folding client which runs on ATI's X1800 and X1900 series video cards. TechReport tested the new high performance folding client and came away a little disappointed:
Over five days, our Radeon X1900 XTX crunched eight work units for a total of 2,640 points. During the same period, our single Opteron 180 core chewed its way through six smaller work units for a score of 899 -- just about one third the point production of the Radeon. However, had we been running the CPU client on both of our system's cores, the point output should have been closer to 1800, putting the Radeon ahead by less than 50%.
The GPU may be doing 20 to 40 times more work, but the scores are calibrated to a baseline system, not the absolute amount of work that's done. It's a little anticlimactic.
Stanford's advanced folding client exploits the Brook Language, an extension to ANSI C that allows them to compile C-like code that runs on the GPU. It leverages ATI's Stream API to communicate with the GPU. NVIDIA offers something similar to Brook in their CUDA technology:
GPU computing with CUDA technology is an innovative combination of computing features in next generation NVIDIA GPUs that are accessed through a standard C language. Where previous generation GPUs were based on "streaming shader programs", CUDA programmers use C to create programs called threads that are similar to multi-threading programs on traditional CPUs. In contrast to multi-core CPUs, where only a few threads execute at the same time, NVIDIA GPUs featuring CUDA technology process thousands of threads simultaneously enabling a higher capacity of information flow.
Of course, CUDA only works on the latest G80 series of cards, just as ATI's Stream technology is really only useful on their latest X1900 series. All this potential programmability is a very recent development.
I expect the relationship between CPU and GPU to largely be a symbiotic one: they're good at different things. But I also expect quite a few computing problems to make the jump from CPU to GPU in the next 5 years. The potential order-of-magnitude performance improvements are just too large to ignore.
November 22, 2006
Exploring Vista's Advanced Search
I used the file search function in Windows XP a lot, particularly to find groups of files. But the XP search syntax doesn't work in Vista. Vista uses the Windows Desktop Search query syntax. Which means
"*.vbproj;*.csproj"
becomes
"ext:(*.vbproj OR *.csproj)"
Note that the boolean operator must be in all-caps to work. That was painful to figure out.
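If you have a pile of old XP-style patterns to convert, the translation is mechanical. Here's a minimal C# sketch; `SearchSyntax.XpPatternsToExtQuery` is a hypothetical helper, not part of Windows or .NET, and it assumes every pattern is a simple extension wildcard like the ones above. It doesn't attempt any other part of the Desktop Search syntax.

```csharp
using System;
using System.Linq;

public static class SearchSyntax
{
    // Converts an XP-style semicolon-separated list of wildcard patterns,
    // e.g. "*.vbproj;*.csproj", into the Desktop Search form shown above,
    // e.g. "ext:(*.vbproj OR *.csproj)".
    public static string XpPatternsToExtQuery(string xpPatterns)
    {
        string[] parts = xpPatterns
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToArray();

        // The boolean operator must be all-caps: "OR", never "or".
        return "ext:(" + string.Join(" OR ", parts) + ")";
    }
}
```

Stray spaces and trailing semicolons from hand-typed XP patterns are tolerated, so `"*.vbproj; *.csproj;"` yields the same query as the tidy version.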
I highly recommend reading through the Windows Desktop Search advanced query reference. First of all, it's completely different than searching in XP, so you'll need to retrain your brain. But it's also a far richer search paradigm than we ever had in XP. And you can use the same CTRL+E search keyboard shortcut that works in your browser to harness its power in Windows Explorer.
When you perform a search, note that the Search Tools menu is available; that's our main interface for all the new search options.
From here, you can bring up the Search Pane, which lets you filter your searches to particular file types, and includes an expandable Advanced Search pane.
As you fill in values in the Advanced Search pane and click Search, the equivalent query terms will be populated in the CTRL+E search box. It's a good way to learn basic search syntax. Once you've learned the new Vista search syntax, you won't need the Search Pane training wheels any more; you can press CTRL+E and type in what you want. It's Google-icious.
There's also an important distinction between indexed search locations and non-indexed search locations. To see the difference, choose "Search Options" from the Search Tools menu.
Most notably, your search terms will only extend to file contents in indexed locations. I'm also very glad to see search now ignores compressed files by default. This was a real pain in XP, which insisted on digging through 600 megabyte ZIP files as a part of any search.
To view indexed locations, or add your own, select Modify Index Locations from the Search Tools menu. On a default Vista install, there are only three indexed locations:
- Offline Files
- c:\ProgramData\Microsoft\Windows\Start Menu
- c:\Users
There is one big caveat here: the full-text indexer only indexes file extensions that it understands. To view or modify the list of file extensions the indexer understands, click the Advanced Options button on the Modify Index Locations dialog, then select the File Types tab.
Perhaps the coolest new search feature is that you can enter searches directly from the Windows start menu. Try it. Hit the Windows key and just start typing search queries. There's nothing to install, nothing to configure, searching just works in Vista. It's about time.
November 21, 2006
iPod Alternatives
I have a great deal of respect for Apple's iPod juggernaut. They've almost single-handedly legitimized the market for downloadable music. The kind you pay for. The kind that, at least in theory, supports the artists who produce the music instead of ripping them off.
That said, I have some problems with the iPod.
- The iPod is boring. How can I properly rage against the machine with the same standard, factory-issue music players that everyone else has? I don't want this to devolve into a knee-jerk rejection of all iThings, but let's be honest here: when every soccer mom carries an iPod, it's no longer a cool technical accessory. It's completely mainstream. I'd be lying if I said this didn't matter to me.
- The iPod has no support for subscription services. I'm a member of Yahoo Music Unlimited, which gives me unlimited access to a massive library of music for 6 bucks a month. I can stream any of this music to multiple PCs, or I can download it to my hard drive or mobile audio players. And it's in a very respectable 192kbps 2-pass CBR format, too. For that same six bucks a month, I could buy a whopping six tracks from the iTunes store. While I can certainly understand the desire to own music, why not give us a choice? Apple's insistence on purchase-only models is a huge mistake.
- The iPod does not support WMA. Although Jobs grudgingly made the iPod Windows compatible two years after its introduction, he still gets his jabs in. The conspicuous lack of WMA support is a not-so-subtle f*ck you to the Windows community. And what of OGG? Or FLAC? Clearly, the hardware is capable, but the political forces inside Apple won't allow it. You'd figure a company that had the guts to make a stunning, wholesale switch to x86 processors could deign to support a few alternative audio formats on their music players. But no.
- The iPod lacks features. I'll never understand why the iPod chooses to deliberately ignore FM radio and its rich history in the music industry. Heck, you might even want to record FM radio. That's just crazy talk! And the list goes on: there's no voice recording, no EQ settings, no gapless playback, etcetera.
- The iPod requires custom software to work. Every music player on the market should have this down to a science by now:
- plug in the USB cable
- drag and drop your music on the device
- disconnect the cable and ROCK
The iPod fails miserably on this count: it requires iTunes installed (or another custom application) to transfer any music to the device. You can't even use it as an external hard drive without setting up a separate, special partition on the device first. Of course, use iTunes if you want, but you shouldn't be forced to use iTunes because the hardware is a brick if you don't. How did Apple get this so very, very wrong?
Now, your goals may not be my goals. But when my wife wanted a new music player to replace her aging Rio Carbon (RIP-- a great little player for its time), these are the criteria I used to evaluate them.
Unfortunately, music devices that can be used seamlessly and interchangeably as a generic external USB hard drive and digital music player are quite rare. The only one I found, at least among hard-disk players, is the Cowon X5L. The Cowon is a decent player, but it suffers from Soviet Russia-era design aesthetics. Due to the lack of choices, I was forced to compromise on devices that support Microsoft's Media Transfer Protocol. When connected to a Windows XP or Windows Vista machine, MTP support allows you to drag and drop music directly on to the device-- without installing any software. It's not ideal, since it's tied to Microsoft, but it's the best I can do.
The Digital Audio Players Review website had the most helpful advice. Their top pick was the Creative Zen Vision:M. I agreed, so I went with the pink one. You know, for the ladies.
It's a great little device, and as promised, we just dragged and dropped our music on it-- which happens to be a mix of MP3 and WMA files. And it worked with our Yahoo Music Unlimited subscription as well.
To complement the 30 GB hard drive player, I also picked up a flash device-- the new, larger 4 GB iRiver Clix.
I've owned a few iRiver products in the past and they've always been excellent. dapreview gave the Clix high marks, and so has everyone else who has reviewed it. The feature set is great. It meets every one of my criteria, throws in video support, and even goes a little beyond with support for Flash Lite games.
I respect the way the pioneering iPod has collectively led the industry out of the dark Napster ages. And I like the iPod design. But until Apple at least supports subscription services and the WMA/FLAC/OGG file formats, I can't justify purchasing any iPod hardware.
November 20, 2006
Filesystem Paths: How Long is Too Long?
I recently imported some source code for a customer that exceeded the maximum path limit of 256 characters. The paths in question weren't particularly meaningful, just pathologically* long, with redundant subfolders. To complete the migration, I renamed some of the parent folders to single character values.
This made me wonder: is 256 characters a reasonable limit for a path? And what's the longest path in my filesystem, anyway? I whipped up this little C# console app to loop through all the paths on my drive and report the longest one.
using System;
using System.IO;

class LongestPathFinder
{
    // Longest file path found so far.
    static string _MaxPath = "";

    static void Main(string[] args)
    {
        RecursePath(@"c:\");
        Console.WriteLine("Maximum path length is " + _MaxPath.Length);
        Console.WriteLine(_MaxPath);
        Console.ReadLine();
    }

    // Visits every usable subfolder of p, checks the length of each
    // file path in it, then recurses into that subfolder.
    static void RecursePath(string p)
    {
        foreach (string d in Directory.GetDirectories(p))
        {
            if (IsValidPath(d))
            {
                foreach (string f in Directory.GetFiles(d))
                {
                    if (f.Length > _MaxPath.Length)
                    {
                        _MaxPath = f;
                    }
                }
                RecursePath(d);
            }
        }
    }

    // Skips reparse points (junctions and symbolic links, which could be
    // visited twice or even loop) and folders we aren't allowed to read.
    static bool IsValidPath(string p)
    {
        if ((File.GetAttributes(p) & FileAttributes.ReparsePoint) == FileAttributes.ReparsePoint)
        {
            Console.WriteLine("'" + p + "' is a reparse point. Skipped");
            return false;
        }
        if (!IsReadable(p))
        {
            Console.WriteLine("'" + p + "' *ACCESS DENIED*. Skipped");
            return false;
        }
        return true;
    }

    // Probes a folder by listing it; an access-denied error means unreadable.
    static bool IsReadable(string p)
    {
        try
        {
            Directory.GetDirectories(p);
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
        return true;
    }
}
It works, but it's a bit more complicated than I wanted it to be, because
- There are a few folders we don't have permission to access.
- Vista makes heavy use of reparse points to remap old XP folder locations as symbolic links.
The longest path on a clean install of Windows XP is 152 characters.
c:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\S-1-5-18\d42cc0c3858a58db2db37658219e6400_89e7e133-abee-4041-a1a7-406d7effde91
This is followed closely by a bunch of stuff in c:\WINDOWS\assembly\GAC_MSIL, which is a side-effect of .NET 2.0 being installed.
The longest path on a semi-clean install of Windows Vista is 195 characters:
c:\Windows\winsxs\x86_microsoft-windows-m..-downlevelmanifests_31bf3856ad364e35_6.0.6000.16386_none_0041f38286aeaf07\Microsoft-Windows-IIS-ClientCertificateMappingAuthentication-Deployment-DL.man
That's Microsoft's own longest path in Vista. But what's the longest path I can create?
The best I could do is 239 characters for folders, and 11 characters for the filename. Add in 3 characters for the inevitable "c:\", plus 6 slashes. That's a grand total of 259 characters. Anything longer and I got a "destination path too long" error.
The 259 character path limit I ran into jibes with the documented MAX_PATH limitation of the Windows shell:
The maximum length path (in characters) that can be used by the [Windows] shell is MAX_PATH (defined as 260). Therefore, you should create buffers that you will pass to SHFILEOPSTRUCT to be of length MAX_PATH + 1 to account for these NULLs.
If 259 characters plus a null seems like an unusually restrictive path limit for a modern filesystem like NTFS, you're right. The NTFS filesystem supports paths of 32,000 characters, but it's largely irrelevant because the majority of Windows APIs you'd use to get to those paths only accept paths of MAX_PATH or smaller. There is a wonky Unicode workaround to the MAX_PATH limitation, according to MSDN:
In the Windows API, the maximum length for a path is MAX_PATH, which is defined as 260 characters. A path is structured in the following order: drive letter, colon, backslash, components separated by backslashes, and a null-terminating character, for example, the maximum path on the D drive is D:\<256 chars>NUL.
The Unicode versions of several functions permit a maximum path length of approximately 32,000 characters composed of components up to 255 characters in length. To specify that kind of path, use the "\\?\" prefix. The maximum path of 32,000 characters is approximate, because the "\\?\" prefix can be expanded to a longer string, and the expansion applies to the total length.
For example, "\\?\D:\<path>". To specify such a UNC path, use the "\\?\UNC\" prefix. For example, "\\?\UNC\<server>\<share>". These prefixes are not used as part of the path itself. They indicate that the path should be passed to the system with minimal modification, which means that you cannot use forward slashes to represent path separators, or a period to represent the current directory. Also, you cannot use the "\\?\" prefix with a relative path. Relative paths are limited to MAX_PATH characters.
The shell and the file system may have different requirements. It is possible to create a path with the API that the shell UI cannot handle.
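The prefixing rules in that MSDN excerpt are mechanical enough to sketch. Here's a minimal C# helper; `LongPaths`, `FitsInMaxPath`, and `ToExtendedPath` are hypothetical names, not a .NET API. It only builds the string. You'd still need an API that honors the prefix, such as the Unicode versions of the Win32 file functions the excerpt mentions. It also doesn't resolve `.` or `..` segments, which the prefix forbids.

```csharp
using System;

public static class LongPaths
{
    // MAX_PATH is 260 including the terminating null, so a classic
    // Win32 path can hold at most 259 characters.
    public const int MAX_PATH = 260;

    public static bool FitsInMaxPath(string path)
    {
        return path.Length < MAX_PATH;
    }

    // Builds the "\\?\" extended-length form of an absolute path:
    //   D:\dir\file.txt      ->  \\?\D:\dir\file.txt
    //   \\server\share\file  ->  \\?\UNC\server\share\file
    public static string ToExtendedPath(string absolutePath)
    {
        // Already in extended-length form: leave it alone.
        if (absolutePath.StartsWith(@"\\?\", StringComparison.Ordinal))
            return absolutePath;

        // The prefix disables slash translation, so normalize first.
        string p = absolutePath.Replace('/', '\\');

        if (p.StartsWith(@"\\", StringComparison.Ordinal))
            return @"\\?\UNC\" + p.Substring(2);

        if (p.Length >= 3 && p[1] == ':' && p[2] == '\\')
            return @"\\?\" + p;

        // Relative paths (including drive-relative "D:dir") can't take the prefix.
        throw new ArgumentException(@"The \\?\ prefix requires an absolute path.");
    }
}
```

The 259-character ceiling I hit above is exactly `FitsInMaxPath`: 259 characters plus the null is 260.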
Still, I wonder if the world really needs 32,000 character paths. Is a 260 character path really that much of a limitation? Do we need hierarchies that deep? Martin Hardee has an amusing anecdote on this topic:
We were very proud of our user interface and the fact that we had a way to browse 16,000 (!!) pages of documentation on a CD-ROM. But browsing the hierarchy felt a little complicated to us. So we asked Tufte to come in and have a look, and were hoping perhaps for a pat on the head or some free advice.
He played with our AnswerBook for about 90 seconds, turned around, and pronounced his review:
"Dr Spock's Baby Care is a best-selling owner's manual for the most complicated 'product' imaginable -- and it only has two levels of headings. You people have 8 levels of hierarchy and I haven't even stopped counting yet. No wonder you think it's complicated."
I think 260 characters of path is more than enough rope to hang ourselves with. If you're running into path length limitations, the real problem isn't the operating system, or even the computers. The problem is the deep, dark pit of hierarchies human beings have dug themselves into.
* ouch
November 17, 2006
Computers are Lousy Random Number Generators
The .NET framework provides two random number generators. The first is System.Random. But is it really random?
Pseudo-random numbers are chosen with equal probability from a finite set of numbers. The chosen numbers are not completely random because a definite mathematical algorithm is used to select them, but they are sufficiently random for practical purposes. The current implementation of the Random class is based on Donald E. Knuth's subtractive random number generator algorithm, from The Art of Computer Programming, volume 2: Seminumerical Algorithms.
These cannot be random numbers because they're produced by a computer algorithm; computers are physically incapable of randomness. But perhaps sufficiently random for practical purposes is enough.
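A quick way to see that System.Random is an algorithm and not a source of chance: seed two instances with the same value and they produce exactly the same "random" sequence. A minimal C# sketch; the class name and the seed are arbitrary choices for illustration.

```csharp
using System;

public static class Determinism
{
    // Draws 'count' values from a System.Random seeded with 'seed'.
    public static int[] Sequence(int seed, int count)
    {
        var r = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
            values[i] = r.Next();
        return values;
    }

    // True if two generators built from the same seed agree on every value.
    public static bool SameSeedSameSequence(int seed, int count)
    {
        int[] a = Sequence(seed, count);
        int[] b = Sequence(seed, count);
        for (int i = 0; i < count; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

`Determinism.SameSeedSameSequence(42, 1000)` returns true every time. In the .NET Framework, the parameterless `new Random()` seeds itself from the system tick count, which is why two instances created in quick succession can hand back identical sequences.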
The second method is System.Security.Cryptography.RandomNumberGenerator. It's more than an algorithm. It also incorporates the following environmental factors in its calculations:
- The current process ID
- The current thread ID
- The tick count since boot time
- The current time
- Various high precision CPU performance counters
- An MD4 hash of the user's environment (username, computer name, search path, etc)
Good cryptography requires high quality random data. In fact, a perfect set of encrypted data is indistinguishable from random data.
I wondered what randomness looks like. So I wrote the following program, which compares the two random number methods available in the .NET framework. In blue, System.Random, and in red, System.Security.Cryptography.RandomNumberGenerator.
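using System;
using System.IO;
using System.Security.Cryptography;

class RandomComparison
{
    static void Main()
    {
        const int maxlen = 3000;
        Random r = new Random();
        RandomNumberGenerator rng = RandomNumberGenerator.Create();
        byte[] b = new byte[4];
        using (StreamWriter sw = new StreamWriter("random.csv"))
        {
            for (int i = 0; i < maxlen; i++)
            {
                // Column 1: System.Random, in [0, Int32.MaxValue)
                sw.Write(r.Next());
                sw.Write(",");
                // Column 2: four cryptographic random bytes as a non-negative int.
                // Masking off the sign bit avoids Math.Abs(Int32.MinValue),
                // which throws OverflowException.
                rng.GetBytes(b);
                sw.WriteLine(BitConverter.ToInt32(b, 0) & int.MaxValue);
            }
        }
    }
}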
I have no idea how to test for true randomness. The math is far beyond me. But I don't see any obvious patterns in the resulting data. It's utterly random noise to my eye. Although both of these methods produce reasonable randomness, they're ultimately still pseudo-random number generators. Computers are great number crunchers, but they're lousy random number generators.
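Serious randomness testing is a deep subject, but one of the simplest checks, a chi-square goodness-of-fit test on byte frequencies, fits in a few lines. This is a minimal sketch, not a certification: passing only means no byte value is obviously over- or under-represented. With 256 possible values (255 degrees of freedom) the statistic averages about 255. The 200 to 320 window below is a loose illustrative choice, not a standard threshold, and genuinely random data will occasionally fall outside it. The class and method names are hypothetical.

```csharp
using System;

public static class ChiSquare
{
    // Chi-square statistic measuring how far the byte-value frequencies
    // in 'data' deviate from a uniform distribution over 0..255.
    public static double ByteStatistic(byte[] data)
    {
        if (data.Length == 0)
            throw new ArgumentException("Need at least one byte.");

        var counts = new long[256];
        foreach (byte b in data)
            counts[b]++;

        double expected = data.Length / 256.0;
        double chi = 0.0;
        for (int v = 0; v < 256; v++)
        {
            double diff = counts[v] - expected;
            chi += diff * diff / expected;
        }
        return chi;
    }

    // Rough sanity check. Too low is as suspicious as too high:
    // counts that are *perfectly* even are a pattern, not chance.
    public static bool LooksUniform(byte[] data)
    {
        double chi = ByteStatistic(data);
        return chi > 200.0 && chi < 320.0;
    }
}
```

Feeding it a megabyte from `RandomNumberGenerator.GetBytes` and a megabyte from `Random.NextBytes` will, in all likelihood, put both comfortably inside the window. That says more about the weakness of a frequency count than about either generator: it can't see anything about the *order* of the bytes. The bytes 0, 1, 2 ... 255 repeated over and over score a perfect 0.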
To have any hope of producing truly random data, you must reach outside the computer and sample the analog world. For example, WASTE samples user mouse movements to generate randomness:
But even something as seemingly random as user input can be predictable; not all environmental sources are suitably random:
True random numbers are typically generated by sampling and processing a source of entropy outside the computer. A source of entropy can be very simple, like the little variations in somebody's mouse movements or in the amount of time between keystrokes. In practice, however, it can be tricky to use user input as a source of entropy. Keystrokes, for example, are often buffered by the computer's operating system, meaning that several keystrokes are collected before they are sent to the program waiting for them. To the program, it will seem as though the keys were pressed almost simultaneously.
A better source of entropy is a radioactive source. The points in time at which a radioactive source decays are completely unpredictable, and can be sampled and fed into a computer, avoiding any buffering mechanisms in the operating system. In fact, this is what the HotBits people at Fourmilab in Switzerland are doing. Another source of entropy could be atmospheric noise from a radio, like that used here at random.org, or even just background noise from an office or laboratory. The lavarand people at Silicon Graphics have been clever enough to use lava lamps to generate random numbers, so their entropy source not only gives them entropy, it also looks good! The latest random number generator to come online is EntropyPool which gathers random bits from a variety of sources including HotBits and random.org, but also from web page hits received by the EntropyPool's web server.
Carl Ellison has an excellent summary of many popular environmental sources of randomness and their strengths and weaknesses. But environmental sources have their limits, too-- unlike pseudo-random algorithms, they have to be harvested over time. Not all environmental sources can provide enough random data for a server under heavy load, for example. And some encryption methods require more random data than others; one particularly secure algorithm requires one bit of random data for each bit of encrypted data.
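That last algorithm is presumably the one-time pad, where the key is as long as the message and is never reused. Here's a minimal C# sketch that draws the pad from RandomNumberGenerator; the class and method names are hypothetical. It shows why the scheme is so hungry for random data: encrypting n bytes consumes n bytes of key. XORing with the same pad a second time undoes the encryption. Its security rests entirely on the pad being truly random, kept secret, and used exactly once.

```csharp
using System;
using System.Security.Cryptography;

public static class OneTimePad
{
    // Generates a pad with exactly one random byte per message byte.
    public static byte[] MakePad(int length)
    {
        var pad = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(pad);
        }
        return pad;
    }

    // XORs data with the pad. Applying it twice with the same pad
    // returns the original data, so this both encrypts and decrypts.
    public static byte[] Apply(byte[] data, byte[] pad)
    {
        if (pad.Length < data.Length)
            throw new ArgumentException("The pad must be at least as long as the data.");

        var output = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            output[i] = (byte)(data[i] ^ pad[i]);
        return output;
    }
}
```

A 1 MB file needs a 1 MB pad, and every further message needs a fresh one. That's the demand for random data that environmental sources struggle to keep up with.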
Computers may be lousy random number generators, but we've still come a long way:
As recently as 100 years ago, people who needed random numbers for scientific work still tossed coins, rolled dice, dealt cards, picked numbers out of hats, or browsed census records for lists of digits. In 1927, statistician L.H.C. Tippett published a table of 41,600 random numbers obtained by taking the middle digits from area measurements of English churches. In 1955, the Rand Corporation published A Million Random Numbers With 100,000 Normal Deviates, a massive tome filled with tables of random numbers. To remove slight biases discovered in the course of testing, the million digits were further randomized by adding all pairs and retaining only the last digit. The Rand book became a standard reference, still used today in low-level applications such as picking precincts to poll.
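The quote doesn't spell out exactly which digits Rand paired up, so treat this as an assumption: a minimal Python sketch that reads "adding all pairs and retaining only the last digit" as position-by-position addition of two digit sequences, modulo 10. The function name `combine_digits` is hypothetical.

```python
def combine_digits(a, b):
    """Add two digit sequences position by position, keeping only the last digit.

    If either sequence is uniformly distributed and independent of the other,
    the result is uniform as well, which is why this step smooths out small
    biases in the raw digits.
    """
    if len(a) != len(b):
        raise ValueError("digit sequences must be the same length")
    return [(x + y) % 10 for x, y in zip(a, b)]


# 1+2=3, 9+3=12 -> 2, 5+5=10 -> 0
print(combine_digits([1, 9, 5], [2, 3, 5]))  # prints [3, 2, 0]
```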
The world is random. Computers aren't. Randomness is really, really hard for computers. It's important to understand the ramifications of this big divide between the analog and digital worlds; otherwise, you're likely to make the same rookie mistakes Netscape did.
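Here's why "computers aren't random" matters in practice. The Python sketch below is loosely modeled on the classic mistake of seeding a pseudo-random generator with a guessable value such as the time of day. It is not Netscape's actual code. The functions `session_key` and `recover_seed`, the seed value, and the 10-minute window are all hypothetical. The point is that a seeded PRNG is fully deterministic, so an attacker who can narrow down the seed can simply try every candidate.

```python
import random


def session_key(seed: int) -> int:
    # A deterministic PRNG: the same seed always yields the same "random" key.
    rng = random.Random(seed)
    return rng.getrandbits(64)


def recover_seed(observed_key: int, candidate_seeds):
    # Brute force: regenerate the key for every plausible seed until one matches.
    for seed in candidate_seeds:
        if session_key(seed) == observed_key:
            return seed
    return None


# Hypothetical scenario: the key was seeded from a Unix timestamp that the
# attacker knows to within a 10-minute (600-second) window.
secret_seed = 1_163_000_123
key = session_key(secret_seed)
window = range(1_163_000_000, 1_163_000_600)
found = recover_seed(key, window)
```

Six hundred guesses take a fraction of a second, and `found` comes back as the secret seed. The usual remedy is to draw keys from the operating system's entropy-backed generator (Python's `secrets` module, for example) rather than from a PRNG seeded with anything an outsider could guess.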
November 16, 2006
It's Never Been Built Before
In Microsoft Project and the Gantt Waterfall, many commenters wondered why software projects can't be treated like any other construction or engineering project:
I am not sure why it is so difficult to estimate software development? Is it a mystery, magic, is there a man behind the curtain that every project depends on?
I mean it's simple, come on! It's like a designer and general contractor coming over and estimating how long it will take to remodel your kitchen.
But software projects truly aren't like other engineering projects. I don't say this out of a sense of entitlement, or out of some misguided attempt to obtain special treatment for software developers. I say it because the only kind of software we ever build is unproven, experimental software. Sam Guckenheimer explains:
To overcome the gap, you must recognize that software engineering is not like other engineering. When you build a bridge, road, or house, for example, you can safely study hundreds of very similar examples. Indeed, most of the time, economics dictate that you build the current one almost exactly like the last to take the risk out of the project.
With software, if someone has built a system just like you need, or close to what you need, then chances are you can license it commercially (or even find it as freeware). No sane business is going to spend money on building software that it can buy more economically. With thousands of software products available for commercial license, it is almost always cheaper to buy. Because the decision to build software must be based on sound return on investment and risk analysis, the software projects that get built will almost invariably be those that are not available commercially.
This business context has a profound effect on the nature of software projects. It means that software projects that are easy and low risk, because they've been done before, don't get funded. The only new software development projects undertaken are those that haven't been done before or those whose predecessors are not publicly available. This business reality, more than any other factor, is what makes software development so hard and risky, which makes attention to process so important.
One kitchen remodelling project is much like another, and the next airplane you build will be nearly identical to the last five airplanes you built. Sure, there are some variables, some tweaks to the process over time, but it's a glorified factory assembly line. In software development, if you're repeating the same project over and over, you won't have a job for very long. At least not on this continent. If all you need is a stock airplane, you buy one. We're paid to build high-risk, experimental airplanes.
The appropriately named happy-go-lucky left this comment on the same post, which explains the distinction quite succinctly:
The problem originates in the fact that "Software Project" shares the word "project" with "engineering project" or "construction project."
"Ah-ha!" our hapless software managers cry, as they roll-out the money for a copy of MS Project. And the software developers cry all the way 'til the end of the "project."
Software invention is just that. It has more in common with Mr. Edison's Menlo Park laboratory than the local hi-rise construction site.
Of course, there are far fewer variables on a work site than in software invention. Gravity remains the same, elements don't change and concrete has known properties. Things aren't that stable when operating systems, hardware, software and business processes come together in "the perfect speculation."
In software engineering projects, you aren't subject to God's rules. You get to play God: define, build, and control an entire universe in a box. It's a process of invention and evolution, not an assembly line. Can software development be managed? Sure. But it's a mistake to try managing it the same way you'd manage a kitchen remodel.
November 15, 2006
Simplicity as a Force
Simplicity isn't easy to achieve, and John Maeda's short book, The Laws of Simplicity, provokes a lot of thought on the topic.
Programmers swim in a sea of unending complexity. We get so used to complexity as an ambient norm that we begin, consciously or unconsciously, projecting it into our work. Simplicity is tough in any field, but in ours, we exacerbate the problem. Impossibly complex applications are the default deliverable for new programmers. Only seasoned software development veterans are capable of producing applications that are easy to understand and troubleshoot. Simplicity isn't achievable as a passive goal; it's a force that must be actively applied.
You can read most of the book online via John's excellent blog, including abbreviated versions of the ten laws of simplicity. One of my favorite sections is about the evolution of the iPod's controls.
Keep it Simple, Stupid. If only it were that easy. It feels more like back-breaking work to keep things from inevitably devolving into complexity.
