December 31, 2008
Finishing The Game
In yesterday's post, I asked this question:
Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?
Most people answer 50%.
Unfortunately, this isn't correct.
This problem, although seemingly simple, is hard to understand. For cognitive reasons that are not fully understood, while our intuitions regarding a priori possibilities are fairly good, we are easily misled when we try to use probability to quantify our knowledge. This is a fancypants way of saying there were almost a thousand comments on that post, with not a lot of agreement to be found.
The key thing to bear in mind here is that we have been given additional information. If we don't use that information, we arrive at 50% -- the odds of a girl or boy being born to any given pregnant woman. That's true insofar as it goes, but it's the answer to a different, much simpler question, and certainly not the answer to the question we asked.
Our question contains additional information:
- The person has two children.
- One of those children is a girl.
We can use that information to come up with a better, more correct answer. We know this person has two children. What are all possible combinations of two children?
BB, GB, BG, GG
We know that one of the children is a girl. This rules out one of those possible combinations of two children (BB), so we're left with:
GB, BG, GG
Of the remaining three possibilities, two include boys.
GB, BG
Thus, the chance that this person has a boy and a girl is 2/3, or about 67%.
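The enumeration is small enough to check mechanically. Here's a minimal C# sketch, assuming (as the argument does) that the four birth orders are equally likely. The variable names are my own:

```csharp
using System;
using System.Linq;

// Every equally likely birth order for two children.
string[] families = { "BB", "BG", "GB", "GG" };

// Keep only families consistent with "one of them is a girl" (at least one G).
var withGirl = families.Where(f => f.Contains('G')).ToArray();

// Of those, count the families with one boy and one girl.
int mixed = withGirl.Count(f => f.Contains('B') && f.Contains('G'));

double probability = (double)mixed / withGirl.Length;
Console.WriteLine($"{mixed} of {withGirl.Length} = {probability:F3}");
```

Dropping the `Where` filter leaves 2 mixed families out of 4: the 50% answer to the simpler question that ignores the extra information.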
I noticed a few comments where people complained that the GB and BG possibilities are the same thing, and should have been reduced to
BG/GB, GG
Which equates to 1/2 or 50%.
If you made this mistake, you're in good company: so did Blaise Pascal, as the book The Unfinished Game: Pascal, Fermat, and the Seventeenth-Century Letter that Made the World Modern explains.
Here's how Keith Devlin describes the famous letter:
In 1654, the gambler Antoine Gombaud, whose noble title was the Chevalier de Mere, approached his friend Pascal with some questions about games of chance, including the problem of the unfinished game. After some thought, Pascal found a possible solution but was not completely sure his reasoning was correct. Accordingly, he sent his ideas to Fermat to see if his countryman agreed with the argument. The brief exchange of letters that ensued -- and one letter in particular -- represented one of the most profound advancements in the history of mathematical thought.
I'll tell you one thing I learned from this book: It's amazing how many early advancements in math were based on gambling. I guess it's sort of like the historical relationship between video technology and pornography. Not that there's anything wrong with that. Anyway, the "unfinished game" I alluded to in my previous post title is the central topic of these letters between Blaise Pascal and Pierre de Fermat. Here's a modernized, slightly simplified version of it:
Two players, Harry and Ted, place equal bets on who will win the best of 5 coin tosses. In each round, Harry always chooses heads (H), and Ted always chooses tails (T). Suppose they are forced to abandon the game after 3 coin tosses, with Harry ahead 2 to 1. What is the fairest way to divide the pot?
Let's enumerate all possible outcomes from the 2 remaining coin tosses.
HH HT TH TT
Only 1 of these 4 possibilities allows Ted to win. Thus, if the game has to be abandoned, the pot should be split 3/4 to Harry and 1/4 to Ted.
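The same brute-force enumeration works here. Below is a minimal C# sketch that plays out both remaining tosses even when the first one already decides the game. That is exactly what keeps the four outcomes equally likely:

```csharp
using System;
using System.Linq;

// Harry (heads) leads 2 to 1: he needs one more H, Ted needs two more T.
// Enumerate both remaining tosses, even if the first already decides the game.
var outcomes = (from a in "HT" from b in "HT" select $"{a}{b}").ToArray();

// Ted wins only if both remaining tosses come up tails.
int tedWins = outcomes.Count(o => o.All(c => c == 'T'));
int harryWins = outcomes.Length - tedWins;

Console.WriteLine($"Harry: {harryWins}/{outcomes.Length}, Ted: {tedWins}/{outcomes.Length}");
```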
But, since Harry is already ahead 2 to 1, you might argue that it's nonsensical to consider all those "extra" possibilities; as soon as Harry gets that third head on a coin toss, the game is over. Thus, we only need to consider possibilities where the game would actually continue:
H TH TT
By this accounting, Harry would get 2/3 of the pot, and Ted 1/3. We know this is wrong. By leaving the game "unfinished" and not enumerating every possibility, we've made a mistake. But how?
You don't need to be a mathematician to prove this. I'm just a crappy programmer, and even my crappy code can brute force the answer by simulating results from thousands of games.
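```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rand = new Random();
var results = new Dictionary<string, int>();
int tosses = 2;

for (int i = 0; i < 10000; i++)
{
    // Harry (H) leads 2 to 1 after the first three tosses.
    string result = "HHT";
    for (int toss = 0; toss < tosses; toss++)
    {
        result += (rand.Next(2) == 0) ? "H" : "T";
        // Stop as soon as either player reaches 3 wins, as a real game would.
        if (result.Count(c => c == 'H') == 3 || result.Count(c => c == 'T') == 3) break;
    }
    // Tally how often each (possibly unfinished) game occurs.
    results[result] = results.TryGetValue(result, out int count) ? count + 1 : 1;
}

foreach (var item in results)
{
    Console.WriteLine(item.Key + " : " + item.Value);
}
```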
HHTTT : 2,438
HHTTH : 2,457
HHTH : 5,105
The unfinished games are not equally likely! But the results are definitely clear, and agree with what the equally likely finished games predicted: 75% for Harry, and 25% for Ted.
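Why aren't they equally likely? A game that ends on the very next toss (HHTH) needs only one fair coin flip, so it happens half the time, while each game that needs both remaining tosses happens only a quarter of the time. That matches the roughly 50/25/25 split in the simulation. Weighting each unfinished game by its probability recovers the exact answer. A minimal C# sketch, counting in quarters (my choice) to keep the arithmetic exact:

```csharp
using System;

// Each unfinished game, how many of the remaining tosses it actually used,
// and who won it. A game lasting k fair tosses has probability 1 / 2^k.
var games = new (string Sequence, int Tosses, string Winner)[]
{
    ("HHTH",  1, "Harry"),
    ("HHTTH", 2, "Harry"),
    ("HHTTT", 2, "Ted"),
};

// Express every probability in quarters: 1/2 = 2/4, 1/4 = 1/4.
int harryQuarters = 0, tedQuarters = 0;
foreach (var g in games)
{
    int quarters = 4 >> g.Tosses;   // 4 / 2^k
    if (g.Winner == "Harry") harryQuarters += quarters;
    else tedQuarters += quarters;
    Console.WriteLine($"{g.Sequence}: {quarters}/4");
}

Console.WriteLine($"Harry: {harryQuarters}/4, Ted: {tedQuarters}/4");
```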
I've made awfully similar "unfinished game" mistakes before, in particular when writing a card shuffling algorithm. My hope in presenting this problem is that you'll be able to recognize it the next time you see it, even if the math behind it is not at all intuitive.
December 30, 2008
The Problem of the Unfinished Game
Today's post is a simple question.
Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?
Consider your answer carefully, without doing a web search, or reading the comments to this post. Don't cheat -- but be prepared to explain your reasoning, because the solution might surprise you.
It's almost like some kind of conspiracy or something.
December 29, 2008
Programming: Love It or Leave It
In a recent Joel on Software forum post Thinking of Leaving the Industry, one programmer wonders if software development is the right career choice in the face of broad economic uncertainty:
After reading the disgruntled posts here from long time programmers and hearing so much about ageism and outsourcing, I'm thinking of leaving the industry. What is a good industry to get into where your programming skills would put you at an advantage?
Joel Spolsky responded:
Although the tech industry is not immune, programming jobs are not really being impacted. Yes, there are fewer openings, but there are still openings (see my job board for evidence). I still haven't met a great programmer who doesn't have a job. I still can't fill all the openings at my company.
Our pay is great. There's no other career except Wall Street that regularly pays kids $75,000 right out of school, and where so many people make six-figure salaries for long careers with just a bachelor's degree. There's no other career where you come to work every day and get to invent, design, and engineer the way the future will work.
Despite the occasional idiot bosses and workplaces that forbid you from putting up Dilbert cartoons on your cubicle walls, there's no other industry where workers are treated so well. Jesus you're spoiled, people. Do you know how many people in America go to jobs where you need permission to go to the bathroom?
Stop the whining, already. Programming is a fantastic career. Most programmers would love to do it even if they didn't get paid. How many people get to do what they love and get paid for it? 2%? 5%?
I tend to agree with Joel's brand of tough love. What he seems to be saying -- after taking my usual poetic license -- is this:
Programming: love it or leave it.
Unless you're fortunate enough to work for a top tier software development company, like Google, Microsoft, or Apple, you've probably experienced firsthand the huge skill disparities among your fellow programmers. I'm betting you've also wondered more than once why some of your coworkers can't, well, program. Even if that's what their job description says.
Over the last twenty years, I've worked with far too many programmers who honestly had no business being paid to be a programmer. Now, I'm not talking about your average programmer here. We're all human, and we all make mistakes. I'm talking about the Daily WTF crew. People who actively give programming a bad name, and give you, as their coworker, a constant headache.
Like Joel, I'm not ready to call the current conditions a new dot com bubble yet, because business is still quite good. But one of the (very) few bright spots of the previous bubble was that it weeded out all the people who didn't truly love software development. Once the incentive to become an overnight dot-com genius programmer millionaire was gone, computer science enrollment suddenly dropped precipitously at colleges across the country. The only people left applying for programming jobs were the true freaks and geeks who, y'know, loved this stuff. The kind of people I had originally enjoyed working with so much. At least until a bunch of careerist gold diggers suddenly showed up and started polluting our workplace.
As much as the dot com bubble sucked, I was intensely glad to see these people go. Now I'm wondering if the current economic conditions are an opportunity to clean house again.
I mean this in the nicest possible way, but not everyone should be a programmer. How often have you wished that a certain coworker of yours would suddenly have an epiphany one day and decide that this whole software engineering thing just isn't working out for them? How do you tell someone that the quality of their work is terrible and they'll never be good at their job -- so much so that they should literally quit and pursue a new career? I've wanted to many times, but I never had the guts.
Joel implied that good programmers love programming so much they'd do it for no pay at all. I won't go quite that far, but I will note that the best programmers I've known have all had a lifelong passion for what they do. There's no way a minor economic blip would ever convince them they should do anything else. No way. No how.
So if a programmer ever hints, even in passing, that they might possibly want to exit the field -- they probably should. I'm not saying you should be a jerk about it, obviously. But if someone has any doubt at all about programming as a career choice, they should be encouraged to explore alternatives -- and make room for another programmer who unashamedly loves to code.
Then again, maybe I'm not the best person to ask. I spent Christmas Eve setting up servers. I'm on holiday right now, sitting in a hotel room in Santa Barbara, and you know what I spent the last two nights doing until the wee hours of the morning? Writing code to improve Stack Overflow. Oh yeah, and this blog post.
So I might be a little biased.
December 27, 2008
My Software Is Being Pirated
If you're at all familiar with computer history, you might have heard of Bill Gates' famous 1976 letter to the Homebrew Computer Club. The letter was written to address rampant piracy of Bill's earliest product, Altair BASIC, which was being passed around quite freely by hobbyists in paper tape form, without any sort of payment to Microsoft (or, as it was then called, Micro-Soft).
Bill was understandably upset about this state of affairs.
The letter cites an interesting figure: less than 10% of the "users" had actually purchased a copy; the other 90% had pirated it. Let's compare that statistic with a blog comment left November 12th by one of the authors of the critically acclaimed indie game World of Goo:
last we checked the piracy rate was about 90%.
32 years later, and we've ended up back exactly where we started. That's not exactly a resounding affirmation of the human spirit, or anything.
That 90% piracy figure was later substantiated in a blog post:
First, and most importantly, how we came up with this number: the game allows players to have their high scores reported to our server (it's an optional checkbox). We record each score and the IP from which it came. We divided the total number of sales we had from all sources by the total number of unique IPs in our database, and came up with about 0.1. That's how we came up with 90%.
It's just an estimate though... there are factors that we couldn't account for that would make the actual piracy rate lower than our estimate:
- some people install the game on more than one machine
- most people have dynamic IP addresses that change from time to time
There are also factors that would make the actual piracy rate higher than our estimate:
- more than one installation behind the same router/firewall (would be common in an office environment)
- not everyone opts to have their scores submitted
For simplicity's sake, we just assumed those would balance out. So take the 90% as a rough estimate.
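The estimate itself boils down to one division. A minimal C# sketch; the sales and IP counts below are made-up placeholders chosen to reproduce the 0.1 ratio, not the developers' actual numbers:

```csharp
using System;

// Hypothetical placeholder counts; the real figures were not published.
long sales = 10_000;
long uniqueIps = 100_000;

// Sales per unique IP approximates the fraction of players who paid...
double paidFraction = (double)sales / uniqueIps;
// ...and everyone else is counted as a pirate.
double piracyRate = 1.0 - paidFraction;

Console.WriteLine($"paid: {paidFraction:P0}, estimated piracy: {piracyRate:P0}");
```

Both caveats in the quote live in `uniqueIps`: multiple installs and changing dynamic IPs inflate it, while shared routers and opted-out players deflate it.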
What makes this particularly depressing is that World of Goo is not a game that deserves to be pirated. Not just because it's easily one of the best games of 2008 (and it really is -- please try the demo for Windows or Mac).
The crushing piracy rate is especially painful in this case because World of Goo was handcrafted by a tiny two-man independent programming shop. Even a cursory 10-minute session is more than enough to demonstrate that this is a game built with love, not another commercial product extruded from the bowels of some faceless Activision-EA corporate game franchise sweatshop. Nor is this an exorbitantly priced bit of Adobe software that costs hundreds or thousands of dollars; it's a measly twenty bucks! Fifteen, if you count the fact that it's on sale right now via Steam. Oh, and did I mention that the developers explicitly chose to avoid any form of onerous copy protection?
Doesn't matter. 90% piracy rate. Just like Altair BASIC. And every other game.
Now, I'm no saint. I essentially grew up as a hardcore Apple // scene pirate, resolutely avoiding those public service announcements not to copy that floppy. I have a deep and personal understanding of the fact that not every person who pirates the software would ultimately buy it anyway. I was just a kid; I barely had money enough to have a computer at all. This is why the BSA's hypothetical piracy loss claims are more fantasy than anything else. Piracy is a natural state of affairs for users with lots of time and no money.
But it doesn't stay that way. Now that I'm older, I have money -- and a taste for software. I buy software all the time. I urge other people to buy software all the time. I've worked for companies that buy hundreds of thousands of dollars worth of software. I've even gone so far as proposing a Support Your Favorite Small Software Vendor Day, and I still try to live up to that goal. I have a budget set aside to buy some bit of software from a small development shop, each and every month. As programmers, we of all people should appreciate the message Bill Gates outlined in his original 1976 letter better than anyone else: buying software supports programmers.
But let me be absolutely crystal clear about one thing: as a programmer, if you write software and charge money for it, your software will be pirated. Guaranteed. Consider this recent example from the Joel on Software forums:
My software is being pirated.
I have contacted with the forum where is the post with the crack and with the business that he requested (I called him) this crack. But they do not seem to want to collaborate. What I do?
How I can prevent future actions like this?
Now, the users can download a demo limited by days from my website and others' websites. I'm using Quick License Manager....
Short of ..
- selling custom hardware that is required to run your software, like the Playstation 3 or Wii
- writing a completely server-side application like World of Warcraft or Mint
.. you have no recourse. Software piracy is a fact of life, and there's very little you can do about it. The more DRM and anti-piracy devices you pile on, the more likely you are to harm and alienate your paying customers. Use a common third party protection system and it'll probably be cracked along with all the other customers of that system. Nobody wants to leave the front door to their house open, of course, but you should err on the side of simple protection whenever possible. Bear in mind that a certain percentage of the audience simply can't be reached; they'll never pay for your software at any price. Don't penalize the honest people to punish the incorrigible. As my friend Nathan Bowers so aptly noted:
Every time DRM prevents legitimate playback, a pirate gets his wings.
In fact, the most effective anti-piracy software development strategy is the simplest one of all:
- Have a great freaking product.
- Charge a fair price for it.
(Or, more radically, choose an open source business model where piracy is no longer a problem but a benefit -- the world's most efficient and viral software distribution network. But that's a topic for a very different blog post.)
Now, it's up to you to prove me right and revive my waning belief in the essential goodness of the human spirit by buying a copy of World of Goo, ideally directly from the developers.
Or you could, y'know, pirate it like everyone else.
December 24, 2008
Best (or Worst) Geek Christmas Ever
I was thrilled to discover that Santa Claus left a little unexpected present on my doorstep on Christmas Eve: the two Lenovo ThinkServers that I ordered for stackoverflow.com! They weren't supposed to arrive until sometime next week.
I immediately began unboxing the servers with all the eagerness of a kid unwrapping his Christmas presents. The servers are barebones, with basic levels of CPU and memory; I bought some hard drives and extra memory to have on hand for testing and installation. Configuring servers on Christmas Eve = the best geek Christmas, ever!
Oooh. Just take a gander at all that hot, sweet server hardware.
After carefully unpacking everything and taking an inventory, my heart sank.
These Lenovo ThinkServers don't include any drive mounting brackets. Which means I can't install the hard drives. What's worse, there's no way to buy the drive brackets alone; you must purchase Lenovo's "server" hard drives if you want the mounting tray / bracket assembly. And Lenovo's drives start at $100 for a generic 160 GB SATA hard drive. That's a heck of a premium to pay for a drive tray. And I'd need eight of them. For comparison, I paid $80 each for a set of 500 GB SATA server class hard drives.
I had naively assumed that these servers would come with the necessary drive trays, just like they have slots for memory and CPU. Or at the very least the drive trays would be items I could purchase individually. In the case of the smaller 1U server, I can prop the bare SATA drives into position by placing a thin book under them, which is OK for test purposes, but hardly a long term solution for a server I need to ship to a data center.
It's amazing how quickly I went from best geek Christmas ever to worst geek Christmas ever. All for want of a few lousy, stinkin' hard drive trays! It's engendering some serious Nerd Rage.
I guess I'll be either returning these Lenovo ThinkServers, or selling them on Craigslist. How sad to see perfectly good hardware go to waste.
Update 1/11/09:
I bought two official $100 drive rails from Lenovo. Pity that they come with worthless 160 GB hard drives attached. Oh, and as an extra bonus "up yours" to customers, they use Torx screws.
Thanks to some eagle-eyed Coding Horror commenters (seriously, you guys rock), I also found an eBay seller with slightly older IBM SATA removable drive rails for sale at $25 each:
The older IBM drive rails work perfectly in the newer Lenovo servers, although the front design is ever so slightly cosmetically different. The model # is IBM 42R4131, and they're for the older IBM xSeries 3250, x306m, x3550, x3650, 3800, 3850 servers. So the good news is I only have to buy $250 worth of drive rails, instead of $1000 worth. Or at least that's what I'm telling myself...
December 23, 2008
Pressing the Software Turbo Button
Does anyone remember the Turbo Button from older IBM PC models?
A leftover from machines of five to ten years ago, the turbo switch still remains on many cases, even though it serves no purpose.
In the early days of the PC, there was only IBM, and there were only a handful of different speeds a PC could run at. Early software was written by programmers who believed they were writing it to run on a machine of a specific speed. When newer, faster machines would come out, some of this software (especially games) would stop working properly because it would run too fast. Turning off the "turbo" function of the PC (which meant anything that made it run faster than an IBM of a particular era) would make the machine run slower so this software would work. In essence, it was a "compatibility mode" feature, to slow down the machine for older software.
Now there are dozens of different combinations of processor types and speeds. Software cannot rely on knowing the speed of the machine, so most programs use speed-detection algorithms. The turbo button no longer serves any useful purpose. On many motherboards there either isn't anywhere to connect it, or there is a place but the motherboard does nothing when you press the button. The best use for this button is to never touch it, or use it for some other purpose. Some older machines will still slow down when the button is pressed, and if you press it by accident your machine will lose performance. It can be surprisingly hard to track down the problem; the front of the machine is the last place anyone appears to notice anything.
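To make the contrast concrete, here's a minimal C# sketch of the two approaches. It isn't a speed-detection algorithm as such, but the related and now common technique of scaling each step by elapsed time. The machines and frame rates are hypothetical, and elapsed time is simulated rather than read from a real clock such as `System.Diagnostics.Stopwatch`:

```csharp
using System;

// Hypothetical game loop: move an object for one simulated second on a
// machine that manages a given number of frames per second.

// Old style: a fixed step per frame, so a faster CPU moves things faster.
double FrameBased(int framesPerSecond, double stepPerFrame)
{
    double position = 0;
    for (int frame = 0; frame < framesPerSecond; frame++)
        position += stepPerFrame;
    return position;
}

// Speed-independent style: scale each step by how long the frame took.
double TimeBased(int framesPerSecond, double unitsPerSecond)
{
    double position = 0;
    double secondsPerFrame = 1.0 / framesPerSecond;
    for (int frame = 0; frame < framesPerSecond; frame++)
        position += unitsPerSecond * secondsPerFrame;
    return position;
}

// A "slow" and a "turbo" machine.
foreach (int fps in new[] { 50, 200 })
{
    Console.WriteLine($"{fps} fps: frame-based {FrameBased(fps, 1.0)}, time-based {TimeBased(fps, 100.0):F2}");
}
```

The frame-based object travels four times as far on the faster machine; the time-based one ends up in the same place on both.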
I, too, remember the Turbo Button, although by the time I got heavily into PCs in the early 1990s it was already well in decline. The strange thing about a turbo button is that you'd pretty much always want it on; there's almost no situation where you'd want to keep some power in reserve for that extra "oomph" required by, say, a particularly intensive Lotus 1-2-3 spreadsheet. You wanted your PC to run at full tilt, maximum possible speed, all the time.
I think this hardware philosophy is also true of software, and it applies at both ends of software development:
- Developers need fast machines to be productive.
- Agile software development practices work best when you iterate rapidly.
- Users prefer software that's responsive.
Whether you're a coder or a user, performance is a hugely important feature, as the codist notes:
In my first job at a defense contractor, I met a couple guys (I thought they were old but they were probably my age now!) who had been writing code since the late 50's and then writing batch applications on an IBM mainframe. Since they could only compile/run once per day (and get the printouts the next day) they would work on 6-8 projects at the same time and weren't concerned when these projects might take years to complete. After two weeks on this I was ready to go insane and got switched to working on a supermini which at least had a realtime operating system. I could write code, compile it and run it at the same time. The only drawback was we had 7 people sharing one terminal at the start. Suggestions that each programmer get a terminal were laughed at initially. Being productive in such a limited time was really hard.
After a couple years I switched to working on PCs (which were just out) and having my own "computer" was wonderful. Working in Pascal and assembly still wasn't fast yet but at least I had my own space.
Then I got Turbo Pascal and life was forever changed. I could write, compile and debug applications virtually instantly and my need for speed has never looked back. Even on the compared-to-today crappy hardware I never really found another environment as fast until I started using PHP this year (which of course has no compilation).
Later when I started Mac coding in C we started off with a dreadful C compiler/linker that took 20 minutes to do its thing. When Think-C came out it was almost a Turbo moment again. Eventually it began to get slower and slower and we switched to Metrowerks Codewarrior which was fast but the applications were getting so big that it still took 30-60 seconds to build sometimes.
When I moved to Java in 1998 compiling and linking still took a fairly long time until the IDEs (and the JVM) began to catch up to the hardware. Still nothing was ever as instant as Turbo had been, despite the hardware being 100x faster.
I'm a speed freak too. As far as I'm concerned, everything on a PC should happen instantaneously. Or at least as close to instantaneous as the laws of physics will allow. Simply doing everything faster, all other things being equal, will allow you to get more done. I'm not alone in this; Fred Brooks made a similar observation way back in The Mythical Man-Month:
There is not yet much evidence available on the true fruitfulness of such apparently powerful tools. There is a widespread recognition that debugging is the hard and slow part of system programming, and slow turnaround is the bane of debugging. So the logic of interactive programming seems inexorable.
Further, we hear good testimonies from many who have built little systems or parts of systems in this way. The only numbers I have seen for effects on programming of large systems were reported by John Harr of Bell Labs. [..] Harr's data suggest that an interactive facility at least doubles productivity in system programming.
Never underestimate the power of pressing the software turbo button. What's keeping you from going as fast as you can? As a user? As a software developer?
December 21, 2008
Gifts for Geeks: 2008 Edition, Sort Of
I was going to post another edition of Gifts for Geeks, as I did in 2006 and 2007, but my heart's just not in it this year. I don't know if it's the global economic apocalypse, or what, but I'm having a hard time mustering the required level of enthusiasm for buying more stuff. There are still some great ideas in the previous years' lists, and honestly it's always fun to window shop if nothing else. But I think this year I'm going for something more modest, more befitting the barren Road Warrior hellscape that's apparently bearing down on all of us.
I was something of a skeptic on the road to high definition video, but I can point to one series that has completely defined the high definition experience for me, so much so that I think every human being on Earth should see it at least once: the Planet Earth BBC Series.
Filmed over five years, at a budget of 25 million dollars, and completely in high definition, the series is nothing less than revelatory. I'm in awe of the footage they captured, imagining how many months they must have spent filming to get a single moment of an Amur Leopard stalking across the Siberian taiga, one of the last 30 in the world. And the series is full of moments like that. It's not just a showcase for amazing high definition video, but a riveting view into the life of our planet, too.
As far as I'm concerned, Planet Earth is the definitive high definition experience to date. Needless to say, highly recommended. I actually picked up the now-obsolete HD-DVD version of this series (they're encoded identically in VC-1, AC3), and then ripped it for convenience to my home theater PC, so I don't have to juggle four discs. There is a DVD version as well, but this is a series that demands to be seen in its full high definition glory.
For something a little less highbrow and more geeky, you can't go wrong with Absolutely Mad, a collection of fifty years of Mad magazine, every issue since 1952, on one DVD.
Do you know what famous computer scientist got his start in Mad magazine? No less than Donald Knuth himself, who wrote about the Potrzebie System of Weights and Measures in issue #33. Someone at Google is apparently a fan as well, as Google Calculator offers Potrzebie System conversions:
See? I told you. Geeky. And awesome.
I should also mention that the Absolutely Mad DVD is delightfully free of DRM or crazy custom viewing software. Each issue is a simple, unencumbered PDF file in a folder on the disc. The scans aren't nearly high resolution enough to be a reasonable substitute for the actual issues, or any of the recent full color reproductions. This is probably intentional. But it's more than adequate for browsing on an LCD screen, or, say, a mobile phone or netbook. Still, with 50 years of Mad for thirty measly bucks, it's hard to go wrong. But I am slightly biased -- I still subscribe to Mad. Who knew it took this much work to be stupid?
Hopefully by next year at this time, all this looming economic uncertainty will be behind us, and we'll be free to consume guilt-free once again. And if not, well, at least you can use these two shiny discs to blind your enemies.
December 18, 2008
Hardware is Cheap, Programmers are Expensive
Given the rapid advance of Moore's Law, when does it make sense to throw hardware at a programming problem? As a general rule, I'd say almost always.
Consider the average programmer salary here in the US:
You probably have several of these programmer guys or gals on staff. I can't speak to how much your servers may cost, or how many of them you may need. Or, maybe you don't need any -- perhaps all your code executes on your users' hardware, which is an entirely different scenario. Obviously, situations vary. But even the most rudimentary math will tell you that it'd take a massive hardware outlay to equal the yearly costs of even a modest five person programming team.
For example, I just bought two very powerful servers for Stack Overflow. Even after accounting for a third backup server and spare hard drives for the RAID arrays, my total outlay is around $5,000. These servers, compared to the ones we're on now, offer:
- roughly 50% more CPU speed
- 2 to 6 times the memory capacity
- almost twice the disk space (and it's a faster RAID 10 array)
I'd say that's a great deal. A no-brainer, even.
Incidentally, this is also why failing to outfit your (relatively) highly paid programmers with decent equipment as per the Programmer's Bill of Rights is such a colossal mistake. If a one-time investment of $4,000 on each programmer makes them merely 5% more productive, you'll break even after the first year. Every year after that you've made a profit. Also, having programmers who believe that their employers actually give a damn about them is probably a good business strategy for companies that actually want to be around five or ten years from now.
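The break-even claim quietly assumes a salary: 5% of roughly $80,000 is $4,000, so the hardware pays for itself in about a year. A minimal C# sketch of that arithmetic. The $80,000 salary is my assumption standing in for the salary chart, not a figure taken from it:

```csharp
using System;

// Assumed salary (hypothetical); the other two figures come from the text.
decimal annualSalary = 80_000m;
decimal hardwareCost = 4_000m;      // one-time spend per programmer
decimal productivityGain = 0.05m;   // 5% more productive

// Value of the extra output per year, measured in salary dollars.
decimal yearlyGain = annualSalary * productivityGain;

// Years until the one-time hardware spend is paid back.
decimal yearsToBreakEven = hardwareCost / yearlyGain;

Console.WriteLine($"Gain per year: {yearlyGain:N0}, break-even after {yearsToBreakEven} year(s)");
```

At a higher salary the payback period only gets shorter, which is the point of the argument.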
Clearly, hardware is cheap, and programmers are expensive. Whenever you're provided an opportunity to leverage that imbalance, it would be incredibly foolish not to.
Despite the enduring wonder of the yearly parade of newer, better hardware, we'd also do well to remember my all time favorite graph from Programming Pearls:
Everything is fast for small n. When n gets large, that's when things start to go sideways. The above graph of an ancient Trash-80 clobbering a semi-modern DEC Alpha is a sobering reminder that the fastest hardware in the world can't save you from bad code. More specifically, poorly chosen data structures or algorithms.
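The graph itself isn't reproduced here, but the underlying point is easy to demonstrate. A minimal C# sketch using the maximum-subarray problem as an assumed example: a brute-force cubic algorithm and a single-pass linear one (Kadane's algorithm) always agree on the answer, but the work they do diverges rapidly as n grows:

```csharp
using System;

// Largest sum of any contiguous run in an array (an empty run counts as 0).

// Brute force: try every (i, j) pair and re-add the run each time. O(n^3).
long MaxSubarrayCubic(int[] a, out long steps)
{
    long best = 0; steps = 0;
    for (int i = 0; i < a.Length; i++)
        for (int j = i; j < a.Length; j++)
        {
            long sum = 0;
            for (int k = i; k <= j; k++) { sum += a[k]; steps++; }
            best = Math.Max(best, sum);
        }
    return best;
}

// Single pass (Kadane's algorithm). O(n).
long MaxSubarrayLinear(int[] a, out long steps)
{
    long best = 0, endingHere = 0; steps = 0;
    foreach (int x in a)
    {
        endingHere = Math.Max(0, endingHere + x);  // best run ending at x
        best = Math.Max(best, endingHere);
        steps++;
    }
    return best;
}

var rng = new Random(1);
foreach (int n in new[] { 10, 100, 400 })
{
    var data = new int[n];
    for (int i = 0; i < n; i++) data[i] = rng.Next(-100, 101);
    long slow = MaxSubarrayCubic(data, out long slowSteps);
    long fast = MaxSubarrayLinear(data, out long fastSteps);
    Console.WriteLine($"n={n}: same answer {slow == fast}, steps {slowSteps:N0} vs {fastSteps:N0}");
}
```

Give the cubic version a machine a thousand times faster and the linear version still overtakes it once n is large enough.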
It won't hurt to run badly written code on the fastest possible boxes you can throw at it, of course. But if you want tangible performance improvements, you'll often have to buckle down and optimize the code, too. Patrick Smacchia's lessons learned from a real-world focus on performance is a great case study in optimization.
Patrick was able to improve nDepend analysis performance fourfold, and cut memory consumption in half. As predicted, most of this improvement was algorithmic in nature, but at least half of the overall improvement came from a variety of different optimization techniques. Patrick likens this to his early days writing demo scene code on the Commodore Amiga:
In the early 90s, I participated in the Amiga demo scene. It's a great illustration of the idea that there is always room for better performance. Every demo ran on the same hardware. It was the perfect incentive for demo developers to produce more and more optimized code. For several years, every month some record was beaten: the number of 3D polygons, the number of sprites, or the number of dots displayed simultaneously at the rate of 50 frames per second. Over a period of a few years, the performance factor obtained was around 50x! Imagine what it means to perform a computation in one second that originally took an entire minute. This massive gain was the result of both better algorithms (with many pre-computations and delegations to sub-chips) and micro-optimizations at assembly language level (better use of the chip registers, better use of the set of instructions).
Patrick achieved outstanding results, but let's be clear: optimizing your code is hard. And sometimes, dangerous. It is not something you undertake lightly, and you'd certainly want your most skilled programmers working on it. To put it in perspective, let's dredge up a few classic quotes.
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson
"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity."
-- W.A. Wulf
Programmers have a tendency to get lost in the details of optimizing for the sake of optimization, as I've noted before in Why Aren't My Optimizations Optimizing? and Micro-Optimization and Meatballs. If you're not extremely careful, you could end up spending a lot of very expensive development time with very little to show for it. Or, worse, you'll find yourself facing a slew of new, even more subtle bugs in your codebase.
That's why I recommend the following approach:
- Throw cheap, faster hardware at the performance problem.
- If the application now meets your performance goals, stop.
- Benchmark your code to identify specifically where the performance problems are.
- Analyze and optimize the areas that you identified in the previous step.
- If the application now meets your performance goals, stop.
- Go to step 1.
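For the benchmarking step, a profiler beats guessing. Here's a minimal Python sketch using the standard library's cProfile and pstats modules. The functions `slow_lookup`, `fast_lookup`, and `workload` are hypothetical stand-ins for your own code. The hot spot is deliberately planted: a poor data structure choice, a list membership test inside a loop.

```python
import cProfile
import pstats


def slow_lookup(items, targets):
    # Hypothetical hot spot: "t in items" scans the whole list, so this is O(n * m).
    return sum(1 for t in targets if t in items)


def fast_lookup(items, targets):
    # Same answer, but set membership is O(1) on average.
    item_set = set(items)
    return sum(1 for t in targets if t in item_set)


def workload():
    items = list(range(5_000))
    targets = list(range(0, 10_000, 2))
    return slow_lookup(items, targets), fast_lookup(items, targets)


def profile(func):
    """Run func under cProfile; return its result and {function name: cumulative seconds}."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (primitive calls, total calls,
    # internal time, cumulative time, callers).
    cumulative = {}
    for (_file, _line, name), (_cc, _nc, _tt, ct, _callers) in stats.stats.items():
        cumulative[name] = cumulative.get(name, 0.0) + ct
    return result, cumulative


if __name__ == "__main__":
    result, cumulative = profile(workload)
    print("result:", result)
    for name in ("slow_lookup", "fast_lookup"):
        print(f"{name}: {cumulative[name]:.4f} s cumulative")
```

Both functions return the same count. The profile should show `slow_lookup` dominating the cumulative time, and that tells you where the optimization step ought to focus.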
Always try to spend your way out of a performance problem first by throwing faster hardware at it. It'll often be a quicker and cheaper way to resolve immediate performance issues than attempting to code your way out of it. Longer term, of course, you'll do both. You'll eventually be forced to revisit those deeper algorithmic concerns and design issues with your code that prevent the application from running faster. And the advantage of doing this on new hardware is that you'll look like an even bigger hero when you deliver the double whammy of optimized code running on speedier hardware.
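The "spend first" argument is ultimately simple arithmetic. Here's a minimal Python sketch. Every figure is a hypothetical assumption you'd replace with your own. It compares the price of a hardware upgrade against the cost of the developer days an optimization effort would consume.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    cost: float


def cheaper_fix(hardware_price: float, dev_days: float, dev_cost_per_day: float) -> Option:
    """Return the cheaper of buying hardware or paying developers to optimize.

    All inputs are hypothetical figures supplied by the caller; on a tie,
    hardware wins because it is listed first.
    """
    hardware = Option("hardware", hardware_price)
    optimize = Option("optimize", dev_days * dev_cost_per_day)
    return min(hardware, optimize, key=lambda option: option.cost)


if __name__ == "__main__":
    # Hypothetical: a $2,000 server upgrade vs. ten developer days at $600/day.
    print(cheaper_fix(2_000, 10, 600))
```

With those made-up inputs, the hardware costs $2,000 against $6,000 of programmer time. The comparison only flips when the upgrade is expensive or the fix is tiny.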
But until the day that Moore's Law completely gives out on us, one thing's for sure: hardware is cheap -- and programmers are expensive.
December 16, 2008
Avoiding The Uncanny Valley of User Interface
Are you familiar with the uncanny valley?
No, not that uncanny valley. Well, on second thought, yes, that uncanny valley.
In 1978, the Japanese roboticist Masahiro Mori noticed something interesting: The more humanlike his robots became, the more people were attracted to them, but only up to a point. If an android became too realistic and lifelike, suddenly people were repelled and disgusted.
The problem, Mori realized, is in the nature of how we identify with robots. When an android, such as R2-D2 or C-3PO, barely looks human, we cut it a lot of slack. It seems cute. We don't care that it's only 50 percent humanlike. But when a robot becomes 99 percent lifelike -- so close that it's almost real -- we focus on the missing 1 percent. We notice the slightly slack skin, the absence of a truly human glitter in the eyes. The once-cute robot now looks like an animated corpse. Our warm feelings, which had been rising the more vivid the robot became, abruptly plunge downward. Mori called this plunge "the Uncanny Valley," the paradoxical point at which a simulation of life becomes so good it's bad.
This phenomenon has also been noted in cartoons.
McCloud's book Understanding Comics was the first place I ran into a concept which is a sort of corollary to the Uncanny Valley. Call it Lake Empathy: If a character is very simple, more iconic than realistic, it's much easier for people to pour themselves into it -- to view it not as a third party, but instead as a personal avatar.
For example, you probably see more of yourself in the character to the left than in the characters to the right.
The seminal Understanding Comics was where I first encountered this concept, too. It's a sort of digital Zeno's Paradox. The more accurate your digital representation of a person, the more visible the subtle imperfections become. This is why computer generated people in recent movies like Polar Express feel even more unnatural than the highly abstract people in 1995's Toy Story. (The current state of the art, at least by some accounts, is The Emily Project. You be the judge.)
But does the uncanny valley effect apply to software user interfaces, too? Bill Higgins thinks it does.
The problem is that our minds have a model of how humans should behave and the pseudo-humans, whether robotic or computer-generated images, don't quite fit this model, producing a sense of unease - in other words, we know that something's not right - even if we can't precisely articulate what's wrong.
There's a lesson here for software designers, and one that I've talked about recently -- we must ensure that we design our applications to remain consistent with the environment in which our software runs. In more concrete terms: a Windows application should look and feel like a Windows application, a Mac application should look and feel like a Mac application, and a web application should look and feel like a web application.
Bill extends this to web applications: a web app that apes the conventions of a desktop application is attempting to cross the uncanny valley of user interface design. This is a bad idea for all the same reasons; the tiny flaws and imperfections of the simulation will be grossly magnified for users. Consider the Zimbra web-based email that Bill refers to.
It's pretty obvious that their inspiration was Microsoft Outlook, a desktop application.
In my experience, shoehorning desktop conventions into web applications rarely ends well. I was never able to articulate exactly why, but the uncanny valley theory goes a long way towards explaining it:
If you're considering or actively building Ajax/RIA applications, you should consider the Uncanny Valley of user interface design. When you build a "desktop in the web browser"-style application, you're violating users' unwritten expectations of how a web application should look and behave. This choice may have significant negative impact on learnability, pleasantness of use, and adoption.
As I've mentioned before, one of the great strengths of web applications is that they aren't bound by the crusty old conventions of desktop applications. They're free to do things differently -- and hopefully better. Web applications should play to their strengths, instead of attempting to clone desktop applications.
If you end up anywhere near the uncanny valley of user interface, that sense of unease you feel is perfectly normal. You're clearly in the wrong place.
December 15, 2008
Easy, Efficient Hi-Def Video Playback
Ever since creating my first home theater PC, I've archived my Netflix rental DVDs to files on the hard drive. I don't do this because I want to rip off the movie industry; I do it for convenience. It's easier to deal with a collection of digital files than it is to deal with a bunch of shiny, easily scratched plastic discs. Nor do I keep the movies around after I watch them. I already own more movies than I could possibly ever watch in one lifetime. As I get older, my desire to collect things is rapidly diminishing. My ripping is purely about simplicity and ease of use for me, the consumer.
After years archiving DVDs on my home theater PC, I was concerned that the dawning Blu-Ray era would make this impossible. Fortunately, that's not the case. I experimented with AnyDVD HD and my first batch of rented Netflix Blu-Ray discs:
- Right-click the SlySoft taskbar icon; choose "Rip Video DVD to Harddisk"
- Choose a path (it will create a subfolder)
- Make sure you have at least 50 GB of free disk space
- Click "Copy DVD"
So brainlessly easy, even I can do it.
You'll end up with a folder containing all the subfolders and files that make up the Blu-Ray title. I'm not terribly interested in extras and so forth (did I mention that I don't have time?); I just want the movie itself. It's not hard to find. The movie file is in the folder:
/BDMV/STREAM/*.m2ts
Sort by file size, identify the biggest file, and that's your movie. Some movies are broken up into multiple files, but most of the ones I've done so far have been one giant honking file, somewhere between 8 and 20+ gigabytes in size. Rename and copy that one giant m2ts file wherever you want it, then delete all the other files.
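If you'd rather script the "sort by size" step, here's a minimal Python sketch. It assumes the rip folder uses the BDMV/STREAM layout described above and that the stream files have a lowercase .m2ts extension. `largest_stream` is a hypothetical helper name. It only picks the single biggest file, so titles split across multiple files still need a manual look.

```python
from pathlib import Path
from typing import Optional


def largest_stream(rip_folder: str) -> Optional[Path]:
    """Return the largest .m2ts file in <rip_folder>/BDMV/STREAM, or None if there are none."""
    stream_dir = Path(rip_folder) / "BDMV" / "STREAM"
    # glob() on a missing directory simply yields nothing.
    candidates = [p for p in stream_dir.glob("*.m2ts") if p.is_file()]
    if not candidates:
        return None
    # Heuristic from the text: the main feature is usually the biggest stream.
    return max(candidates, key=lambda p: p.stat().st_size)


if __name__ == "__main__":
    # Hypothetical rip location; substitute the folder AnyDVD HD created.
    print(largest_stream(r"D:\rips\TERMINATOR3"))
```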
Let's look at Terminator 3 as a specific example. (Digression: I don't understand why this movie gets such a bad rap. Sure, it's not a landmark film like T1 or T2, but it's a solid entry in the franchise, at least in my opinion.) Blu-Ray encompasses multiple video and audio encoding formats, so we need to crack open the file and see what's inside. I recommend using the most excellent MediaInfo application for this.
General
Complete name : terminator3.m2ts
Format : BDAV
Format/Info : BluRay Video
File size : 13.0 GiB
Duration : 1h 49mn
Overall bit rate : 17.1 Mbps
Maximum Overall bit rate : 48.0 Mbps

Video
Format : VC-1
Format profile : AP@L3
Duration : 1h 48mn
Bit rate : 13.9 Mbps
Width : 1920 pixels
Height : 1080 pixels
Display aspect ratio : 16/9
Frame rate : 23.976 fps
Colorimetry : 4:2:0
Scan type : Progressive
Bits/(Pixel*Frame) : 0.280

Audio (1 of 6)
Format : AC-3
Format/Info : Audio Coding 3
Duration : 1h 49mn
Bit rate mode : Constant
Bit rate : 640 Kbps
Channel(s) : 6 channels
Channel positions : Front: L C R, Surround: L R, LFE
Sampling rate : 48.0 KHz
I've clipped a lot of the extraneous information away, but the most important parts here are the encodings:
- Video is VC-1, 1920 x 1080, 13.9 Mbps average
- Audio is Dolby Digital AC-3, 6 channel, 640 Kbps
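The MediaInfo numbers hang together, and checking them shows what a bitrate actually buys you. Here's a minimal Python sketch; the function names are hypothetical. Bits per pixel per frame is the video bitrate divided by pixels per second. File size is roughly the overall bitrate times the duration.

```python
def bits_per_pixel_frame(bitrate_bps: float, width: int, height: int, fps: float) -> float:
    """Average bits spent on each pixel of each frame."""
    return bitrate_bps / (width * height * fps)


def estimated_size_gib(overall_bitrate_bps: float, duration_s: float) -> float:
    """Approximate file size in GiB from the overall bitrate and running time."""
    return overall_bitrate_bps * duration_s / 8 / 2**30


if __name__ == "__main__":
    # Values taken from the Terminator 3 MediaInfo output.
    print(f"{bits_per_pixel_frame(13.9e6, 1920, 1080, 23.976):.3f}")  # 0.280
    duration = 1 * 3600 + 49 * 60  # "1h 49mn", truncated to whole minutes
    print(f"{estimated_size_gib(17.1e6, duration):.1f} GiB")          # 13.0 GiB
```

Both results match what MediaInfo reports (0.280 and 13.0 GiB), within the rounding of the displayed bitrates and duration.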
The ripping part has been straightforward; what I haven't been able to understand is why playback of 1920 x 1080 high definition files is so spotty on my current home theater PC:
- Gigabyte GA-MA78GPM-DS2H Micro ATX motherboard (highly recommended)
- AMD Athlon X2 4050e 2.1 GHz
- Windows Vista 32-bit SP1
- ffdshow all-in-one codec pack
Everything I've read led me to believe that any reasonably fast modern dual-core CPU is more than enough for high definition video playback. While that's generally true, some files are tougher than others. For example, taking advantage of my new multi-format drive, I picked up a cheap copy of the now-obsolete HD-DVD edition of Planet Earth - The Complete BBC Series. (Which is amazing, by the way -- it's probably the ultimate high definition demo disc, and the shows are fascinating to boot.) These files are also encoded with VC-1 but at a somewhat higher bitrate than Terminator 3.
Unfortunately, on a dual core Athlon -- even overclocked to 2.3 GHz -- the Planet Earth rips are on the ragged edge of playability under Windows Media Player. CPU usage is well north of 80% all the time, and some peaks at 100% mean video stuttering and sound breakup at least a few times in each episode. This is unacceptable.
After a great deal of research, I found Media Player Classic Home Cinema. Two things make it a big deal:
- All codecs are "burned into" the Media Player Classic executable, so there's no dependency on whatever random codecs your PC happens to have installed (e.g., ffdshow, cccp, Ivan's Krazy Elite Kodek Pak, etc).
- It supports offloading video decoding duties to modern video cards. This is limited to recent Radeon HD models and nVidia GeForce 8 and 9 series cards. Fortunately, my HTPC motherboard includes an embedded Radeon HD 3200 -- and since I blew up my old motherboard (it's a long story), the new board I just installed includes 128 megabytes of dedicated DDR3 video memory, too.
Now, remember that Terminator 3 is encoded with VC-1, effectively a Microsoft video codec. Windows Media Player supports this natively. You'd expect it to perform great, since it's baked into the operating system, right?
Wrong. This isn't terrible performance, per se, but watch what happens when we play this same file using Media Player Classic Home Cinema, with hardware accelerated decoding enabled:
Holy cow. Using video hardware acceleration, we went from 75% CPU usage to 30% CPU usage. That's incredible. I knew modern video cards could assist in decoding high definition video, but I had no idea the difference was this profound.
But I want to play my movie files in Windows Vista Media Center, not a weird little standalone app. Here's the most awesome part of this post: you can!
As I discovered buried in an obscure forum post, here's how:
- Download the standalone MPC-HC filters.
- Extract MPCVideoDec.ax and copy it into c:\windows\system32\
- Open a command prompt, navigate to c:\windows\system32\, and run regsvr32 MPCVideoDec.ax
Be sure you don't have any other video codecs registered, as the MPC-HC filter can handle everything. Once you register this magical codec, Windows Media Player (and thus, Windows Media Center) will use hardware accelerated high definition video playback. It's amazing. How amazing? Those Planet Earth rips, which used to take 80-100% of a mainstream dual core CPU, barely take 40% when using the hardware accelerated MPC-HC filters.
There is one caveat: for some reason, the MPC-HC filter doesn't accelerate the H.264 Blu-Ray encoding format out of the box. It can, though. You'll need to use something like the Radlight Filter Manager to fix this. After launching it, navigate to the DirectShow filters part of the tree, then look for "MPC - Video decoder", and click the Property Page button.
On the Codecs tab, the only format not ticked for me was H.264/AVC. Tick that box and you're covered. You now have fully hardware accelerated playback for every possible Blu-Ray video encoding format. For free!
In my earlier attempts to solve this high definition video playback problem, I bought a copy of CoreAVC's "world's fastest H.264 software video decoder". And it was fast. Much faster than, say, the H.264 decoder included with ffdshow. My Casino Royale rip went from unplayable under ffdshow to eminently playable under CoreAVC, albeit at 80-90% CPU usage. I thought that was a great result until I saw the MPC-HC filter play that very same Casino Royale file at around 25% CPU usage. Zow. That's a night and day difference between "world's fastest" software and hardware accelerated H.264 decoding.
Now, if you have a very fast dual core CPU, or a moderately fast quad core CPU, you might be able to get away with pure software high definition video decoding (albeit at the cost of high CPU usage). But if, like me, you want to use a cheap, power-efficient dual core CPU to pull off high definition video playback, you'll need to properly harness the hardware decoding abilities of modern video cards. Media Player Classic Home Cinema is an excellent example of how this should work, and it's about the only one I could get to work.
