February 28, 2007
Choosing Anti-Anti-Virus Software
Now that Windows Vista has been available for almost a month, the comparative performance benchmarks are in.
- Windows XP vs. Vista: The Benchmark Rundown (Tom's Hardware)
- Windows Vista Performance Guide (Anandtech)
It's about what I expected; rough parity with the performance of Windows XP. Vista's a bit slower in some areas, and a bit faster in others. But shouldn't new operating systems perform better than old ones? There are plenty of low-level improvements under the hood. Why does Vista only break even in performance?
To be fair, Vista does a lot more than XP. I don't want to get into the whole XP vs. Vista argument here, but suffice it to say that the list of new features in Vista is quite extensive-- although perhaps not as extensive as some would like. Vista's integrated search alone is enough for me to banish XP from my life forever.
Microsoft has gotten a giant security shiner from Windows XP over the last five years. That's why Windows Vista goes out of its way to radically improve security, with new features like User Account Control (UAC) and Windows Defender. The existing security features in XP, such as Windows Firewall and System Protection (aka restore points) were significantly overhauled and improved for Vista, too. Enhanced security is a good thing, but it's never free. In fact, Vista's new security features will slow your PC down more than almost any other kind of software you can install.
For best performance, the first thing I do on any new Vista install is this:
- Turn off Windows Defender
- Turn off Windows Firewall
- Disable System Protection
- Disable UAC
I've had friends remark how "slow" Vista feels compared to XP, but when I ask them whether they've disabled Defender or UAC, the answer is typically no. Of course your system is going to be slower with all these added security checks. Security is expensive, and there ain't no such thing as a free lunch.
You might argue that three out of these four security features wouldn't even be necessary in the first place if Windows had originally followed the well-worn UNIX convention of separating standard users from privileged administrators. I won't disagree with you. But Windows' long historical precedent of setting user accounts up by default as privileged administrators is Microsoft's cross to bear. I can't rewrite history, and neither can Microsoft. That's why they came up with these painful, performance-sapping workarounds.
But this doesn't mean you have to give up on security entirely in the name of performance. If you're really serious about security, then create a new user account with non-Administrator privileges, and log in as that user. This isn't the default behavior in Vista, sadly. Post install, you get an Administrator-But-Not-Really-Just-Kidding account which triggers UAC on any action that requires administrator privileges. I'm sure this torturous hack was conceived in the name of backwards compatibility, but that doesn't mean we need to perpetuate it. The good news is that Vista is probably the first Microsoft operating system ever where you can actually work effectively as a standard, non-privileged user. As a standard user, you get all the benefits of UAC, Defender, and System Protection.. without all the performance drain.
Let me be clear here. I'm not against security. I'm against retrograde, band-aid, destroy all my computer's performance security.
Speaking of retrograde, band-aid, destroy all my computer's performance security, the one security feature Vista doesn't bundle is anti-virus software. And nothing cripples your PC's performance quite like anti-virus software. This isn't terribly surprising if you consider what anti-virus software has to do: examine every single byte of data that passes through your computer for evidence of malicious activity. But who needs theory when we have Oli at The PC Spy? Oli conducted a remarkably thorough investigation of the real-world performance impact of security software on the PC. The results are truly eye-opening:
| Percent slower | Boot | CPU | Disk |
| Norton Internet Security 2006 | 46% | 20% | 2369% |
| McAfee VirusScan Enterprise 8 | 7% | 20% | 2246% |
| Norton Internet Security 2007 | 45% | 8% | 1515% |
| Trend Micro PC-cillin AV 2006 | 2% | 0% | 1288% |
| ZoneAlarm ISS | 16% | 0% | 992% |
| Norton Antivirus 2002 | 11% | 8% | 658% |
| Windows Live OneCare | 11% | 8% | 512% |
| Webroot Spy Sweeper | 6% | 8% | 369% |
| Nod32 v2.5 | 7% | 8% | 177% |
| avast! 4.7 Home | 4% | 8% | 115% |
| Windows Defender | 5% | 8% | 54% |
| Panda Antivirus 2007 | 20% | 4% | 15% |
| AVG 7.1 Free | 15% | 0% | 19% |
The worst offenders are the anti-virus suites with real-time protection. According to these results, the latest Norton Internet Security degrades boot time by nearly 50 percent. And no, that isn't a typo in the disk column. It also makes all disk access sixteen times slower! Even the better performers in this table would have a profoundly negative impact on your PC's performance. Windows Defender, for example, "only" makes hard drive access 54 percent slower.
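To put those disk numbers in perspective, here's a quick sketch of the arithmetic. The helper name `times_as_slow` is my own, made up for illustration: something that is P percent slower takes 1 + P/100 times as long as the baseline.

```python
def times_as_slow(percent_slower: float) -> float:
    """Convert an 'X percent slower' figure into a multiple of the baseline time."""
    return 1 + percent_slower / 100


# Disk figures copied from the table above.
disk_percent_slower = {
    "Norton Internet Security 2006": 2369,
    "Norton Internet Security 2007": 1515,
    "Windows Defender": 54,
}

for product, percent in disk_percent_slower.items():
    print(f"{product}: {times_as_slow(percent):.2f}x the baseline disk time")
```

By that measure, 1515 percent slower works out to roughly sixteen times the baseline, and Windows Defender's 54 percent to about one and a half times.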
And yet, despite the crushing performance penalty, anti-virus software is de rigueur in the PC world. Most PC vendors would no sooner ship a PC without preinstalled anti-virus software than they would ship a PC without an operating system (yeah, you wish). The very thought of running a PC naked, vulnerable, unprotected from viruses sends system administrators screaming from the room in a panic. When you tell a sysadmin you dislike running anti-virus software, they'll look at you mouth agape, as if you've just told them that you hate puppies and flowers.
I don't see why they're so shocked. Anti-virus software itself, while not self-propagating like a virus, certainly fits the definition of a Trojan Horse. Once installed on your system, it has a hidden, unadvertised payload: it decimates your computer's performance and your productivity. In my opinion, what we really need is Anti-Anti-Virus software to keep us safe from the ongoing Anti-Virus software pandemic.
I've never run any anti-virus software. And Mac or Linux (aka UNIX) users almost never run anti-virus software, either. Am I irresponsible to run all my computers without anti-virus software? Are Mac and Linux users irresponsible for not participating in the culture of fear that Windows anti-virus software vendors propagate? I think it's braver and more responsible to recognize that anti-virus software vendors are not only telling us to be afraid, they are selling us fear. The entire anti-virus software industry is predicated on a bad architectural decision made by Microsoft fifteen years ago. And why, exactly, would any of these vendors want to solve the virus problem and put themselves out of business?
I'll certainly agree that you can't stop users from clicking on dancing bunnies if they have their mind set on it. You should have a few different security layers in any modern operating system. But we should also be treating the disease first -- too many damn users running as administrators-- instead of the symptoms.
As for remediation strategies, I'm a fan of the virtual machine future. We should treat our operating system like a roll of paper towels. If you get something on it you don't like, you ball it up and throw it away, and rip off a new, fresh one. But if that's too radical for you, I think Jan Goyvaerts is on to something with good old plain common sense backups:
In fact, with a proper backup system in place, you don't have to be afraid of messing up your system. I don't use any anti-virus or anti-spyware software. If my system starts acting up, I'll restore the backup, and have a guaranteed clean system. No spyware remover can beat that. If I want to play with beta software, I don't have to inconvenience myself by running it in a virtual machine. I do use VMware for testing my applications on clean installs of Windows. But when beta testing new versions of tools I use for development, I want to test them in my actual development environment rather. When the beta expires, I wipe it off by restoring the OS backup.
It's not terribly different from my virtual machine solution. Either way, you go back to a known good checkpoint. And I'll take a backup strategy over a computer with hobbled performance any day.
This also begs the question of what safety really means. No matter how much security software you install, nagging users with dozens of security dialogs clearly doesn't make users any safer. We should give users a basic level of protection as standard non-administrator users. But beyond that, let users make mistakes, and provide automatic, unlimited undo. That's the ultimate safety blanket.
February 27, 2007
FizzBuzz: the Programmer's Stairway to Heaven
Evidently writing about the FizzBuzz problem on a programming blog results in a nigh-irresistible urge to code up a solution. The comments here, on Digg, and on Reddit-- nearly a thousand in total-- are filled with hastily coded solutions to FizzBuzz. Developers are nothing if not compulsive problem solvers.
It certainly wasn't my intention, but a large portion of the audience interpreted FizzBuzz as a challenge. I suppose it's like walking into Guitar Center and yelling 'most guitarists can't play Stairway to Heaven!'* You might be shooting for a rational discussion of Stairway to Heaven as a way to measure minimum levels of guitar competence.
But what you'll get, instead, is a blazing guitarpocalypse.
I'm invoking the Wayne's World rule here: Please, No Stairway to Heaven.
FizzBuzz was presented as the lowest level of comprehension required to illustrate adequacy. There's no glory to be had in writing code that establishes a minimum level of competency. Even if you can write it in five different languages or in under 50 bytes of code.
The whole point of the original article was to think about why we have to ask people to write FizzBuzz. The mechanical part of writing and solving FizzBuzz, however cleverly, is irrelevant. Any programmer who cares enough to read programming blogs is already far beyond such a simple problem. FizzBuzz isn't meant for us. It's the ones we can't reach-- the programmers who don't read anything-- that we're forced to give the FizzBuzz test to.
Good software developers, even the ones who think they are Rockstars, don't play Stairway to Heaven. And instead of writing FizzBuzz code, they should be thinking about ways to prevent us from needing FizzBuzz code in the first place.
* via Jon Galloway and Steven Burch.
February 26, 2007
Why Can't Programmers.. Program?
I was incredulous when I read this observation from Reginald Braithwaite:
Like me, the author is having trouble with the fact that 199 out of 200 applicants for every programming job can't write code at all. I repeat: they can't write any code whatsoever.
The author he's referring to is Imran, who is evidently turning away lots of programmers who can't write a simple program:
After a fair bit of trial and error I've discovered that people who struggle to code don't just struggle on big problems, or even smallish problems (i.e. write a implementation of a linked list). They struggle with tiny problems.
So I set out to develop questions that can identify this kind of developer and came up with a class of questions I call "FizzBuzz Questions" named after a game children often play (or are made to play) in schools in the UK. An example of a Fizz-Buzz question is the following:
Write a program that prints the numbers from 1 to 100. But for multiples of three print "Fizz" instead of the number and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz".
Most good programmers should be able to write out on paper a program which does this in a under a couple of minutes. Want to know something scary? The majority of comp sci graduates can't. I've also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.
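For reference, the task as stated can be sketched in a few lines of Python. This is just one of many possible solutions, and the function name `fizzbuzz` is mine, not Imran's.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:  # multiple of both three and five
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    # Print the numbers from 1 to 100, as the question asks.
    for number in range(1, 101):
        print(fizzbuzz(number))
```

Checking the combined case first is a design choice: it keeps the "multiples of both" rule from being swallowed by the plain "Fizz" branch.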
Dan Kegel had a similar experience hiring entry-level programmers:
A surprisingly large fraction of applicants, even those with masters' degrees and PhDs in computer science, fail during interviews when asked to carry out basic programming tasks. For example, I've personally interviewed graduates who can't answer "Write a loop that counts from 1 to 10" or "What's the number after F in hexadecimal?" Less trivially, I've interviewed many candidates who can't use recursion to solve a real problem. These are basic skills; anyone who lacks them probably hasn't done much programming.
Speaking on behalf of software engineers who have to interview prospective new hires, I can safely say that we're tired of talking to candidates who can't program their way out of a paper bag. If you can successfully write a loop that goes from 1 to 10 in every language on your resume, can do simple arithmetic without a calculator, and can use recursion to solve a real problem, you're already ahead of the pack!
Between Reginald, Dan, and Imran, I'm starting to get a little worried. I'm more than willing to cut freshly minted software developers slack at the beginning of their career. Everybody has to start somewhere. But I am disturbed and appalled that any so-called programmer would apply for a job without being able to write the simplest of programs. That's a slap in the face to anyone who writes software for a living.
The vast divide between those who can program and those who cannot program is well known. I assumed anyone applying for a job as a programmer had already crossed this chasm. Apparently this is not a reasonable assumption to make. Apparently, FizzBuzz style screening is required to keep interviewers from wasting their time interviewing programmers who can't program.
Lest you think the FizzBuzz test is too easy-- and it is blindingly, intentionally easy-- a commenter to Imran's post notes its efficacy:
I'd hate interviewers to dismiss [the FizzBuzz] test as being too easy - in my experience it is genuinely astonishing how many candidates are incapable of the simplest programming tasks.
Maybe it's foolish to begin interviewing a programmer without looking at their code first. At Vertigo, we require a code sample before we even proceed to the phone interview stage. And our on-site interview includes a small coding exercise. Nothing difficult, mind you, just a basic exercise to go through the motions of building a small application in an hour or so. Although there have been one or two notable flame-outs, for the most part, this strategy has worked well for us. It lets us focus on actual software engineering in the interview without resorting to tedious puzzle questions.
It's a shame you have to do so much pre-screening to have the luxury of interviewing programmers who can actually program. It'd be funny if it wasn't so damn depressing. I'm no fan of certification, but it does make me wonder if Steve McConnell was on to something with all his talk of creating a true profession of software engineering.
February 24, 2007
You Want a 10,000 RPM Boot Drive
I don't go out of my way to recommend building your own computer. I do it, but I'm an OCD-addled, pain-loving masochist. You're usually better off buying whatever cut-rate OEM box Dell is hawking at the moment, particularly now that Intel has finally abandoned the awful Pentium 4 CPU series and is back in the saddle with its excellent Core Duo processor. PC parts are so good these days it's difficult to make a bad choice, no matter what you buy.
If you really must build your own computer, sites like Tech Report provide excellent advice in the form of their system guides. However, their guide sets the bar a little too low for my tastes. There are a few baseline requirements for any new computer build that aren't negotiable for me:
- current dual core chip, such as the Core 2 Duo or Athlon 64 X2
- minimum of 2 GB of memory
- modern PCI Express video card with 256 MB or more of memory, such as the NVIDIA 7600GS, or the ATI Radeon X1650. Both of these cards can be found for about $100. Whatever you do, avoid on-board video, because it's universally crappy. The rule of thumb I use is this: if you're spending significantly less than $100 on your video card, you're making a terrible mistake.
It's not expensive. At today's prices, you're looking at around $800 for a new system based on these parts. Build that up and you've got a machine that can handle anything you throw at it, from cutting-edge games to full resolution high definition video playback. Oh yeah, and it compiles code pretty fast, too. If you're an avid gamer you might possibly want to throw another $50 to $100 at the video card for higher resolutions, but that's about it.
But one of the recommendations I make often gets some unexpected resistance. I believe every new PC build should have two hard drives:
- small 10,000 RPM boot drive
- large 7,200 RPM data/apps/games/media drive
I am a total convert to the Western Digital Raptor series of 10,000 RPM SATA hard drives. Maybe you're skeptical that a hard drive could make that much difference to a computer's performance. Well, I started out as a skeptic, too. But once I sat down and actually used a computer with a 10,000 RPM drive, my opinion did a complete about-face. I was blown away by how responsive and snappy it felt compared to my machine with a 7,200 RPM hard drive. It's a substantial difference that I continue to feel every day in typical use. Don't underestimate the impact of hard drive performance on your everyday use of the computer.
The difference in performance between a 7,200 RPM boot drive and a 10,000 RPM boot drive is not subtle in any way. But don't take my word for it. Surf the benchmarks yourself:
- StorageReview.com's review of the 150GB WD Raptor
- AnandTech's review of the 150GB WD Raptor
- TechReport's review of the 150GB WD Raptor
Unfortunately, the Raptors aren't large drives, and they're expensive on a per-megabyte basis. Current pricing is about $140 for the 74 GB model, and $180 for the 150 GB model. But once you factor in the incredible performance, and the idea that you don't need a lot of space on your primary drive because your secondary drive will be the large workhorse storage area, I think it's a completely reasonable tradeoff.
A number of people have expressed concerns that a 10,000 RPM drive will run hot and noisy. I am a noise fanatic, and I can assure you that this is not the case. According to the StorageReview noise and heat analysis, the Raptor is squarely in the ballpark with its 7,200 RPM peers. I mount all my drives with sorbothane, and I use eggcrate foam on nearby surfaces to further reduce any reflected noise. Once I do this, the Raptor is no noisier than any other 3.5" desktop hard drive I've used.
Setting aside the performance argument for a moment, using two hard drives also provides additional flexibility. Although I cannot recommend RAID 0 on the desktop, there are clear benefits to using two standalone hard drives. You can isolate your essential user data from the operating system by storing it on the larger, secondary drive. This gives you the freedom to blow away your primary OS drive with relative impunity. It's also optimal for virtual machine use, as one drive can be dedicated to OS functions and the other can act exclusively as a virtual disk. There are plenty of usage scenarios where taking advantage of two hard drive spindles can provide a serious performance boost, such as extracting a large archive from one drive to another.
It's gotten to the point now where I won't even consider building a machine without a Raptor as the boot drive. Sure, your computer may have 2 or even 4 gigabytes of memory, but going to disk is inevitable. And every time you go to disk, you'll become thoroughly spoiled by the speed of the Raptor.
You may not know it yet, but you want a 10,000 RPM boot drive, too. In the words of Scott Hanselman: Go on. Treat yourself. I guarantee you won't be disappointed.
February 23, 2007
Revisiting 7-ZIP
In my previous post, I extolled the virtues of WinRAR and the RAR archive format. I disregarded 7-ZIP because it didn't do well in that particular compression study, and because my previous experiences with it had shown it to be efficient, but brutally slow.
But that's no longer true. Consider the following test I just conducted:
- Two files: a 587 MB virtual hard disk file, and an 11 KB virtual machine file.
- Test rig is a Dual Core Athlon X2 4800+.
- All default GUI settings were used.
- All extracting and archiving done from one physical hard drive to another, to reduce impact of disk contention.
| | Extraction | Compression | Size |
| WinRAR 3.70 beta 2 | 0:39 | 3:09 | 135 MB |
| 7-ZIP 4.20 | - | 6:04 | 127 MB |
| 7-ZIP 4.44 beta | 0:40 | 3:03 | 125 MB |
7-ZIP performance has doubled over the last two years. And it's slightly more efficient at compression, too. That's impressive.
Performance is no longer a reason to choose WinRAR over 7-ZIP. Granted, this is a sample size of one, a single test on a single machine, but it's hard to ignore the dramatic reversal of fortune.
I still like WinRAR's ultra-slick shell integration. But 7-ZIP is a viable competitor now in terms of raw clock time performance, and as always, it tends to produce smaller archives than RAR. This more than addresses my previous criticisms. Mea culpa, 7-ZIP.
February 22, 2007
Don't Use ZIP, Use RAR
When I wrote Today is "Support Your Favorite Small Software Vendor Day", I made a commitment to spend at least $20 per month supporting my fellow independent software developers. WinRAR has become increasingly essential to my toolkit over the last year, so this month, I'm buying a WinRAR license.
Sure, ZIP support is built into most operating systems, but the support is rudimentary at best. I particularly dislike the limited "compressed folder wizard" I get by default in XP and Vista. In contrast, WinRAR is full-featured, powerful, and integrates seamlessly with the shell. There's a reason WinRAR won the best archive tool roundup at DonationCoder. And WinRAR is very much a living, breathing piece of software. It's frequently updated with neat little feature bumps and useful additions; two I noticed over the last year were dual-core support and real-time stats while compressing, such as estimated compression ratio and predicted completion time.
WinRAR fully supports creating and extracting ZIP archives, so choosing WinRAR doesn't mean you'll be forced into using the RAR compression format. But you should use it, because RAR, as a compression format, clobbers ZIP. It produces much smaller archives in roughly the same time. If you're worried the person on the receiving end of the archive won't have a RAR client, you can create a self-extracting executable archive (or SFX) at a minimal cost of about 60 KB additional filesize.
RAR also supports solid archives, which compress all the files as one continuous stream, so it can exploit inter-file redundancies. ZIP compresses each file independently, so it cannot. This is a big deal, because it can result in a substantially smaller archive when you're compressing a lot of files. When I compressed all the C# code snippets, the difference was enormous:
| ZIP | 229 KB |
| RAR | 73 KB |
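Here's a rough way to see the solid archive effect for yourself. It's only a sketch under loud assumptions: Python's `zlib` (DEFLATE) stands in for both archivers, the "files" are made-up data with a shared header, and the function name is hypothetical. It says nothing about RAR's actual algorithm; it only shows why compressing many similar files as one stream helps.

```python
import random
import zlib


def compare_solid_vs_per_file(file_count: int = 50) -> tuple[int, int]:
    """Return (per-file total, solid total) compressed sizes in bytes."""
    rng = random.Random(0)  # fixed seed so the made-up data is repeatable
    # Hypothetical stand-in for boilerplate shared by many small source files.
    shared_header = bytes(rng.randrange(32, 127) for _ in range(400))
    files = [
        shared_header + f"// snippet {i}\n".encode("ascii")
        for i in range(file_count)
    ]
    # ZIP-style: every file is compressed on its own.
    per_file_total = sum(len(zlib.compress(data)) for data in files)
    # Solid-style: one stream, so later files can reference earlier ones.
    solid_total = len(zlib.compress(b"".join(files)))
    return per_file_total, solid_total


if __name__ == "__main__":
    per_file, solid = compare_solid_vs_per_file()
    print(f"per-file: {per_file} bytes, solid: {solid} bytes")
```

The more your files resemble each other, as a folder full of code snippets tends to, the bigger the gap between the two totals.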
But even in an apples-to-apples comparison, RAR offers some of the very best "bang for the byte" of all compression algorithms. Consider this recent, comprehensive multiple file compression benchmark. The author measured both compression size and compression time to produce an efficiency metric:
The most efficient (read: useful) program is calculated by multiplying the compression time (in seconds) it took to produce the archive with the power of the archive size divided by the lowest measured archive size.
2 ^ (((Size / SmallestSize) - 1) / 0.1) * ArchiveTime
The lower the score, the better. The basic idea is a compressor X has the same efficiency as compressor Y if X can compress twice as fast as Y and resulting archive size of X is 10% larger than size of Y.
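The formula is easy to sanity check in code. This is a minimal sketch of the metric as quoted above; the function and parameter names are my own, and the sizes and times in the usage note are made-up numbers, not benchmark data.

```python
def efficiency_score(size: float, smallest_size: float, archive_time_s: float) -> float:
    """Efficiency metric from the quoted benchmark; lower is better.

    Every 10% of extra archive size (relative to the smallest archive
    measured) doubles the penalty applied to the compression time.
    """
    return 2 ** (((size / smallest_size) - 1) / 0.1) * archive_time_s


if __name__ == "__main__":
    # Hypothetical compressors: X is twice as fast but 10% larger than Y.
    score_x = efficiency_score(size=110, smallest_size=100, archive_time_s=50)
    score_y = efficiency_score(size=100, smallest_size=100, archive_time_s=100)
    print(score_x, score_y)
```

With those hypothetical inputs the two scores come out equal (give or take floating point rounding), which matches the author's description: twice as fast and 10% larger is a wash.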
And sure enough, if you sort the results by efficiency, WinRAR rises directly to the top. Its scores of 1871 (Good) and 1983 (Best) rank third and fourth out of 200. The top two spots are held by an archiver I've never heard of, SBC.
WinRAR and SBC 0.970 score very well on efficiency. Both SBC and WinRAR are capable of compressing the 301 MB testset down to 82 MB [a ~73% compression ratio] in under 3 minutes. People looking for good (but not ultimate) and fast compression should have a look at those two programs.
The raw data on the comparison page is a little hard to parse, so I pulled the data into Excel and created some alternative views of it. Here's a graph of compression ratio versus time, sorted by compression ratio, for all compared archive programs:
What I wanted to illustrate with this graph is that beyond about 73% compression ratio, performance falls off a cliff. This is something I've noted before in previous compression studies. You don't just hit the point of diminishing returns in compression, you slam into it like a brick wall. That's why the time scale is logarithmic in the above graph. Look at the massive differences in time as you move toward the peak compression ratio:
| 72.58% | 02:54 | WinRAR 3.62 |
| 75.24% | 11:20 | UHARC 0.6b |
| 77.16% | 30:38 | DRUILCA 0.5 |
| 78.83% | 05:51:19 | PAQ8H |
| 79.70% | 08:30:03 | WinRK 3.0.3 |
Note that I cherry-picked the most efficient archivers out of this data, so this represents best case performance. Is an additional two percent of compression worth taking five times longer? Is an additional four percent worth ten times longer? Under the right conditions, possibly. But the penalty is severe, and the reward minuscule.
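If you'd rather see the tradeoff as numbers, here's a small sketch that turns the table above into "extra compression versus time multiple" relative to WinRAR. The helper names are made up for illustration; the figures are copied straight from the table.

```python
def to_seconds(clock: str) -> int:
    """Parse 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def time_multiple(clock: str, baseline_clock: str) -> float:
    """How many times longer 'clock' is than 'baseline_clock'."""
    return to_seconds(clock) / to_seconds(baseline_clock)


# (compression ratio in percent, elapsed time, archiver), from the table above.
results = [
    (72.58, "02:54", "WinRAR 3.62"),
    (75.24, "11:20", "UHARC 0.6b"),
    (77.16, "30:38", "DRUILCA 0.5"),
    (78.83, "05:51:19", "PAQ8H"),
    (79.70, "08:30:03", "WinRK 3.0.3"),
]

if __name__ == "__main__":
    base_ratio, base_clock, _ = results[0]
    for ratio, clock, name in results[1:]:
        extra = ratio - base_ratio
        multiple = time_multiple(clock, base_clock)
        print(f"{name}: +{extra:.2f} points for {multiple:.1f}x the time")
```

UHARC buys under three extra points for roughly four times the wait, and DRUILCA under five points for more than ten times. After that, the multiples climb into the hundreds.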
If you're interested in crunching the multiple file compression benchmark study data yourself, I converted it to a few different formats for your convenience:
- Download Excel spreadsheet (36 KB)
- Google Spreadsheet (view-only)
- Google Spreadsheet (editable, but need Google login)
Personally, I recommend the Excel version. I had major performance problems with the Google spreadsheet version.
After poring over this data, I'm more convinced than ever. RAR offers a nearly perfect blend of compression efficiency and speed across all modern compression formats. And WinRAR is an exemplary GUI implementation of RAR. It's almost a no-brainer. Except in cases where backwards compatibility trumps all other concerns, we should abandon the archaic ZIP format-- and switch to the power and flexibility of WinRAR.
February 21, 2007
URL Rewriting to Prevent Duplicate URLs
As a software developer, you may be familiar with the DRY principle: don't repeat yourself. It's absolute bedrock in software engineering, and it's covered beautifully in The Pragmatic Programmer, and even more succinctly in this brief IEEE software article (pdf). If you haven't committed this to heart by now, go read these links first. We'll wait.
Scott Hanselman recently found out the hard way that the DRY principle also applies to URLs. Consider the multiple ways you could get to this very page:
- http://codinghorror.com/blog/
- http://www.codinghorror.com/blog/
- http://www.codinghorror.com/blog/index.htm
It's even more problematic for Scott because he has two different domain names that reference the same content.
Having multiple URLs reference the same content is undesirable not only from a sanity check DRY perspective, but also because it lowers your PageRank. PageRank is calculated per-URL. If 50% of your incoming backlinks use one URL, and 50% use a different URL, you aren't getting the full PageRank benefit of those backlinks. The link juice is watered down and divvied up between the two different URLs instead of being concentrated into one of them.
So the moral of this story, if there is one, is to keep your URLs simple and standard. This is something the REST crowd has been preaching for years. You can't knock simplicity. Well, you can, but you'll be crushed by simplicity's overwhelming popularity eventually, so why fight it?
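To make "simple and standard" concrete, here's a minimal sketch of URL normalization in Python, assuming three rules: force the www host, drop index page names, and add a missing trailing slash on folders. The constant and function names are hypothetical, and like any such rule set it assumes folder names contain no periods.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed settings for illustration only.
CANONICAL_HOST = "www.codinghorror.com"
BARE_HOST = "codinghorror.com"
INDEX_PAGES = ("default.htm", "default.aspx", "index.htm", "index.html")


def canonical_url(url: str) -> str:
    """Map the many spellings of a page's URL onto one canonical form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST  # force the www. prefix
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_PAGES:
        path = head + "/"  # remove index pages from URLs
    elif last and "." not in last:
        path = path + "/"  # fix missing slash on folders
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for candidate in (
        "http://codinghorror.com/blog/",
        "http://www.codinghorror.com/blog/",
        "http://www.codinghorror.com/blog/index.htm",
    ):
        print(candidate, "->", canonical_url(candidate))
```

In a real deployment you'd answer the non-canonical forms with a permanent redirect to the canonical one rather than silently serving the same content, so that incoming links consolidate on a single URL.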
Normalizing your URLs isn't difficult if you take advantage of URL Rewriting. URL Rewriting has been a de-facto standard on Apache for years, but has yet to reach mainstream acceptance in Microsoft's IIS. I'm not even sure if IIS 7 supports URL Rewriting out of the box, although its new, highly modular architecture would make it very easy to add support. It's critical that Microsoft get a good reference implementation of an IIS7 URL rewriter out there, preferably one that's compatible with the vast, existing library of mod_rewrite rules.
But that doesn't help us today. If you're using IIS today, you have two good options for URL rewriting; they're both installable as ISAPI filters. I'll show samples for both, using a few common URL rewriting rules that I personally use on my website.
The first is ISAPI Rewrite. ISAPI Rewrite isn't quite free, but it's reasonably priced, and most importantly, it's nearly identical in syntax to the Apache mod_rewrite standard. It's also quite mature, as it's been through quite a few revisions by now.
[ISAPI_Rewrite]
# fix missing slash on folders
# note, this assumes we have no folders with periods!
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [RP]
# remove index pages from URLs
RewriteRule (.*)/default.htm$ $1/ [I,RP]
RewriteRule (.*)/default.aspx$ $1/ [I,RP]
RewriteRule (.*)/index.htm$ $1/ [I,RP]
RewriteRule (.*)/index.html$ $1/ [I,RP]
# force proper www. prefix on all requests
RewriteCond %HTTP_HOST ^test\.com [I]
RewriteRule ^/(.*) http://www.test.com/$1 [RP]
# only allow whitelisted referers to hotlink images
RewriteCond Referer: (?!http://(?:www\.good\.com|www\.better\.com)).+
RewriteRule .*\.(?:gif|jpg|jpeg|png) /images/block.jpg [I,O]
The second option, Ionic's ISAPI Rewrite Filter, is completely free. This filter has improved considerably since the last time I looked at it, and it appears to be a viable choice now. However, it uses its own rewrite syntax that is similar to the Apache mod_rewrite standard, but different enough to require some rework.
# fix missing slash on folders
# note, this assumes we have no folders with periods!
RewriteRule (^[^.]+[^/]$) $1/ [I,RP]
# remove index pages from URLs
RewriteRule (.*)/default.htm$ $1/ [I,RP]
RewriteRule (.*)/default.aspx$ $1/ [I,RP]
RewriteRule (.*)/index.htm$ $1/ [I,RP]
RewriteRule (.*)/index.html$ $1/ [I,RP]
# force proper www. prefix on all requests
RewriteCond %{HTTP_HOST} ^test\.com [I]
RewriteRule ^/(.*) http://www.test.com/$1 [I,RP]
# only allow whitelisted referers to hotlink images
RewriteCond %{HTTP_REFERER} ^(?!HTTP_REFERER)
RewriteCond %{HTTP_REFERER} ^(?!http://www\.good\.com) [I]
RewriteCond %{HTTP_REFERER} ^(?!http://www\.better\.com) [I]
RewriteRule \.(?:gif|jpg|jpeg|png)$ /images/block.jpg [I,L]
The Ionic filter still has some quirks, but I loved its default logging capability. I could tell exactly what was happening with my rules, blow by blow, with a quick glance at the log file. However, I had a lot of difficulty getting the Ionic filter to install-- I could only get it to work in IIS 5.0 isolation mode, no matter what I tried. Clearly a work in progress, but a very promising one.
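If the regular expressions in the hotlink rule are hard to read, the intent can be sketched in plain Python. This is my paraphrase of the logic, not what either filter does internally; the function name is hypothetical, and I'm assuming that requests with an empty referer should be let through.

```python
import re

# Referer prefixes allowed to hotlink, mirroring the sample rules above.
ALLOWED_REFERERS = ("http://www.good.com", "http://www.better.com")
IMAGE_PATTERN = re.compile(r"\.(?:gif|jpg|jpeg|png)$", re.IGNORECASE)
BLOCK_IMAGE = "/images/block.jpg"


def resolve_image_path(path: str, referer: str) -> str:
    """Return the path to serve for a request, given its Referer header."""
    if not IMAGE_PATTERN.search(path):
        return path  # not an image, so the rule doesn't apply
    if not referer:
        return path  # assumption: direct visits with no referer are fine
    # Prefix match only, like the regex in the rules; it is not a strict host check.
    if referer.lower().startswith(ALLOWED_REFERERS):
        return path
    return BLOCK_IMAGE  # everyone else gets the placeholder image


if __name__ == "__main__":
    print(resolve_image_path("/images/cat.png", "http://www.good.com/post"))
    print(resolve_image_path("/images/cat.png", "http://www.evil.com/post"))
```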
Of course, the few rewrite rules I presented above-- URL normalization and image hotlink prevention-- are merely the tip of the iceberg.
They don't call it the Swiss Army Knife of URL Manipulation for nothing. URL rewriting should be an integral part of every web developer's toolkit. It'll increase your DRYness, it'll increase your PageRank, and it's also central to the concept of REST.
February 20, 2007
Because They All Suck
The release of Windows Vista has caused an unfortunate resurgence in that eternal flame of computer religious wars, Mac vs. PC. Everywhere I go, somebody's explaining in impassioned tones why their pet platform is better than yours. It's all so tedious.
Personally, I had my fill of Mac versus PC arguments by 1994. I remember spending untold hours on the America Online forums endlessly debating the merits of PCs and Macs with Ross Rubin and other unsavory characters. But all that arguing never seemed to result in anything other than more arguments. Eventually, if you're more interested in using computers than endlessly arguing about them, you outgrow the arguments. And yet somehow, nearly fifteen years later, we're all happily retreading the same tired old Mac vs. PC ground.
I have a problem with this.
You might read Charles Petzold's ironically titled It Just Works as an anti-Mac diatribe. It certainly casts Apple in an unflattering light; Petzold's poor mother can't seem to catch a break.
Perhaps if my mother used lots of various Mac applications and stuck in lots of external devices, the machine would "just work" quite well. But she basically only uses email, so perhaps that's the problem. Just about every time I visit my mother in Jersey, I am called upon to boot up that dreadful machine and do something so it "just works" once again. For awhile she had a problem where certain spam emails would hang the email program upon viewing, but they couldn't be deleted without first being viewed. (Gosh, that was fun.) Presumably some patch to fix this little problem is among the 100 megabytes of updates waiting to be downloaded and installed, but my mother has a dial-up and we're forced to forego this 100 meg download. And besides, the slogan isn't "It just works with 100 megabytes of updates."
But if you read closely, as I did, you'll see that the experience wouldn't have been any better on a Windows PC. For a PC of that vintage, it's likely Petzold would have had to install the enormous Windows XP Service Pack 2 update to bring it up to date, which is certainly no less of a hassle than going from OS X 10.2 to OS X 10.4.
That's because Macs and PCs share one crucial flaw: they're both computers.
My computer frustrates and infuriates me on a daily basis, and it's been this way since I first laid my hands on a keyboard. Every computer I've ever owned-- including the ones with an Apple logo-- has been a colossal pain in the neck. Some slightly more so than others, but any device designed as a general purpose "do-everything" computing machine is destined to disappoint you eventually. It's inevitable.
The only truly sublime end-user experiences I've had have been with computers that weren't computers-- specialized devices, such as Tivo, the original Palm Pilot, the Nintendo Wii, and so forth.
General purpose computing devices are designed to be all things to all people. As a direct consequence, they will always be rife with compromises, pitfalls, and disappointments. That's the first secret of using computers: they all suck. Which makes the entire Mac vs. PC debate relative degrees of moot. I learned this lesson early in life; evidently some people are still struggling with it.
Computers do have one strong suit: they're unparalleled tools for writing, photography, programming, composing music, and creating art. It's the only reason to deal with the pain of owning one. As the Guardian's Charlie Brooker notes, the Mac vs. PC debate has an insidious side-effect that can distract you from this key benefit:
Ultimately the [Get a Mac advertising] campaign's biggest flaw is that it perpetuates the notion that consumers somehow "define themselves" with the technology they choose. If you truly believe you need to pick a mobile phone that "says something" about your personality, don't bother. You don't have a personality. A mental illness, maybe - but not a personality. Of course, that hasn't stopped me slagging off Mac owners with a series of sweeping generalisations for the past 900 words, but that is what the ads do to PCs. Besides, that's what we PC owners are like - unreliable, idiosyncratic and gleefully unfair. And if you'll excuse me now, I feel an unexpected crash coming.
That's the other problem with the Mac vs. PC debate: it completely misses the point. Computers aren't couture, they're screwdrivers. Your screwdriver rocks, and our screwdriver sucks. So what? They're screwdrivers. If you really want to convince us, stop talking about your screwdriver, and show us what you've created with it.
February 19, 2007
Everybody Loves BitTorrent
The traditional method of distributing large files is to put them on a central server. Each client then downloads the file directly from the server. It's a gratifyingly simple approach, but it doesn't scale. For every download, the server consumes bandwidth equal to the size of the file. You probably don't have enough bandwidth to serve a large file to a large audience, and even if you did, your bandwidth bill would go through the roof. The larger the file, the larger the audience, the worse your bandwidth problem gets. It's a popularity tax.
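The popularity tax is simple multiplication. A quick sketch, using hypothetical numbers (a 700 MB file and 10,000 downloads) that are not from any real measurement:

```python
MB = 1000 ** 2
TB = 1000 ** 4


def server_bandwidth_bytes(file_size_bytes, downloads):
    """Bytes a central server must upload: one full copy per download."""
    return file_size_bytes * downloads


# Hypothetical example: a 700 MB file downloaded 10,000 times.
total = server_bandwidth_bytes(700 * MB, 10_000)
print(total / TB, "TB")
```

Double the audience or double the file size, and the server's bill doubles with it.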
With BitTorrent, you also start by placing your large file on a central server. But once the downloading begins, something magical happens: as clients download the file, they share whatever parts of the file they have with each other. Clients can opportunistically connect with any other client to obtain multiple parts of the file at once. And it scales perfectly: as file size and audience size increase, the bandwidth of the BitTorrent distribution network also increases. Your server does less and less work with each connected client. It's an elegant, egalitarian way of sharing large files with large audiences.
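To get a feel for how much work shifts from the server to the clients, here is a toy, round-based simulation. It is a deliberately simplified model and not the real BitTorrent protocol: there is no tracker, no choking, and no limit on how much a peer can receive per round. The function name `simulate_swarm` and the one-upload-per-round rule are assumptions made for the sketch.

```python
def simulate_swarm(num_peers, num_pieces):
    """Toy round-based swarm; returns (seed_uploads, peer_uploads)."""
    peers = [set() for _ in range(num_peers)]
    seed_uploads = 0
    peer_uploads = 0

    while any(len(have) < num_pieces for have in peers):
        # Snapshot, so a piece received this round is not re-shared
        # until the next round.
        snapshot = [set(have) for have in peers]

        # Each peer uploads at most one piece to one other peer
        # that still lacks it.
        for i, have in enumerate(snapshot):
            for j in range(num_peers):
                missing = have - peers[j]
                if i != j and missing:
                    peers[j].add(min(missing))
                    peer_uploads += 1
                    break

        # The seed uploads one piece per round, picking the piece
        # held by the fewest peers (a "rarest first" flavor).
        counts = [sum(p in have for have in peers) for p in range(num_pieces)]
        rarest = min(range(num_pieces), key=lambda p: counts[p])
        for have in peers:
            if rarest not in have:
                have.add(rarest)
                seed_uploads += 1
                break

    return seed_uploads, peer_uploads


seed, swarm = simulate_swarm(num_peers=20, num_pieces=10)
print("seed uploaded", seed, "pieces; peers uploaded", swarm)
print("a central server would have uploaded", 20 * 10)
```

Every piece has to leave the seed at least once, but in this model the peers handle the rest of the copying among themselves.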
BitTorrent radically shifts the economics of distribution. It's one of the most miraculous ideas ever conceived on the internet. As far as I'm concerned, there should be a Nobel prize for computing, and the inventor of BitTorrent should be its first recipient.
There's a great Processing visualization of BitTorrent in action which explains it far better than I can. The original visualization is not only down semi-permanently, but also written for an ancient version of Processing. I grabbed a cached copy of the code and updated it for the latest version of Processing.
This meager little animated GIF doesn't do the highly dynamic, real-time nature of the visualization justice. I highly recommend downloading Processing and downloading the updated bittorrent visualization code, so you can see the process from start to finish on your own machine. It's beautiful.
But as wonderful and clever as BitTorrent is, it isn't perfect. As an avid BitTorrent user, I've noticed the following problems:
- BitTorrent is a terrible Long Tail client.
The efficiency of BitTorrent is predicated on popularity. The more people downloading, the larger the distribution network gets. But if what you want is obscure or unpopular-- part of the long tail-- BitTorrent is painfully, brutally slow. With only a handful of clients sharing the workload, you're better off using traditional distribution methods.
- BitTorrent, although distributed, is still centralized.
Download work is shared by the clients, but how do the clients locate each other? Traditionally this is done through a centralized "tracker" server, which maintains the list of peers. This means BitTorrent is vulnerable to attacks on the centralized server. Once the server is out of commission, the clients have no way of locating each other, and the whole distribution network grinds to a halt. There are alternatives which allow clients to share the list of peers amongst themselves, such as distributed hash tables, but centralized tracking is more efficient.
Also, in order to even begin a BitTorrent download, you must first know where to obtain a .torrent file. It's a chicken-and-egg problem which also implies the existence of a centralized server out there somewhere.
- BitTorrent is unsuitable for small files, even if they are extremely popular.
The BitTorrent distribution network is predicated on clients sharing pieces of the file during the download period. But if the download period is small, the opportunity window for sharing is also small; at any given time only a few users will be downloading. This is another scenario where you're unlikely to find any peers, so you're better off with traditional distribution methods.
- BitTorrent relies on client altruism.
There's no rule that says clients must share bandwidth while they're downloading. Although most BitTorrent clients default to uploading the maximum amount a user's upstream connection allows, it's possible to dial the upload rate down to nothing if you're greedy. And some users may have their firewalls configured in such a way that they can't upload data, even if they wanted to. The protocol's tit-for-tat choking gives a modest speed advantage to peers that upload, but beyond that there's no way to punish bad peers for not sharing, or reward good peers for sharing more.
Furthermore, every torrent needs a complete copy of the file available at all times, which in practice means a "seed"-- a peer with 100% of the file downloaded-- staying connected. If there is no seed and the remaining peers don't hold every piece between them, none of the peers will ever be able to download the entire file, no matter how many of them there are. It's considered a courtesy to stay connected if you have 100% of the file downloaded and no other seeds are available. But again, this is a convention, not a requirement. It's entirely possible for a torrent to "die" when there are no seeds available.
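Two of the problems in the list above lend themselves to a quick back-of-the-envelope sketch. The first function applies Little's law (a standard queueing result, brought in here for illustration) to estimate how many people are downloading at any instant. The second checks whether any piece of a file has gone missing from a swarm. The function names, the set-of-piece-indices representation, and all the numbers are hypothetical.

```python
def expected_concurrent_downloaders(downloads_per_day, seconds_per_download):
    """Little's law: average number in progress = arrival rate * duration."""
    arrivals_per_second = downloads_per_day / 86_400
    return arrivals_per_second * seconds_per_download


def missing_pieces(num_pieces, connected_peers):
    """Return the piece indices that no connected peer currently holds."""
    available = set().union(*connected_peers)
    return set(range(num_pieces)) - available


# Small, popular file: 10,000 downloads a day, 2 seconds each.
print(expected_concurrent_downloaders(10_000, 2))
# Large file with the same popularity: 1 hour per download.
print(expected_concurrent_downloaders(10_000, 3_600))

# A 4-piece torrent with one seed and two partial peers.
seed = {0, 1, 2, 3}
partial_peers = [{0, 1}, {1, 2}]
print(missing_pieces(4, [seed] + partial_peers))  # nothing missing
print(missing_pieces(4, partial_peers))           # the seed left; piece 3 is gone
```

With these made-up numbers the small file averages well under one concurrent downloader, so there is rarely anyone to share with, while the large file averages several hundred. And once the seed leaves in the second example, piece 3 is unavailable and the torrent is dead.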
The BitTorrent model is innovative, but it isn't suitable for every distribution task. The centralized server model is superior in most cases. But centralized distribution is a tool for the rich. Only highly profitable organizations can afford massive amounts of bandwidth. BitTorrent, in comparison, is highly democratic. BitTorrent gives the people whatever they want, whenever they want it-- by collectively leveraging the tiny trickle of upstream bandwidth doled out by most internet service providers.
But just because it's democratic doesn't mean BitTorrent has to be synonymous with intellectual piracy. BitTorrent has legitimate uses, such as distributing World of Warcraft patches. And Amazon's S3 directly supports the torrent protocol.
BitTorrent, in short, puts distribution choices back in the hands of the people. And that's why everybody loves BitTorrent. Everyone, that is, except the MPAA and RIAA.
February 16, 2007
Beyond JPEG
It's surprising that the venerable JPEG image compression standard, whose origins date back to 1986, is still the best we can do for photographic image compression. I can't remember when I encountered my first JPEG image, but JPEG didn't appear to enter practical use until the early '90s.
There's nothing wrong with JPEG. It's a perfectly serviceable image compression format. But there are newer, more modern choices these days. There's even a sequel of sorts to JPEG known as JPEG 2000. It's the logical heir to the JPEG throne.
The promise of JPEG 2000 is higher image quality in much smaller file sizes, at the minor cost of additional CPU time. And since we always seem to have a lot more CPU time than bandwidth, this is a perfect tradeoff. You may remember my comparison of JPEG compression levels from last year. Let's see what happens when we take the two worst-looking images from that comparison-- the ones with JPEG compression factor 40 and 50-- and use JPEG 2000 to produce images of (nearly) the exact same size:
| JPEG, ~8,200 bytes | JPEG 2000, ~8,200 bytes |
| [image] | [image] |
| JPEG, ~10,700 bytes | JPEG 2000, ~10,700 bytes |
| [image] | [image] |
No current web browsers can render JPEG 2000 (.jp2) images, so what you're seeing are extremely high quality JPEG versions of the JPEG 2000 images. Click on the images to download the actual JPEG 2000 files; most modern photo editing software can view them natively.
JPEG 2000 not only compresses more efficiently, it also does a better job of hiding its compression artifacts. It takes a lot more bits per pixel to create a JPEG image that looks as good as a JPEG 2000 image. But if you're willing to pump up the file size, you aren't losing any fidelity by presenting JPEG images.
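Bits per pixel is just the file size in bits divided by the number of pixels. A minimal sketch, where the 384 x 256 image dimensions are a hypothetical stand-in (the actual dimensions of the comparison images aren't stated here):

```python
def bits_per_pixel(file_size_bytes, width, height):
    """Average number of compressed bits spent on each pixel."""
    return file_size_bytes * 8 / (width * height)


# Hypothetical 384 x 256 image at the two file sizes compared above.
for size in (8_200, 10_700):
    print(size, "bytes ->", round(bits_per_pixel(size, 384, 256), 2), "bpp")
```

The measure is handy because it lets you compare compression across images of different dimensions, not just across formats.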
Microsoft, as Microsoft is wont to do, offers a closed-source alternative to JPEG 2000 known as HD Photo or Windows Media Photo. As of late 2006, Microsoft made the format 100% royalty free, and support for HD Photo is included in Windows Vista and .NET Framework 3.0. According to this Russian study, files in Microsoft's HD Photo format (.hdp, .wdp) are comparable to-- but not better than-- JPEG 2000. The study PDF has lots of comparison images, so you can decide for yourself.
Unfortunately, it doesn't really matter which next-generation image compression format is better, since nobody uses them. Microsoft neglected to include support for HD Photo in Internet Explorer 7. And Firefox doesn't currently support JPEG 2000, either. It's a bit of a mystery, because there's a seven-year-old open bug on JPEG 2000, and the OpenJPEG library seems like a logical fit.
Until a commonly used web browser supports JPEG 2000 or HD Photo, there's no traction. I hope the next browser releases can move us beyond the ancient JPEG image compression format.



