July 31, 2006
Linus Torvalds, Visual Basic Fan
Stiff recently asked a few programmers a series of open-ended questions:
- How did you learn programming? Were schools of any use?
- What's the most important skill every programmer should have?
- Are math and physics important skills for a programmer?
- What will be the next big thing in computer programming?
- If you had three months to learn one relatively new technology, which one would you choose?
- What are your favorite tools and why?
- What's your favorite programming book?
- What's your favorite non-programming book?
- What music do you listen to?
The participants are all quite notable:
- Linus Torvalds (Linux)
- Dave Thomas (Pragmatic Programmer)
- David Heinemeier Hansson (Ruby/Rails)
- Steve Yegge (Google/Amazon)
- Peter Norvig (Google Research Director)
- Guido van Rossum (Python)
- James Gosling (Java)
- Tim Bray (XML)
The interesting thing about open-ended questions is that the answers often reveal more about the person answering than they do about the question. Guido van Rossum, for example, comes across as kind of a jerk. But the questions generally provoked some very thoughtful responses.
The most surprising response, however, was from Linus Torvalds. When asked what the "next big thing" would be in computer programming, here's part of his reply:
For example, I personally believe that Visual Basic did more for programming than Object-Oriented Languages did. Yet people laugh at VB and say it's a bad language, and they've been talking about OO languages for decades.
And no, Visual Basic wasn't a great language, but I think the easy database interfaces in VB were fundamentally more important than object orientation is, for example.
Evidently we have another inductee into the he-man object hater's club.
Maybe the moral of this story is that we should value practical aspects of a language far more heavily than relatively meaningless technical merits. Or maybe I just get a kick out of hearing Linus Torvalds, the king of hard-core C geeks, compliment Visual Basic.
(via John Lam)
July 28, 2006
Are You an XML Bozo?
Here's a helpful article that documents some common pitfalls to avoid when composing XML documents. Nobody wants to be called an XML Bozo by Tim Bray, the co-editor of the XML specification, right?
There seem to be developers who think that well-formedness is awfully hard -- if not impossible -- to get right when producing XML programmatically and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following list of dos and don'ts helps developers to move from the first group to the latter.
- Don't think of XML as a text format
- Don't use text-based templates
- Don't
- Use an isolated serializer
- Use a tree or a stack (or an XML parser)
- Don't try to manage namespace declarations manually
- Use unescaped Unicode strings in memory
- Use UTF-8 (or UTF-16) for output
- Use NFC
- Don't expect software to look inside comments
- Don't rely on external entities on the Web
- Don't bother with CDATA sections
- Don't bother with escaping non-ASCII
- Avoid adding pretty-printing white space in character data
- Don't use text/xml
- Use XML 1.0
- Test with astral characters
- Test with forbidden control characters
- Test with broken UTF-*
I'm a little ambivalent about XML, largely due to what John Lam calls "The Angle Bracket Tax". I think XSLT is utterly insane for anything except the most trivial of tasks, but I do like XPath-- it's sort of like SQL with automatic, joinless parent-child relationships.
But XML is generally the least of all available evils, and if you're going to use it, you might as well follow the rules.
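To make the "don't use text-based templates" and "use an isolated serializer" items a little more concrete, here's a minimal sketch in Python using the standard library's xml.etree.ElementTree. The element names (note, body) and the sample string are made up for illustration; they don't come from the article. The idea is only to show that string concatenation breaks well-formedness as soon as the data contains & or <, while building a tree from plain Unicode strings and letting the serializer do the escaping does not.

```python
import xml.etree.ElementTree as ET


def build_note_with_template(body: str) -> str:
    """The text-template approach the list warns against: nothing is escaped."""
    return "<note><body>" + body + "</body></note>"


def build_note_with_serializer(body: str) -> bytes:
    """Build a tree from unescaped Unicode strings; the serializer escapes on output."""
    note = ET.Element("note")  # hypothetical element name
    ET.SubElement(note, "body").text = body  # hypothetical element name
    return ET.tostring(note, encoding="utf-8")  # UTF-8 bytes, as the list recommends


def is_well_formed(document) -> bool:
    """Return True if a parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Sample data containing markup-significant and non-ASCII characters.
tricky = "AT&T says 1 < 2, caf\u00e9"

template_ok = is_well_formed(build_note_with_template(tricky))
serializer_ok = is_well_formed(build_note_with_serializer(tricky))
```

With this input the template version should fail to parse and the serializer version should round-trip the original string unchanged. It's a sketch of the principle, not a full treatment of every item on the list.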
July 27, 2006
Windows XP, Our New Favorite Legacy Operating System
John Gruber gloats that Windows XP does not fare well in a comparison against OS X:
But everything about Boot Camp is calibrated to position Windows-on-Mac as the next Classic-style ghetto -- a compatibility layer that you might need but that you wish you didn't.
Even the Boot Camp logo reinforces this. It's a bastardized variant of Microsoft's Windows logo, sans color, and with the whitespace between the four panels forming a hidden "X", à la the hidden arrow in the FedEx logo.
[Microsoft is] stuck with the fact that in a fair shoot-out, Mac OS X is better. It looks better, it's better designed, it's more exciting, more intriguing, more satisfying. Cf. this joke from an anonymous poster in the comments at Mini-Microsoft's weblog:
What's the difference between OS X and Vista?
Microsoft employees are excited about OS X…
What's conspicuously missing from this comparison is any mention of the fact that Windows XP was originally released in October 2001.
In the intervening five years, Apple's OS X has seen five major releases. If you squint your eyes, tilt your head, and look at it from a distance, perhaps you could consider Service Pack 2 a point release. But any way you slice it, Windows XP is going on five years old now. That's ancient. It's also the longest time Microsoft has ever gone between major releases of Windows.
Consider the minimum system requirements for Windows XP:
- 233 MHz processor
- 64 MB of RAM (128 MB recommended)
- Super VGA (800 x 600) display
- CD-ROM or DVD drive
- Keyboard and mouse
A license for Windows XP is-- quite literally-- more expensive than a PC that meets these minimum specs today.
What Gruber doesn't realize is that relegating Windows XP to "Classic" status isn't an insult. It's simply acknowledging what every Windows user already knows: Windows XP is a legacy operating system.
And there's no shame in it.
Look at the age of UNIX, which OS X is based on. In the same way that OS X is a modern remodelling of its BSD and Mach kernel origins, Windows Vista will be a much-needed modern renovation of the XP core.
But in the meantime, as the guys at Engadget recently said:
At this point we don't really know what to expect anymore, and since our current XP-powered setup already does everything we need it to, we're getting pretty close to not caring if Vista is ever released at all.
I'm perfectly content to use Windows XP "classic", as long as Windows Vista is on the horizon for 2007.
And there are other benefits to Windows XP's advanced age, too.
Since XP's minimum system requirements are absurdly low by today's standards, you'll have no problem running Windows XP-- even multiple instances of Windows XP-- in a virtual machine on a modern development PC. My optimized, fully-patched Windows XP SP2 Virtual Machine image is down to 587 megabytes. That's a mere 139 megabytes as a self-extracting RAR file.
Most apps run fine in Windows XP with 128 megabytes or 160 megabytes of memory. For example, here's a screenshot of IE 7, Beta 3. It's running in an optimized Windows XP virtual machine with only 128 megabytes of memory:
That's with four tabs open to ESPN, eBay, Yahoo news, and MSN. Even with all that going on, I have more than 20 megabytes of free memory. And my commit charge total is well under the physical memory total. There's still room for more stuff!
Clearly, if all you need to do is test IE7 beta 3 in a virtual machine, a humble developer machine with 512 megs of memory will work fine.* Of course, you still need to be careful if you don't have a gigabyte or more of system memory. There are more detailed guidelines at the Virtual PC guy blog.
Here's the complete Task Manager process list for this VM, if you're curious.
I see a few services that could be disabled to free up even more memory.
* However, if you're working at a job where developers are expected to work on machines with less than 1 gigabyte of memory, it's definitely time to start looking for a new job.
July 26, 2006
Compiler, It Hurts When I Do This
Here's a question that recently came up on an internal mailing list: how do I create an enum with a name that happens to be a C# keyword?
I immediately knew the answer for VB.NET; you use brackets to delimit the word.
Public Enum test
    [Public]
    [Private]
End Enum

Sub Main()
    ' No brackets needed after the dot
    Dim e As test = test.Private
End Sub
A little internet searching revealed that such things are called escaped identifiers, and the equivalent in C# is the @ character.
public enum test
{
    @public,
    @private
}

static void Main()
{
    // The @ prefix is required at every use
    test e = test.@private;
}
They do work the same, but they don't look the same. In C#, you have to type the unwanted @ prefix every time you use the enum member, and the member even shows up with the @ prefix in IntelliSense. However, if you echo back the enum value, it will be "private", and not "@private", as expected.
However, after spending 30 minutes researching the answer and playing with the results, I began to wonder if the real answer to this question should be another question: why do you need to do this? At some point it all becomes a little ridiculous. What's next-- an enum named "enum"? A variable named "variable"?
Stop me if you've heard this one before:
A man goes to a doctor's office. He says, "Doctor, it hurts when I raise my arm over my head."
The doctor replies, "Then don't raise your arm over your head."
If the compiler is telling you it hurts when you do something, maybe you should stop doing it. Just something to consider before merrily swimming your way upstream.
July 25, 2006
Information Density and Dr. Bronner
Edward Tufte, in his new book, Beautiful Evidence, continues on his crusade for information density. Here's a representative recap of a Tufte seminar from 2001:
Tufte spent most of his talk walking around the room while talking on a wireless mike. He had two projectors set up, but for the most part he only displayed pages or pictures from his books, instructing the audience to follow along in their own copies (which had been provided to every attendee). He occasionally carried around some other props, in particular a few 400-year-old books from his personal library. This style not only entertained and engaged the audience, it also emphasized one of his main points, which is that progress is often measured in data density - how many bits per unit of area can be accommodated by a hard drive or a display.
In terms of text display, a page in a phone book can hold 36K of information, while the best display can only show about 5K. If you look at something like a topographical map, the resolution available on paper is a factor of ten, at least, beyond what can be shown on a screen.
Tufte feels that the same mantra about data density should be applied to web sites, and in fact to the entire contents of the computer display that the user sees when navigating a web site. Thus, he dislikes task bars, menu bars, status bars, and other GUI screen overhead, since they constrict how much of the display can be used for content. Once you get to the actual site, he has similar disdain for banner ads, navigation bars, graphical frills, and the like.
Tufte feels that the main measure of a web site (or any computer interface) should be the percentage of the screen that is actually devoted to the task at hand. He wants web pages to use words instead of icons, because [words] can display information more compactly. He does not like navigation bars, but instead wants as many choices as possible on the main page.
You'll find the same theme repeated in all of Tufte's books: progress is measured in information density.
Although I definitely understand the desire for maximizing content and minimizing UI clutter, I have a hard time squaring the desire for maximum information density with the current Web 2.0 drive for minimalist content.
These days, you rarely see screens packed densely with content and hundreds of links, but that's what Tufte seems to be asking for. We even make fun of the Yahoo home page because it has become so dense over time. Are we wrong, and is Tufte right? Average display resolutions haven't increased that much between 1996 and 2006; we went from 800x600 to 1280x1024 or thereabouts. And we have the RGB magic of ClearType, which increases effective horizontal resolution by about 3x.
Maybe the Yahoo home page design overreaches because it's now being designed as if it were a printed page. We have higher resolutions, sure, but computer displays are still nowhere near the resolution of a printed page. Perhaps the current trend of design minimalism is simply eliminating wishful thinking: mating the very low resolution of a computer screen (as compared to a printed page) with a corresponding reduction in content.
But Tufte isn't the only design guru to worship at the altar of information density. Jef Raskin, in The Humane Interface, talks about this at some length. He even references Tufte directly:
We seem to have a real fear of displaying data in our interfaces. We know that people can quickly find one among a few items much more quickly than they can find one among dozens: there is less to look through. But it does not follow, as some seem to think, that it is therefore better to have fewer items on each screen. If you have hundreds of items and split them up among dozens of screens, you lose more in navigation time than you gain in searching for the individual item, even if the one you seek is swimming in a sea of similar-looking items.
Visual designer Edward Tufte's first three principles for displaying information are:
- Above all else, show the data.
- Maximize the data-ink ratio.
- Erase nondata ink.
All we need to do is substitute pixels for ink for his advice to apply to display-based devices. A serious, professional user wants screens packed with useful stuff. Screens should be well labeled, with methods to make finding things easier and dense with the information that represents the real value of each screen.
One of the most remarkable examples of information density, at least in a commercial product, is Dr. Bronner's soaps:
You can also obtain PDF versions of the labels directly from the company website.
I remember the first time I saw a Dr. Bronner product; the incredible density of the tiny text on the label drew me to it. Yes, they're filled with half-crazy religious ravings. Not so fun in person, but if someone is this jazzed about a bar of soap, it's somehow endearing. You can see a small video clip of Bronner ranting in person via the Dr. Bronner's Magic Soapbox documentary trailer.
You'd think a label filled with reams of tiny, indecipherable text would be the kiss of death for any commercial product. Not so for eccentric Dr. Bronner and his soaps. Is it a victory for information density? Maybe. I think Craigslist is conceptually pretty close to what Dr. Bronner was doing.
July 24, 2006
What is "Modern Software Development"?
Joel Spolsky came up with a twelve-item checklist in August, 2000 that provides a rough measure of-- in his words-- "how good a software team is":
- Do you use source control?
- Can you make a build in one step?
- Do you make daily builds?
- Do you have a bug database?
- Do you fix bugs before writing new code?
- Do you have an up-to-date schedule?
- Do you have a spec?
- Do programmers have quiet working conditions?
- Do you use the best tools money can buy?
- Do you have testers?
- Do new candidates write code during their interview?
- Do you do hallway usability testing?
Steve McConnell's list of software's ten essentials, from 1997, covers what every software project should have:
- A product specification
- A detailed user interface prototype
- A realistic schedule
- Explicit priorities
- Active risk management
- A quality assurance plan
- Detailed activity lists
- Software configuration management
- Software architecture
- An integration plan
These are great lists. But Spolsky's list is 6 years old; McConnell's is almost 10 years old! Does your software project meet all these criteria?
The lists are still highly relevant and definitely worth revisiting today. But I wonder if the field of software development has advanced far enough that we can take any of the items on this list for granted. I also wonder if any new practices have emerged in the last 6 years that aren't accounted for on either list.
So here's my question to you: what core set of practices constitutes modern software development in 2006?
July 21, 2006
The problem with "Low Priority"
I've always thought it was ironic that low priority emails are the ones I see first in my inbox.
Marking something with a low priority makes it stand out from all the others. Doesn't that make it implicitly high priority?
One man's urgent is another man's low priority, and vice-versa. I think it's better to leave the metadata to the machines and avoid marking emails with a priority of any kind, high or low.
July 20, 2006
I Pity The Fool Who Doesn't Write Unit Tests
J. Timothy King has a nice piece on the twelve benefits of writing unit tests first. Unfortunately, he seriously undermines his message by ending with this:
However, if you are one of the [coders who won't give up code-first], one of those curmudgeon coders who would rather be right than to design good software… Well, you truly have my pity.
Extending your pity to anyone who doesn't agree with you isn't exactly the most effective way to get your message across.
Consider Mr. T. He's been pitying fools since the early 80's, and the world is still awash in foolishness.
It's too bad, because the message is an important one. The general adoption of unit testing is one of the most fundamental advances in software development in the last 5 to 7 years.
How do you solve a software problem? How do they teach you to handle it in school? What's the first thing you do? You think about how to solve it. You ask, "What code will I write to generate a solution?" But that's backward. The first thing you should be doing -- In fact, this is what they say in school, too, though in my experience it's paid more lip-service than actual service -- The first thing you ask is not "What code will I write?" The first thing you ask is "How will I know that I've solved the problem?"
We're taught to assume we already know how to tell whether our solution works. It's a non-question. Like indecency, we'll know it when we see it. We believe we don't actually need to think, before we write our code, about what it needs to do. This belief is so deeply ingrained, it's difficult for most of us to change.
King presents a list of 12 specific ways adopting a test-first mentality has helped him write better code:
- Unit tests prove that your code actually works
- You get a low-level regression-test suite
- You can improve the design without breaking it
- It's more fun to code with them than without
- They demonstrate concrete progress
- Unit tests are a form of sample code
- It forces you to plan before you code
- It reduces the cost of bugs
- It's even better than code inspections
- It virtually eliminates coder's block
- Unit tests make better designs
- It's faster than writing code without tests
Even if you only agree with a quarter of the items on that list-- and I'd say at least half of them are true in my experience-- that is a huge step forward for software developers. You'll get no argument from me on the overall importance of unit tests. I've increasingly come to believe that unit tests are so important that they should be a first-class language construct.
However, I think the test-first dogmatists tend to be a little too religious for their own good. Asking developers to fundamentally change the way they approach writing software overnight is asking a lot. Particularly if those developers have yet to write their first unit test. I don't think any software development shop is ready for test-first development until they've adopted unit testing as a standard methodology on every software project they undertake. Excessive religious fervor could sour them on the entire concept of unit testing.
And that's a shame, because any tests are better than zero tests. And isn't unit testing just a barely more formal way of doing the ad-hoc testing we've been doing all along? I think Fowler said it best:
Whenever you are tempted to type something into a print statement or a debugger expression, write it as a test instead.
I encourage developers to see the value of unit testing; I urge them to get into the habit of writing structured tests alongside their code. That small change in mindset could eventually lead to bigger shifts like test-first development-- but you have to crawl before you can sprint.
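Fowler's advice is easy to picture with a small example. Here's a minimal sketch in Python using the standard unittest module; the word_count function is a hypothetical stand-in for whatever code you happen to be poking at, and the test names are my own. The commented-out print line is the throwaway check you'd normally type and delete; the test class is the same check, written down so it can be re-run.

```python
import unittest


def word_count(text: str) -> int:
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


# The ad-hoc check you might otherwise type, eyeball, and throw away:
#     print(word_count("to be or not to be"))


class WordCountTest(unittest.TestCase):
    """The same checks, kept as structured tests alongside the code."""

    def test_counts_words(self):
        self.assertEqual(word_count("to be or not to be"), 6)

    def test_empty_string(self):
        self.assertEqual(word_count(""), 0)

    def test_extra_whitespace(self):
        self.assertEqual(word_count("  hello   world "), 2)


def run_tests() -> unittest.TestResult:
    """Load and run the test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() runs all three checks and reports any failures. Nothing here is test-first; it's just the print statement you were going to write anyway, promoted to something that sticks around.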
July 19, 2006
Creating Smaller Virtual Machines
Now that Virtual PC is finally free, I've become obsessed with producing the smallest possible Windows XP Virtual PC image. It's quite a challenge, because a default XP install can eat up well over a gigabyte. Once you factor in the swapfile and other overhead, you're generally talking about 2-4 gigabytes for relatively simple configurations.
My best result so far, however, is a 641 megabyte virtual machine image of a clean, fully patched Windows XP install. Not bad. And here's how I did it.
First, start with the obvious stuff:
- Install Windows XP SP2. Take all default options.
- Connect to Windows Update; install all critical updates.
- Install VM additions.
- Turn off system restore.
  - Right click My Computer; select Properties
  - Click the System Restore tab
  - Click the "Turn off System Restore" checkbox
  - OK all the way back out
- Set Visual Effects to minimum.
  - Right click My Computer; select Properties
  - Click the Advanced tab
  - Click the Performance Settings button
  - Click the "Adjust for best performance" checkbox
  - OK all the way back out.
- Shut down.
Don't install anything else yet! Remember, we're trying to get to a minimal baseline install of Windows XP first. A nice, flat platform to build on.
It's critical to turn off system restore, because that eats up hundreds of megabytes of disk space. In a virtual machine environment, having a rollback path doesn't make sense anyway. And if the Windows software environment wasn't so pathological, we wouldn't need complex rollback support embedded in the OS, either, but I digress.
Now let's put together our toolkit of virtual machine optimization:
- XPlite ($)
- Crap Cleaner
- TweakUI
- Whitney Defragger
- Invirtus VM Optimizer ($, optional)
These utilities are mostly free. And, except for Crap Cleaner, they don't even require installers. Just plop all the files for each one into a folder; I call mine VM-utils. Copy this folder to the target VM.
- Use TweakUI to turn on automatic login. Otherwise you have to distribute login credentials with your VM, and who wants to do that?
- Now, use XPlite to tear out all the annoying, unnecessary bits of Windows XP:
XPlite is easily the best utility of its type; it removes scads of useless things built into XP that have no explicit uninstall mechanism. Unfortunately, XPlite is payware. There is a free version, but it's crippled; it can only remove a fraction of the items the full version can. See the full list of items it can remove along the right-hand side of the product page.
By default, XPlite generally shows things that are safe to remove. Note that the "Advanced Components" item is shown in that screenshot, which is definitely stuff that's not safe to remove unless you really know what you're doing. Anyway, here's what I consider totally safe to remove in XPlite's standard list:
- Accessibility Options
- Communication and Messaging
- Server Components
- Games
- System Services
The others require a bit of judicious selection.
- Accessories - you probably want Notepad, Calc, and the other essential applets. A world without Notepad is a world I don't want to live in.
- Internet Utilities - if you want to keep the default IE6 inside XP, I'd leave this alone. With the notable exception of MSN Explorer, which is always safe to drop.
- Multimedia - if you have sound enabled, selectively keep some of this, otherwise dump it all. It's highly unlikely you would ever want to watch videos or listen to music inside your VM, right? Right?
- Operating System Options - you may want to keep the core fonts if you're planning to browse the web within the VM. Also, beware of removing the service pack update files. Most of this is safe to dump, though. However, you will need the VB6 runtimes for Crap Cleaner to run!
- System Tools & Utilities - I'd leave Dr. Watson, and possibly PerfMon, WSH and Zip folder support.
Once you've made your selections, let XPlite do its thing. It's worth the effort, because you'll have an unbelievably squeaky clean Start menu when it's done. Who knew Windows XP could be this.. simple?
- Install and run Crap Cleaner. Perform the default analysis, then do a cleanup. This step is really optional; it only cleans up a couple megabytes of log files and miscellaneous junk. Be sure to uninstall Crap Cleaner when you're done, too.
- Now that we've cleaned everything up, we need to defragment the disk.
You can use any defragmenter you like, of course, but the Whitney Defragger is free and works quite well.
  - Navigate to the folder where you put your VM utilities, including the Whitney Defragger.
  - Open a command prompt
  - Copy the defragmenting program to the Windows system folder:
    copy bootdfrg.exe c:\windows\system32\
  - Install the defragmenting service:
    defrag -i
  - Schedule a defragmentation of the c:\ drive for the next boot:
    defrag -d c: -B
  - Restart the virtual machine.
  - The defragmenter will run before Windows loads. Let it run to completion. It may take a little while, but it provides lots of textual feedback on what it's doing.
Either way, you're mounting an ISO. The Microsoft Pre-Compactor is in a folder named "Virtual Machine Additions" under your Virtual PC install folder. Once mounted, the precompactor will autorun. Let it prep the drive; this doesn't take long.
Cleanly shut down the virtual machine.
- Click the File | Virtual Disk Wizard drop-down menu
- Edit an existing virtual disk
- Select the correct disk image
- Select "Compact it"
- Select "replacing the original file"
.. and prepare to marvel at the tiny size* of the resulting hard drive image!
It's really quite amazing how snappy and compact Windows XP can be, once you remove all the useless cruft from it.
* that's what she said.
July 18, 2006
Why Can't Database Tables Index Themselves?
Here's a thought question for today: why can't database tables index themselves?
Obviously, indexes are central to databases and database performance. But horror tales still abound of naive developers who "forget" to index their tables, and encounter massive performance and scalability problems down the road as their tables grow. I've run into it personally, and I've read plenty of sad tales of woe from other developers who have, too. I've also forgotten to build indexes myself on non-primary-key columns many times. Why aren't databases smart enough to automatically protect themselves from this?
It always struck me as absurd that I had to go in and manually mark fields in a table to be indexed. Perhaps in the bad old file-based days of FoxPro, DBase, and Access, that might have been a necessary evil. But in a modern client-server database, the server should be aware of all the queries flowing through the system, and how much each of those queries costs. Who better to decide what needs to be indexed than the database itself?
Why can't you enable an automatic indexing mode on your database server that follows some basic rules, such as..
- Does this query result in a table scan?
- If so, determine which field(s) could be indexed, for that particular query, to remove the need for a table scan.
- Store the potential index in a list. If the potential index already exists in the list, bump its priority.
- After (some configurable threshold), build the most commonly needed potential index on the target table.
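To show how little machinery those four rules need, here's a toy sketch in Python. Everything in it is hypothetical: the IndexAdvisor class, the observe method, and the idea that the server hands us the table, the filtered columns, and a "this caused a table scan" flag for each query are all assumptions made for illustration, not features of any real database server. It also sorts the column names, which ignores the fact that column order matters in a real composite index.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# A candidate index is identified by (table name, tuple of column names).
IndexKey = Tuple[str, Tuple[str, ...]]


class IndexAdvisor:
    """Toy model of the auto-indexing rules; all names here are hypothetical."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold            # the "configurable threshold"
        self.candidates: Counter = Counter()  # potential index -> priority
        self.built: set = set()               # indexes already created

    def observe(self, table: str, filter_columns: Iterable[str],
                caused_table_scan: bool) -> Optional[IndexKey]:
        """Record one query; return an index to build if the threshold is hit."""
        # Rule 1: only queries that resulted in a table scan matter.
        if not caused_table_scan:
            return None
        # Rule 2: the filtered columns are the fields that could be indexed.
        key: IndexKey = (table, tuple(sorted(filter_columns)))
        if key in self.built:
            return None
        # Rule 3: store the potential index, or bump its priority if present.
        self.candidates[key] += 1
        # Rule 4: once past the threshold, build the most commonly needed one.
        top_key, hits = self.candidates.most_common(1)[0]
        if hits >= self.threshold:
            del self.candidates[top_key]
            self.built.add(top_key)
            return top_key
        return None


advisor = IndexAdvisor(threshold=3)
decisions = [
    advisor.observe("orders", ["customer_id"], caused_table_scan=True)
    for _ in range(3)
]
```

In this run the first two scans only bump the candidate's priority, and the third one crosses the threshold and returns the index to build. A real server would obviously have to weigh write overhead, storage, and query cost as well, which this sketch leaves out entirely.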
Of course, for database gurus who are uncomfortable with this, the feature could be disabled. And you could certainly add more rules to make it more robust. But for most database users, it should be enabled by default; an auto-indexing feature would make most database installations almost completely self-tuning with no work at all on their part.
I did some cursory web searches and I didn't see any features like this for any commercial database server. What am I missing here? Why does this seem so obvious, and yet it's not out there?
