April 30, 2005
Barcodes and QR Codes
I recently purchased a USB CueCat from eBay to play around with UPC barcodes, which I found out about from comments posted in a Scott Hanselman blog entry. It's fun to run around the house scanning in UPCs from household items, although the low-powered LED reader in the CueCat definitely pales in comparison to the industrial laser readers you'll find at your local supermarket. Still, you can't beat it for $15, and the PS2 version can be had for even less. If you're wondering why exactly you would want to do this, check out Delicious Library (review). Like so many things Apple, it's self-consciously cute where it should be practical, but the concept is sound.
I saw a reference in Ned Batchelder's blog to UPCs on steroids: something called QRCode. QRCode is designed to be "scanned" via cell phone cameras, and it's the most information-dense 2D barcode format currently available. A single symbol can hold up to:
- 7,089 numeric characters
- 4,296 alphanumeric characters
- 2,953 bytes
- 1,817 Kanji
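All four of those figures come from the same bit budget. The post doesn't mention the following specifics, which come from the QR Code specification (ISO/IEC 18004). The largest symbol, version 40 at the lowest error-correction level L, carries 2,956 data codewords, or 23,648 bits. A segment spends 4 bits on a mode indicator and 14, 13, 16, or 12 bits on a character count (numeric, alphanumeric, byte, Kanji). Each mode then packs characters at its own rate. Here is a minimal sketch under those assumptions; the function names are illustrative:

```python
# Data capacity of a version 40 QR symbol at error-correction level L:
# 2,956 data codewords of 8 bits each.
DATA_BITS = 2956 * 8  # 23,648 bits

MODE_INDICATOR_BITS = 4

# Character-count indicator widths for symbol versions 27-40, per mode.
COUNT_BITS = {"numeric": 14, "alphanumeric": 13, "byte": 16, "kanji": 12}


def payload_bits(mode: str, n: int) -> int:
    """Bits needed for n characters in the given mode, excluding headers."""
    if mode == "numeric":
        # Each group of 3 digits takes 10 bits; 1 or 2 leftover digits take 4 or 7.
        return 10 * (n // 3) + (0, 4, 7)[n % 3]
    if mode == "alphanumeric":
        # Each pair takes 11 bits; a leftover single character takes 6.
        return 11 * (n // 2) + 6 * (n % 2)
    if mode == "byte":
        return 8 * n
    if mode == "kanji":
        return 13 * n
    raise ValueError(f"unknown mode: {mode}")


def max_chars(mode: str) -> int:
    """Largest n whose single-segment encoding fits in DATA_BITS."""
    budget = DATA_BITS - MODE_INDICATOR_BITS - COUNT_BITS[mode]
    n = 0
    while payload_bits(mode, n + 1) <= budget:
        n += 1
    return n


for mode in COUNT_BITS:
    print(f"{mode:>12}: {max_chars(mode):,}")
```

Running it prints 7,089, 4,296, 2,953 and 1,817, matching the list. Those capacities are one fixed bit budget divided four ways.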
It's a shame American cell phones and American advertisers haven't adopted QRCode. However, it may be a preview of things to come as cameras become a standard feature of cell phones.
April 29, 2005
Respecting Abstraction
In a recent post, Scott Koon proposes that to be a really good .NET programmer, you also need to be a really good C++ programmer:
If you've spent all your life working in a GC'ed language, why would you ever need to know how memory management works, let alone virtual memory? Jon also says it doesn't teach you to work with a framework. What's the STL? What about MFC? ATL? Carbon? All of those things use C++ as their base language. Notice I didn't say to take a C/C++ course at a university as I'm not convinced that a CS course will teach you everything you need to know in the real world. I said to learn C/C++ first because if you understand HOW things work, you'll have a better idea of how things DON'T work. How can you identify a memory leak in your managed application if you don't know how memory leaks come about or what a memory leak is?
The problem I have with this position is that it breaks abstraction. The .NET Framework abstracts away the details of pointers and memory management by design, to make development easier. It's also a stretch to say that .NET developers have no idea what memory leaks are-- in the world of managed code, memory management is an optimization, not a requirement. It's an important distinction. You should only care when it makes sense to do so, whereas in C++ you are forced to worry about the minutiae of detailed memory management even for the most trivial of applications. And if you get it wrong, either your app crashes, or you're open to buffer overrun exploits.
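A concrete case shows what a managed "leak" looks like. A garbage collector reclaims only objects that nothing refers to anymore. A managed app can therefore still leak by holding references it no longer needs. In .NET the usual culprits are things like static collections and event handlers that are never removed. This sketch uses Python as a stand-in for any garbage-collected runtime. The `Report` class and `build_reports` helper are hypothetical. `weakref.WeakValueDictionary` is simply one Python way to cache objects without pinning them in memory.

```python
import gc
import weakref


class Report:
    """Hypothetical stand-in for a large object an app builds and caches."""

    def __init__(self, name):
        self.name = name
        self.payload = bytearray(100_000)  # roughly 100 KB of data


def build_reports(cache, count):
    """Build `count` reports and file each one in `cache` by name."""
    for i in range(count):
        report = Report(f"report-{i}")
        cache[report.name] = report
    # Our local reference to the last report disappears when we return.


# A long-lived ordinary dict keeps every report reachable, so the collector
# can never reclaim them: a managed-code "leak" with no pointers in sight.
strong_cache = {}
build_reports(strong_cache, 10)

# A weak-valued cache does not keep its entries alive on its own.
weak_cache = weakref.WeakValueDictionary()
build_reports(weak_cache, 10)

gc.collect()
print(len(strong_cache))  # 10: all still in memory
print(len(weak_cache))    # 0: reclaimed once nothing else referred to them
```

You can diagnose this without knowing how `malloc` works. You only need to ask which live object is still holding the reference.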
You may also be familiar with Joel's article on the negative effects of leaky abstractions. Bram has a compelling response:
Joel Spolsky says, "All non-trivial abstractions, to some degree, are leaky." This is overly dogmatic - for example, bignum classes are exactly the same regardless of the native integer multiplication. Ignoring that, this statement is essentially true, but rather inane and missing the point. Without abstractions, all our code would be completely interdependent and unmaintainable, and abstractions do a remarkable job of cleaning that up. It is a testament to the power of abstraction and how much we take it for granted that such a statement can be made at all, as if we always expected to be able to write large pieces of software in a maintainable manner.
It's amazing how far down the rabbit hole you can go following the many abstractions that we routinely rely on today. Eric Sink documents the 46 layers of abstraction that his product relies on. And Eric stops before we get to the real iron; Charles Petzold's excellent book Code: The Hidden Language of Computer Hardware and Software goes even deeper. In other words, when Joel says:
Today, to work on CityDesk, I need to know Visual Basic, COM, ATL, C++, InnoSetup, Internet Explorer internals, regular expressions, DOM, HTML, CSS, and XML. All high level tools compared to the old K&R stuff, but I still have to know the K&R stuff or I'm toast.
What he's really saying is that without these abstractions, we'd all be toast. While no abstraction is perfect-- you may need to dip your toes into layers below the Framework from time to time-- arguing that you must have detailed knowledge of the layer under the abstraction to be competent is counterproductive. Knowledge of the layers is certainly critical for troubleshooting, but we should respect the abstractions and spend most of our efforts fixing the leaks instead of bypassing them.
April 27, 2005
When Writing Code Means You've Failed
I was chatting with a fellow developer yesterday, who recently adopted the very cool Busy Box ASP.NET progress indicator that I recommended:
We often need to provide a user message informing the user that their request is "processing". Like the hour-glass mouse pointer lets the Windows user know the system is busy processing their last request, I have a simple, clean, and effective solution to providing this on web pages: The BusyBox Demo
He was quite pleased with the results, as their app has to churn through some HR queries that take in excess of 30 seconds even after hand optimization. The psychological effect of a progress indicator is quite profound:
In cases where the computer cannot provide fairly immediate response, continuous feedback should be provided to the user in form of a percent-done indicator [Myers 1985]. As a rule of thumb, percent-done progress indicators should be used for operations taking more than about 10 seconds. Progress indicators have three main advantages: They reassure the user that the system has not crashed but is working on his or her problem; they indicate approximately how long the user can be expected to wait, thus allowing the user to do other activities during long waits; and they finally provide something for the user to look at, thus making the wait less painful. This latter advantage should not be underestimated and is one reason for recommending a graphic progress bar instead of just stating the expected remaining time in numbers.
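The rule of thumb in that quote is simple enough to sketch. This is not how the Busy Box control works, since that is a client-side ASP.NET component. It's a hypothetical console illustration of a percent-done indicator. The 10-second threshold comes from the quote, and the function names are invented for the example.

```python
import sys
import time

# Rule of thumb from the quote: use a percent-done indicator for operations
# expected to take more than about 10 seconds.
PERCENT_DONE_THRESHOLD_SECONDS = 10.0


def needs_percent_done(expected_seconds: float) -> bool:
    """True when the expected wait is long enough to warrant a progress bar."""
    return expected_seconds > PERCENT_DONE_THRESHOLD_SECONDS


def render_progress(done: int, total: int, width: int = 20) -> str:
    """Fixed-width text bar: '#' for work done, spaces for work left, then the percent."""
    filled = done * width // total
    percent = done * 100 // total
    return f"[{'#' * filled}{' ' * (width - filled)}] {percent:3d}%"


if __name__ == "__main__":
    steps = 8
    for step in range(steps + 1):
        # '\r' returns to the start of the line so the bar redraws in place
        sys.stdout.write("\r" + render_progress(step, steps))
        sys.stdout.flush()
        time.sleep(0.1)  # stand-in for a slice of real work
    print()
```

The bar gives the user something to look at, as the quote puts it, and it is also the easiest of the three benefits to deliver.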
My Busy Box recommendation came after that team made several abortive attempts to implement different kinds of progress feedback. And this got me thinking: sometimes, writing code means you've failed. So much of what we do already exists, and in more mature, complete form. The real challenge in modern programming isn't sitting down and writing a ton of code; it's figuring out what existing code or frameworks you should be hooking together. This is something Scott Swigart has also observed:
Venkatarangan points out all the stuff that Sauce Reader uses, showing that in software development today, 1/2 the work is finding the building blocks, and the other 1/2 is writing the glue.
The real development skill is correctly identifying which half is Legos and which half is glue.
April 26, 2005
Give me parameterized SQL, or give me death
I have fairly strong feelings when it comes to the stored procedures versus dynamic SQL argument, but one thing is clear: you should never, ever use concatenated SQL strings in your applications. Give me parameterized SQL, or give me death. There are two good reasons you should never do this.
First, consider this naive concatenated SQL, where the value between the quotes is pasted in directly from user input:
SELECT email, passwd, login_id, full_name
FROM members
WHERE email = 'x';
Code like this opens your app to SQL injection attacks, and it's a huge, gaping vulnerability. Steve Friedl's SQL Injection Attacks by Example provides an excellent visual blow-by-blow of what can happen when you write code this naive. Here's the Reader's Digest version, after the user types x' OR full_name LIKE '%Bob% into the email field:
SELECT email, passwd, login_id, full_name
FROM members
WHERE email = 'x' OR full_name LIKE '%Bob%';
I know what you're thinking. No, escaping the strings doesn't protect you; see Steve's article.
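Here is a runnable sketch of both approaches. It uses Python's built-in sqlite3 module, with an in-memory table standing in for `members`, and the rows are made up. The concatenated query lets the attacker's input rewrite the WHERE clause. The parameterized query treats the exact same input as one plain string value. SQLite's placeholder is `?` rather than SQL Server's `@email`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (email TEXT, passwd TEXT, login_id TEXT, full_name TEXT)"
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    [
        ("alice@example.com", "s3cret", "alice", "Alice Smith"),
        ("bob@example.com", "hunter2", "bob", "Bob Jones"),
    ],
)

# What the attacker types into the email field:
user_input = "x' OR full_name LIKE '%Bob%"

# Naive concatenation: the input's quote closes the string literal early,
# and the rest of the input becomes part of the SQL itself.
concatenated = (
    "SELECT email, passwd, login_id, full_name FROM members "
    "WHERE email = '" + user_input + "'"
)
leaked = conn.execute(concatenated).fetchall()
print(leaked)  # Bob's row, password included

# Parameterized: the driver binds the value to the placeholder, so the whole
# input is compared against email as one literal string and matches nothing.
safe = conn.execute(
    "SELECT email, passwd, login_id, full_name FROM members WHERE email = ?",
    (user_input,),
).fetchall()
print(safe)  # []
```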
Second, parameterized SQL performs better. A lot better. Consider the parameterized version of the above:
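// Requires: using System.Data.SqlClient;
string s = "SELECT email, passwd, login_id, full_name " +
    "FROM members WHERE email = @email";
using (SqlConnection conn = new SqlConnection(_connectionString))
using (SqlCommand cmd = new SqlCommand(s, conn)) // pass the connection, or ExecuteReader throws
{
    // The value travels separately from the SQL text, so it is never parsed as SQL
    cmd.Parameters.AddWithValue("@email", email); // AddWithValue requires .NET 2.0 or later
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read()) { /* process each row */ }
    }
}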
This code offers the following pure performance benefits:
- Fewer string concatenations
- No need to worry about any kind of manual string escaping
- A more generic query form is presented to the database, so the server is more likely to find an already-compiled execution plan for it in its cache
- Smaller strings are sent across the wire
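The plan-cache point is easy to see without a database at all. What matters is that the SQL text the server receives stays identical from call to call. Below is a minimal sketch with hypothetical helper names and made-up addresses. It only counts distinct statement texts; real plan caching depends on the server. For example, SQL Server can auto-parameterize some simple ad hoc queries, so the real-world gap varies.

```python
def concatenated_query(email):
    """Splice the value into the SQL text (the unsafe way)."""
    return "SELECT email FROM members WHERE email = '" + email + "'", ()


def parameterized_query(email):
    """Keep the SQL text fixed and pass the value alongside it."""
    return "SELECT email FROM members WHERE email = @email", (email,)


emails = ["alice@example.com", "bob@example.com", "carol@example.com"]

concatenated_texts = {concatenated_query(e)[0] for e in emails}
parameterized_texts = {parameterized_query(e)[0] for e in emails}

print(len(concatenated_texts))   # 3 distinct statement texts
print(len(parameterized_texts))  # 1 statement text, whatever the value
```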
Non-parameterized SQL is the GoTo statement of database programming. Don't do it, and make sure your coworkers don't either.
April 25, 2005
Canonicalization: Not Just for Popes
You may remember the ASP.NET canonicalization vulnerability from last year. And what exactly is canonicalization? From Microsoft's Design Guidelines for Secure Web Applications:
Data in canonical form is in its most standard or simplest form. Canonicalization is the process of converting data to its canonical form. File paths and URLs are particularly prone to canonicalization issues and many well-known exploits are a direct result of canonicalization bugs. For example, consider the following string that contains a file and path in its canonical form.
c:\temp\somefile.dat
The following strings could also represent the same file.
somefile.dat
c:\temp\subdir\..\somefile.dat
c:\ temp\ somefile.dat
..\somefile.dat
c%3A%5Ctemp%5Csubdir%5C%2E%2E%5Csomefile.dat
In the last example, characters have been specified in hexadecimal form:
- %3A is the colon character.
- %5C is the backslash character.
- %2E is the dot character.
You should generally try to avoid designing applications that accept input file names from the user to avoid canonicalization issues. Consider alternative designs instead. For example, let the application determine the file name for the user. If you do need to accept input file names, make sure they are strictly formed before making security decisions such as granting or denying access to the specified file.
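Here is a rough sketch of the "canonicalize first, then decide" advice. It uses Python's `ntpath` module, which applies Windows path rules on any OS, plus `urllib.parse.unquote` for the hex-encoded form. The allowed root `c:\temp`, the current directories, and the function names are assumptions for illustration. The sketch skips the whitespace variant. A real Windows check would also have to cope with things it ignores, such as short 8.3 names and alternate data streams.

```python
import ntpath
from urllib.parse import unquote


def canonicalize(raw: str, current_dir: str) -> str:
    """Reduce a user-supplied Windows path to one canonical, comparable form."""
    # 1. Decode %XX escapes first: %3A -> ':', %5C -> '\', %2E -> '.'
    decoded = unquote(raw)
    # 2. Resolve relative names against the (hypothetical) current directory;
    #    ntpath.join leaves an already-absolute path on the same drive unchanged.
    absolute = ntpath.join(current_dir, decoded)
    # 3. Collapse '.', '..' and doubled separators, then fold case,
    #    since Windows file names are case-insensitive.
    return ntpath.normcase(ntpath.normpath(absolute))


def is_allowed(raw: str, current_dir: str, allowed_root: str) -> bool:
    """Make the security decision only on the canonical form."""
    path = canonicalize(raw, current_dir)
    root = ntpath.normcase(ntpath.normpath(allowed_root))
    # commonpath compares whole components, so c:\temp2 is not inside c:\temp
    return ntpath.commonpath([path, root]) == root


variants = [
    ("somefile.dat", r"c:\temp"),
    (r"c:\temp\subdir\..\somefile.dat", r"c:\temp"),
    (r"..\somefile.dat", r"c:\temp\subdir"),
    ("c%3A%5Ctemp%5Csubdir%5C%2E%2E%5Csomefile.dat", r"c:\temp"),
]
for raw, cwd in variants:
    print(canonicalize(raw, cwd))  # c:\temp\somefile.dat every time

print(is_allowed(r"..\..\windows\win.ini", r"c:\temp\subdir", r"c:\temp"))  # False
```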
Seems straightforward enough; there can be only one true representation of the data, just like there's only one Pope. And popes don't canonicalize: they canonize. Which means the words "canonicalize" and "canonicalization" are artificially fabricated technical mumbo-jumbo. As if we didn't have enough of that to go around already:
We are asking for your help in eradicating words that have been invented for no good reason. Sometimes, it's too late to do anything about them. Look at the word "canonicalize," for instance. It is used to mean "to create the canonical form" of something, like a URL (as in InternetCanonicalizeUrl from the WinINet API). It's not English; it was invented because someone didn't know that there was already a perfectly adequate word for this process: "canonize." However, once this non-word has been created, the rules of the language suddenly apply again, so the process of "canonicalizing" something is "canonicalization" instead of "canonization."
More recently, we've seen the word "performant" start its crawl into the everyday vocabulary of devspace. It is used to mean "highly performing." It's also not a word. When something provides information, it's informative. It's not "informant." The word "performant," if it existed, would be a noun -- not an adjective. But it doesn't exist, so if you do see it in print, remember that it's not really there.
Any readers who have made it this far are probably rolling their eyes now, thinking to themselves, "Why are they being such sticklers here? Isn't the language a wonderful, evolving thing?" Yes, our language is evolving. As there is a need for new words, new words enter the language. But making up new words is just as bad as using fancy words in place of short ones. Why say "This project's goals are orthogonal to the company's needs"? Admit it -- if you were at home, you'd just say "different from" or "at odds with."
It's one thing to use technical jargon excessively, but the perpetuation of jargon for jargon's sake is particularly Orwellian. Along those same lines, you may also be interested in Cyrus' list of commitments. Is it clear? As an unmuddied lake, sir. As clear as an azure sky of deepest summer.
April 24, 2005
The Start Menu must be stopped
As I struggled to open applications on my PC, I was reminded of a few entries in Scott Hanselman's blog:
Personally I have enough crap in my start menu to fill my 1400x1060 screen...arguably only 30% of the icons represent applications, the rest are just flotsam. (May 11, 2003)
As I sit here and look at my Start Menu, that fills my 1600x1200 screen and runs off the right edge... (October 10, 2003)
Anyway, I'm about 40% done installing my programs as you can see by my Start Menu. I'll know I'm done when the Start Menu completely fills my 1400x1050 screen. (December 7, 2004)
I'm not picking on Scott here. I just happened to notice a theme in his posts that jibed with my personal experience. The Windows start menu makes launching applications far more difficult than it should be. A giant horizontal menu may have seemed like a good idea back when Windows 95 was launched-- but clearly, it isn't. I curse every time I have to launch an app that isn't pinned to my start menu, or in the recently launched program list:
- The list is not in alphabetical order by default. It's in install order. You can manually sort it by right clicking the list and selecting Sort.
- Software vendors tend to put their applications in folders using the name of their company. So if I want to use CrazyApp, I have to remember to look for the MonkeyCorp menu. Why should we expect the user to know or care what the company name is?
- Some items don't go into folders. These items show up at the bottom of the list. If you're looking for Word under the Microsoft folder, or the Office folder, you're out of luck. It's set up with its own icon at the bottom of the list. And why the bottom? I have no idea.
- Some items show up at the top of the list for no obvious reason. That's because those items are set up for All Users. Again-- how is a user supposed to know this?
- Once you have more than one "column" of applications, cascading folders that pop up from the left column obscure the information in the right column. This design clearly doesn't scale.
- If you want to rearrange the list of applications, you can do so by dragging the items in the menus. However, this is incredibly difficult to do within a series of cascading menus. Try dragging an item from within a folder to another folder, for example. Or right-clicking an item to delete it, which sometimes results in the entire start menu closing.
The deeper problem with the start menu is that it's, well, a menu. Menus have poor usability. A single "Start" point for the user is a fine idea, but it really starts to break down when you make it into more than a simple, visible list of items, as the All Programs link does. A 2003 usability study showed that all menus are inferior to Yahoo-style index lists, and horizontal menus have the worst usability of all:
The poorest performer, both objectively and subjectively, was the Horizontal layout. Participants in this condition took longer to find the task information, and they had the opinion, though non-significantly, that this layout was more disorientating than the other two layouts. It is possible that the distance this layout was from the center of the screen contributed to its poorer participant performance. In fact, one participant commented that this layout "was more difficult to see and reach than the others because of its height on the screen."
It's clear that traditional menus have no place on web pages, and should be used sparingly in GUIs. And that's the critical problem with the Start Menu: it abuses menus. For launching applications, it's a usability trainwreck. I'm sure they'll be fixing this with Vista [and they do], but in the meantime, what's a poor computer user to do?
April 22, 2005
You Can Write FORTRAN in any Language
A recent user-submitted CodeProject article took an interesting perspective on the VB.NET/C# divide by proposing that the culture of Visual Basic is not conducive to professional software development:
We've seen that the cultures of VB and C# are very different. And we've seen that this is no fault of the programmers that use them. Rather this is a product of the combination of factors that collectively could be called their upbringing -- business environment, target market, integrity and background of the original language developers, and a myriad other factors.
One factor, however, that seems to have a greater effect on the culture than others, is the syntax and semantics of the language. To what extent do syntax and semantics play a part in the culture that builds up around a language and to what extent, vice versa, do the syntax and semantics depend on the culture in which the language was created? The truth is, both -- just as spoken languages both grow out of culture and influence culture. For instance, in the far north the language syntax has evolved several words for the different types of snow. Interactions then use the language to express nuances of snow, creating a more snow-centric culture.
So in Visual Basic, the decision to include in the syntax and semantics the ability to assign numbers directly to strings and vice versa was a result of the designers' desire to attract a broad base of developers who would probably not understand the notions of strongly typed variables. Once the syntax permitted it, such assignment became widespread, reinforcing the designers' original premise. Once this cycle of self-reinforcement begins, the cultural habits quickly become entrenched and widespread, and are extremely resistant to change. Minds tend to gravitate to like minds. User groups tend to attract homogenous followings. Visual Basic instructors tend to propagate what their instructors taught them.
While I appreciate the idea that the culture around a language can influence you, the implication that choosing the "wrong" language can somehow cripple your professional development is disturbing. This concept is known in linguistic circles as the Sapir-Whorf hypothesis. It proposes that the vocabulary and syntax of our language guide and limit the way we see the world: form dictates content. Edsger Dijkstra, for example, believed that programming in Fortran or Basic not only condemned us to produce bad code, it corrupted us for life.
The author also offers a few predictions:
In the near future, there will be less good VB programmers than C# programmers. This is because many of the good VB programmers are switching to C#. This is partly because they like the language better, but mostly because they like the culture better. As the cultural separation becomes more evident and self-reinforcing, it will accelerate until there are very few good VB programmers left.
I'm hesitant to dismiss this article outright because I have observed firsthand the mass migration of VB developers to C#, and in my experience the early adopters do tend to be the better developers. However, I cannot agree that code quality is predestined by choice of language, environment, or IDE-- it's almost entirely determined by the skill of the developer. Ergo, you can write FORTRAN in any language:
There are characteristics of good coding that transcend all general-purpose programming languages. You can implement good design and transparent style in almost any code, if you apply yourself to it. Just because a programming language allows you to write bad code doesn't mean that you have to do it. And a programming language that has been engineered to promote good style and design can still be used to write terrible code if the coder is sufficiently creative. You can drown in a bathtub with an inch of water in it, and you can easily write a completely unreadable and unmaintainable program in a language with no gotos or line numbers, with exception handling and generic types and garbage collection.
I agree that cultural factors are significant; however, individual developer skill is a far more accurate predictor of success than whether or not you chose the "cool" language. Like Java in its early days, the shiny patina of newness surrounding C# is attracting a disproportionate number of talented developers. Today, any Java-related Google query will return reams of truly mediocre "explosion at the Pattern Factory" Java code. All I can say is, enjoy it while it lasts.
April 21, 2005
The bloated world of Managed Code
Mark Russinovich recently posted a blog entry bemoaning the bloated footprint of managed .NET apps compared to their unmanaged equivalents. He starts by comparing a trivial managed implementation of Notepad to the one that ships with Windows:
First notice the total CPU time consumed by each process. Remember, all I've done is launch the programs – I haven't interacted with either one of them. The managed Notepad has taken twice the CPU time as the native one to start. Granted, a tenth of a second isn't large in absolute terms, but it represents 200 million cycles on the 2 GHz processor that they are running on. Next notice the memory consumption, which is really where the managed code problem is apparent. The managed Notepad has consumed close to 8 MB of private virtual memory (memory that can't be shared with other processes) whereas the native version has used less than 1 MB. That's a 10x difference! And the peak working set, which is the maximum amount of physical memory Windows has assigned a process, is almost 9 MB for the managed version and 3 MB for the unmanaged version, close to a 3x difference.
While Mark has more coding skill in his pinky finger than I have in my entire body, I think his comparison is misleading at best and specious at worst. He clarifies his position in a subsequent post:
Memory footprint is much more important for a client-side-only application since there can be many such applications running concurrently and clients often have limited memory. Someone stated that by the time that Longhorn ships most new systems will have 1 to 2 GB of memory. In corporate environments clients have at least 3-year life cycles and home users even in prosperous nations might upgrade less often. In developing nations you'll see system configurations lagging the mainstream by 5 years. That means that most of the world's computers won't have 1-2 GB of memory until several years after Longhorn finally ships.
It's amazing to me that no matter how much memory we add, how much faster we make our CPUs, and how much faster we make our disks spin and seek, computing doesn't seem to get faster. If you have a Windows NT 4 system around compare its boot time and common tasks with that of a Windows XP system. Then compare their system specs. Then ask yourself what you really can do on the Windows XP system that you can't do on the Windows NT 4 system.
I'm not sure why this trend is currently bothering Mark so much, because it's been going on for decades. The subset of tasks that must be done in (insert favorite low-level language here) for acceptable performance gets smaller and smaller every day as hardware improves over time. This is a perfectly reasonable tradeoff to make; computers get faster every day, but our brains don't. The goal of the .NET runtime is not to squeeze every drop of performance out of the platform-- it's to make software development easier. A talented developer could write several managed .NET apps in the same time it would take to write one unmanaged C++ app. Would you rather have a single fast native app, or a dozen slower managed apps to choose from?
Mark's article did get me thinking about the inherent overhead of .NET. What is the real minimum footprint of a .NET application?
First, I started a new Console C# project in VS.NET, then added a single Console.WriteLine and a Console.ReadLine. I compiled in release mode, closed the IDE, and double-clicked on the release executable. I then used Mark's Process Explorer to view the process properties:
| Console app | .NET 1.1 | .NET 2.0 b2 | .NET 2.0 final |
| Private Bytes | 3,912 K | 6,984 K | 7,076 K |
| Working Set | 5,800 K | 3,792 K | 3,872 K |
| Page Faults | 1,484 | 963 | 989 |
| Handles | 67 | 65 | 67 |
| GDI Handles | 11 | 5 | 5 |
| USER Handles | 2 | 0 | 0 |
Next, I started a new Windows Forms C# project in VS.NET, then added a single close Button and a label. I compiled in release mode, closed the IDE, and double-clicked on the release executable. I again used Process Explorer to view the process properties:
| Windows Forms app | .NET 1.1 | .NET 2.0 b2 | .NET 2.0 final |
| Private Bytes | 5,760 K | 11,432 K | 11,684 K |
| Working Set | 7,876 K | 7,280 K | 7,072 K |
| Page Faults | 2,140 | 1,876 | 1,817 |
| Handles | 72 | 84 | 76 |
| GDI Handles | 34 | 23 | 25 |
| USER Handles | 15 | 18 | 12 |
(Updated with .NET 2.0 final numbers on 1-12-06. I generated these numbers in a clean Windows XP VM, so they should be accurate.)
The .NET 2.0 beta results were generated on a different machine via Remote Desktop, but I don't think that should affect memory and handles. Maybe it's my VB background talking, but these baseline footprints seem totally reasonable to me, particularly considering the incredible productivity I get in exchange.
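The tables are easier to read as ratios. This sketch compares the .NET 2.0 final build against .NET 1.1 using only the measurements above; the dictionary layout and `ratios` helper are just for illustration.

```python
# Measurements from the tables above, in KB: (.NET 1.1, .NET 2.0 final).
measurements = {
    "Console": {"private": (3912, 7076), "working_set": (5800, 3872)},
    "Windows Forms": {"private": (5760, 11684), "working_set": (7876, 7072)},
}


def ratios(data):
    """Return {(app, metric): .NET 2.0 final value / .NET 1.1 value}."""
    return {
        (app, metric): net20 / net11
        for app, metrics in data.items()
        for metric, (net11, net20) in metrics.items()
    }


for (app, metric), r in ratios(measurements).items():
    print(f"{app:>13} {metric:<11} {r:.2f}x")
```

In both apps, private bytes roughly doubled between 1.1 and 2.0 final (1.81x and 2.03x), while the working set shrank (0.67x and 0.90x).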
April 20, 2005
Because Information is Beautiful
The Edward Tufte books are well-known classics now, but I distinctly remember my first encounter with The Visual Display of Quantitative Information in 1995. At the time I was working for a market research company in Denver. I noticed the book sitting on the president's desk while I was in his office doing some typical small business IT stuff. I had never heard of it, and I was intrigued. I started casually paging through it-- and I was absolutely enthralled. I couldn't put it down. When Karl arrived, I told him how amazing the book was and that I had to have my own copy. He was surprised: he'd had no luck getting his market analysts to look at the book, yet his crazy IT guy had spotted it in passing and treated it like some new kind of religion.
Although computers had captured my imagination since childhood, I had never considered that part of the attraction had nothing to do with the computer itself, and everything to do with the data inside it. Data that was often displayed in very mundane ways. I had no idea that information could be so beautiful.
It's a powerful concept. One of the more compelling examples is this illustration from Tufte's second book, Envisioning Information, on page 63, where he reduces a mundane illustration from a government manual to its most essential elements.
This example is powerful precisely because it is so very mundane-- the illustration barely registers. With a few simple changes, a generic illustration is transformed into a strikingly powerful visual explanation. It's easy to see where these critically important visual cues could be applied to every kind of human-computer interaction. And that's why the three Tufte books, The Visual Display of Quantitative Information, Envisioning Information, and Visual Explanations, are essential for anyone with any interest in visual design.
Tufte's books have a strictly practical bent, but they do carry an implied question that he never fully addresses: when does the display of information stop being utilitarian and start being art? Can it be both? Should it be both? There are quite a few web sites mapping this strange territory somewhere between utility and beauty.
Laurens Lapre's site is dedicated to procedurally generated art; I found his gradient spaces particularly compelling.
Ben Fry's site definitely has a little of both: the amazing, well-known visual zip decode page, and a dump of raw data from a classic Nintendo Entertainment System cartridge he calls Mario Soup.
In another art project, Super Mario Clouds, the same cartridge is reprogrammed to display nothing but idyllic, floating 8-bit clouds.
April 19, 2005
What Would Blanka Do?
Sometimes you just gotta ask yourself: What Would Blanka Do?
"Eventually, my nickname at school became Blanka. When I got into real fights, I even tried using some of his moves. They never worked," said Gutierrez. "I often ask myself, what would Blanka do? I even met my wife because of this game. So yes, I owe my whole life to Blanka."
I'm thinking that electricity move might be particularly difficult to pull off.
