Coding Horror
programming and human factors
by Jeff Atwood

April 25, 2005

Canonicalization: Not Just for Popes

You may remember the ASP.NET canonicalization vulnerability from last year. And what exactly is canonicalization? From Microsoft's Design Guidelines for Secure Web Applications:

Data in canonical form is in its most standard or simplest form. Canonicalization is the process of converting data to its canonical form. File paths and URLs are particularly prone to canonicalization issues and many well-known exploits are a direct result of canonicalization bugs. For example, consider the following string that contains a file and path in its canonical form.

c:\temp\somefile.dat

The following strings could also represent the same file.

somefile.dat
c:\temp\subdir\..\somefile.dat
c:\  temp\   somefile.dat
..\somefile.dat
c%3A%5Ctemp%5Csubdir%5C%2E%2E%5Csomefile.dat

In the last example, characters have been specified in hexadecimal form:

  • %3A is the colon character.
  • %5C is the backslash character.
  • %2E is the dot character.
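
As a rough illustration of the decoding listed above, here is a minimal Python sketch. It assumes the standard library's `urllib.parse.unquote` for the percent-decoding and `ntpath.normpath` for Windows-style path normalization; the variable names are illustrative and not part of the Microsoft guidelines.

```python
import ntpath
from urllib.parse import unquote

encoded = "c%3A%5Ctemp%5Csubdir%5C%2E%2E%5Csomefile.dat"

# Undo the percent-encoding: %3A -> ':', %5C -> '\', %2E -> '.'
decoded = unquote(encoded)

# Collapse the 'subdir\..' detour using Windows path rules
# (ntpath applies those rules on any operating system).
canonical = ntpath.normpath(decoded)

print(decoded)
print(canonical)
```

Under these assumptions, the decoded string is the `subdir\..` variant from the list, and normalizing it yields the canonical `c:\temp\somefile.dat`.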

You should generally try to avoid designing applications that accept input file names from the user to avoid canonicalization issues. Consider alternative designs instead. For example, let the application determine the file name for the user. If you do need to accept input file names, make sure they are strictly formed before making security decisions such as granting or denying access to the specified file.
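
To make "strictly formed before making security decisions" concrete, here is a hedged Python sketch of one way to do it: convert the input to canonical form first, then make the access decision on that form only. `BASE_DIR`, `canonicalize`, and `is_allowed` are hypothetical names invented for this example. It decodes the input exactly once and does not resolve symbolic links or 8.3 short names, so treat it as an illustration rather than a complete defense.

```python
import ntpath
from urllib.parse import unquote

# Hypothetical directory the application is allowed to serve files from.
BASE_DIR = "c:\\temp"


def canonicalize(user_path: str, base_dir: str = BASE_DIR) -> str:
    """Decode once, resolve against base_dir, and collapse '.' and '..'."""
    decoded = unquote(user_path)
    return ntpath.normpath(ntpath.join(base_dir, decoded))


def is_allowed(user_path: str, base_dir: str = BASE_DIR) -> bool:
    """Decide access on the canonical form, never on the raw input."""
    base = ntpath.normcase(ntpath.normpath(base_dir))
    candidate = ntpath.normcase(canonicalize(user_path, base_dir))
    # Require the trailing separator so 'c:\tempfoo' does not match 'c:\temp'.
    return candidate.startswith(base + "\\")


if __name__ == "__main__":
    for path in ["somefile.dat", "..\\somefile.dat", "%2E%2E%5Csecret.dat"]:
        print(path, is_allowed(path))
```

The point of the design is ordering: every variant spelling is reduced to one form before the check, so the check only ever has to reason about a single representation.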

Seems straightforward enough; there can be only one true representation of the data, just like there's only one Pope. And popes don't canonicalize: they canonize. Which means the words "canonicalize" and "canonicalization" are artificially fabricated technical mumbo-jumbo. As if we didn't have enough of that to go around already:

We are asking for your help in eradicating words that have been invented for no good reason. Sometimes, it's too late to do anything about them. Look at the word "canonicalize," for instance. It is used to mean "to create the canonical form" of something, like a URL (as in InternetCanonicalizeUrl from the WinINet API). It's not English; it was invented because someone didn't know that there was already a perfectly adequate word for this process: "canonize." However, once this non-word has been created, the rules of the language suddenly apply again, so the process of "canonicalizing" something is "canonicalization" instead of "canonization."

More recently, we've seen the word "performant" start its crawl into the everyday vocabulary of devspace. It is used to mean "highly performing." It's also not a word. When something provides information, it's informative. It's not "informant." The word "performant," if it existed, would be a noun—not an adjective. But it doesn't exist, so if you do see it in print, remember that it's not really there.

Any readers who have made it this far are probably rolling their eyes now, thinking to themselves, "Why are they being such sticklers here? Isn't the language a wonderful, evolving thing?" Yes, our language is evolving. As there is a need for new words, new words enter the language. But making up new words is just as bad as using fancy words in place of short ones. Why say "This project's goals are orthogonal to the company's needs"? Admit it—if you were at home, you'd just say "different from" or "at odds with."

It's one thing to use technical jargon excessively, but the perpetuation of jargon for jargon's sake is particularly Orwellian. Along those same lines, you may also be interested in Cyrus' list of commitments. Is it clear? As an unmuddied lake, sir. As clear as an azure sky of deepest summer.

Posted by Jeff Atwood

 


 

Comments

Bah. There are a number of reasons that his objection is without merit. For big huge starters, there is no instance in the history of the spoken word of anyone talking the population at large out of using some construction or word. You can kind of convince some subset of people who are insecure or status-conscious about their language to change their ways, but you'll never get the majority to even pay attention and understand your issue, let alone do something about it.

Second, yeah, the language does evolve, yeah, maybe (or maybe not) there is an existing word that does the job. Apparently enough people didn't know that, tho, so a new word has arisen. The history of English is riddled with words that were bent to new uses, or new words that substituted for perfectly ok existing words. That's the way it works. We could all wear jeans and t-shirts, as they're perfectly serviceable, but nope, every few years a new fashion sweeps over us.

Anyway canonize != canonicalize.

"Performant" is not a good term, but not because it's a neologism using -ant. It's because the word isn't particularly precise; in fact, it's anti-precise. If a program is performant, is it ... fast? memory-efficient? If it's those things, use those terms, coz they say something specific, eh?

The final issue with jargon is that it is in fact a marker of inclusion and exclusion within a group. A person might think this is a bad thing, but it's a sociolinguistic fact of life. If you're a snowboarder, you use some sort of mutant snowboarder lingo. The point is not, in fact, to communicate with your grandmother about the wonderful experiences you have whilst snowboarding. No, it's to communicate with your snowboarding homies _and to show that you are part of that subculture_.

Granted, the moron who doesn't know when to turn off the jargon is clueless, but so is the guy who wears jeans to the opera, sez me. And some article in MSDN ain't gonna cure him.

Bet you saw this one coming, huh? :-)

As you know, I know whereof I speak -- I spend every working day rasslin' with these sorts of issues. It's an interesting balancing act. We nix "performant" pretty much all the time. But "instantiate" used to make editor types blanch, but no more; ditto "to persist to disk" and "to migrate an application," and heck, "data" as a singular noun. At some point jargon becomes well-enough understood that it's not jargon per se, simply the vocabulary of a specific profession. But there's firm line, and one man's shorthand terms are another man's jargon.


mike on April 26, 2005 01:51 AM

Go read the Jargon file. Learn the tribal language to identify yourself as a member of the tribe. Jargon definitely functions as a filter mechanism (a shibboleth, if you will). But it also gives us precise terms for things which other groups don't care about. Catholics rarely need to talk about a "canonical" form, while developers do.

Language is another thing you can have fun with. You say "canonical" when talking with another dev, not when writing user docs. And applying production rules to generate new forms is part of the fun. "Canonicity"? "Canonicalant" to go with "performant"??? If you're gonna be silly with language why not take it all the way?

"Performant", on the other hand, is something dreamed up by someone who wears a tie and uses Powerpoint far too much...

Christian Mogensen on April 26, 2005 03:16 AM

I can be pretty particular (hopefully not pedantic (though obviously not enough to use "hopefully" not as an adverb)) about language, but I think you know pretension when you see it and "canonicalize" doesn't seem that bad to me. Maybe that's because canonical is already (pre-software) a word that means something, e.g., "Conforming to orthodox or well-established rules or patterns, as of procedure."

"Canonized," as in saints, already means something else. So the verb "canonize" would mean to make into a saint, and that seems a little strong for just converting a path to simplest form. So "canonicalize" seems to me a reasonable choice.

Josh on April 26, 2005 10:21 AM

> that seems a little strong for just converting a path to simplest form

Or we could just ditch the whole ten-dollar word thing and say "convert to simplest form"..

Jeff Atwood on April 26, 2005 10:37 AM

>Or we could just ditch the whole ten-dollar word
>thing and say "convert to simplest form"..

Well, that's just the point -- people don't. Instead of saying "instantiate" you could say "create an instance of" ... but people don't. Instead of saying "persist" you could say "store" ... but people don't. Instead of saying "rehydrate," you could say "restore the non-default values of" ... but people don't.

It's a losing (lost) battle, my friend, and getting one's innerwear into a torsional state has exactly 0.00% effect on any of it. It's like railing against teen fashions. However, ludicrous, it won't change a thing ...

mike on April 26, 2005 03:07 PM

I like how in two comments I've managed to make two stupid mistakes. D'oh.

mike on April 26, 2005 03:10 PM

All jargon should be banished.

We should force mathematicians to write proofs in plain English as well! Do they think they're so special just because they understand all those Greek symbols? :)

If you cannot communicate something so that a complete uninitiated layman can understand it, it probably isn't anything worth saying.

While we're at it, the fact that you can't write a decimal representation of pi sucks too. We should take a page from the 1897 Indiana House of Representatives and make it legally equal to '3.2'.

http://www.straightdope.com/classics/a3_341.html

Steve on April 26, 2005 05:32 PM

The reason we have jargon is because sometimes it's real handy to develop a concept that takes a whole sentence to explain, make up a word for that concept, and then we all know what it means and we can save lots of words and move on up to the next level of abstraction. If we had to explain all concepts with small words to a layperson, we might be able to do it, but it would just take a lot longer. When we spend years learning what a "checksum" or "Hilbert Space" is, and we're all amongst ourselves, it's handy to use the shorthand.

I think the goal is to find the place where jargon changes from useful shorthand to pretentious see-how-smart-i-am word dropping.

Also, if they were going to round pi to two decimals, it's fitting that they rounded IN THE WRONG DIRECTION. Wow.

Josh on April 27, 2005 11:17 AM

> I think the goal is to find the place where jargon changes from useful shorthand to pretentious see-how-smart-i-am word dropping.

Hear, hear. Although I do think a few of the words are hard to defend in any context-- like "performant".

Jeff Atwood on April 27, 2005 01:37 PM











Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.