I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood


11 posts from May 2009

May 28, 2009

Server Fault: Calling All Lusers

It's pop quiz time! Put away your notes, and let's begin.

a) Do you own this book?*

unix-system-administration-handbook.png

b) Do you know who this man is?

mark-russinovich-sysinternals.jpg

c) Does this FAQ look familiar to you?

3) OUR LITTLE FRIEND, THE COMPUTER
3.1) Are there any OSes that don't suck?
3.2) Are there any vendors that don't suck?
3.3) How about any hardware?
3.4) Just HOW MUCH does this system suck?
3.5) Where can I find clueful tech support?
3.6) What can I do to help my computers behave?

d) Does the acronym BOFH mean anything to you?

e) Do you think this is funny?

april-fools-day-rfcs.png

If you answered "yes" to any of the above, I am sorry to inform you that you may be a system administrator or IT professional. But I do have one bit of potentially, at least theoretically good news for you:

Server Fault is now in public beta!

serverfault-logo.png

Server Fault is a sister site to Stack Overflow, which we launched back in September 2008. It uses the same engine, but it's not just for programmers any more:

Server Fault is for system administrators and IT professionals, people who manage or maintain computers in a professional capacity. If you are in charge of ...
  • servers
  • networks
  • many desktop PCs (other than your own)
... then you're in the right place to ask your question! Well, as long as the question is about your servers, your networks, or desktops you support, anyway.

Please note that Server Fault is not for general computer troubleshooting questions; if you paid for that desktop hardware, and it's your personal workstation, it is unlikely that your question is appropriate for Server Fault.

I occasionally dabble in system administration and IT professional stuff; my last blog entry was about RAID, for example. As a programmer who loves hardware as much as software, I've wanted this site for months, and I'm thrilled to see it go live, as I explained on a recent RunAs radio podcast.

Although there is certainly some crossover, we believe that the programming community and the IT/sysadmin community are different beasts. Just because you're a hotshot programmer doesn't mean you have mastered networking and server configuration. And I've met a few sysadmins who could script circles around my code. That's why Server Fault gets its own domain, user profiles, and reputation system.

userfriendly-evolution-of-language.png

So if you're a bona-fide BOFH, or just a wanna-be BOFH luser like me, join us on Server Fault. Who knows, maybe we lusers can learn something from each other.

* (For the record, yes, I do own that book -- although I am easily the world's worst UNIX system administrator.)

Posted by Jeff Atwood    60 Comments

May 26, 2009

Beyond RAID

I've always been leery of RAID on the desktop. But on the server, RAID is a definite must:

"RAID" is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple hard disk drives. The different schemes/architectures are named by the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs all involve two key design goals: increased data reliability or increased input/output performance. When multiple physical disks are set up to use RAID technology, they are said to be in a RAID array. This array distributes data across multiple disks, but the array is seen by the computer user and operating system as one single disk.

I hadn't worked much at all with RAID, as I felt the benefits did not outweigh the risks on the desktop machines I usually build. But the rules are different in the datacenter; the servers I built for Stack Overflow all use various forms of RAID, from RAID 1 to RAID 6 to RAID 10. While working with these servers, I was surprised to discover there are now umpteen zillion numbered variants of RAID -- but they all appear to be based on a few basic, standard forms:

RAID 0: Striping

Data is striped across (n) drives, which improves performance almost linearly with the number of drives, but at a steep cost in fault tolerance; a failure of any single striped drive renders the entire array unreadable.

raid-0-diagram.png
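To make the single-failure point concrete, here is a minimal Python sketch of how a striped array might map logical blocks onto drives. It assumes a stripe unit of exactly one block laid out round-robin. Real controllers use larger, configurable stripe sizes, and the function names are illustrative only.

```python
def locate(logical_block: int, num_drives: int) -> tuple:
    """Map a logical block to (drive index, block offset on that drive).

    Assumes a one-block stripe unit laid out round-robin across drives.
    """
    return logical_block % num_drives, logical_block // num_drives


def blocks_lost(failed_drive: int, num_drives: int, total_blocks: int) -> list:
    """Logical blocks that become unreadable when one drive fails."""
    return [b for b in range(total_blocks) if locate(b, num_drives)[0] == failed_drive]


# Blocks 0..3 on a 4-drive stripe each land on a different drive, so a
# sequential read keeps every drive busy at once.
print([locate(b, 4) for b in range(4)])  # [(0, 0), (1, 0), (2, 0), (3, 0)]

# Losing drive 2 punches a hole in every stripe, not just in one file.
print(blocks_lost(2, 4, 12))  # [2, 6, 10]
```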

RAID 1: Mirroring

Data is written identically to (n) drives, which offers near-perfect redundancy at a slight performance decrease when writing -- and, with the usual pair of drives, at the cost of half your overall storage. As long as one drive in the mirror array survives, no data is lost.

raid-1-diagram.png

RAID 5: Parity

Data is written across (n) drives with a parity block. The array can tolerate one drive failure, at the cost of one drive in storage. There may be a serious performance penalty when writing (as parity blocks are calculated), and when the array is rebuilding.

raid-5-diagram.png
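The parity trick behind RAID 5 is plain XOR. Here is a minimal sketch on a single stripe, assuming equal-sized blocks. Real RAID 5 rotates the parity block across drives from stripe to stripe, which this ignores, and the function names are made up. The last function shows the read-modify-write cycle behind the write penalty described above.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Append one parity block: the XOR of all data blocks in the stripe."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recover any single missing block by XORing every surviving block."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


def updated_parity(old_parity, old_data, new_data):
    """Small-write parity update: read the old data and old parity, then write both.

    This extra read-modify-write work is one source of the RAID 5 write penalty.
    """
    return xor_blocks([old_parity, old_data, new_data])


stripe = make_stripe([b"DATA", b"MORE", b"BITS"])
print(rebuild(stripe, 1))  # b'MORE'
```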

RAID 6: Dual Parity

Data is written across (n) drives with two parity blocks. The array can tolerate two drive failures, at the cost of two drives in storage. There may be a serious performance penalty when writing (as parity blocks are calculated), and when the array is rebuilding.

raid-6-diagram.png

(Yes, there are other forms of RAID, but they are rarely implemented or used as far as I can tell.)

It's also possible to generate so-called RAID 10 or RAID 50 arrays by nesting these RAID levels together. If you take four hard drives, mirror each pair, then stripe across the two mirrored pairs -- why, you just created yourself a magical RAID 10 concoction! (Striping first and then mirroring the two striped arrays gives the closely related RAID 0+1.) What's particularly magical about RAID 10 is that it inherits the strengths of both of its parents: mirroring provides excellent redundancy, and striping provides excellent speed. Some would say that RAID 10 is so good it completely obviates any need for RAID 5, and I for one agree with them.
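The storage cost of each level comes down to simple arithmetic. Here is a rough sketch that assumes identical drives and treats RAID 10 as mirrored pairs striped together. The table and function name are hypothetical. "Tolerated" here means failures the array is guaranteed to survive, no matter which drives die.

```python
# level: (minimum drives, usable drives for n drives, failures always survived)
RAID_LEVELS = {
    "0": (2, lambda n: n, lambda n: 0),
    "1": (2, lambda n: 1, lambda n: n - 1),
    "5": (3, lambda n: n - 1, lambda n: 1),
    "6": (4, lambda n: n - 2, lambda n: 2),
    # Mirrored pairs, striped: any one failure is safe; a second failure is
    # fatal only if it takes out the surviving half of the same pair.
    "10": (4, lambda n: n // 2, lambda n: 1),
}


def raid_summary(level, drives, drive_gb):
    """Return (usable capacity in GB, drive failures guaranteed survivable)."""
    min_drives, usable, tolerated = RAID_LEVELS[level]
    if drives < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    if level == "10" and drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return usable(drives) * drive_gb, tolerated(drives)


for level in ("0", "1", "5", "6", "10"):
    print(f"RAID {level:>2} on four 1000 GB drives:", raid_summary(level, 4, 1000))
```

With four drives, RAID 6 and RAID 10 both leave you two drives' worth of space. RAID 6 survives any two failures. RAID 10 is guaranteed to survive only one, but it never has to compute parity on writes.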

This was all fascinating new territory to me; I knew about RAID in theory but had never spent hands-on time with it. The above is sufficient as a primer, but I recommend reading through the Wikipedia entry on RAID for more depth.

It's worth mentioning here that RAID is in no way a substitute for a sane backup regimen, but rather a way to offer improved uptime and survivability for your existing systems. Hard drives are cheap and getting cheaper every day -- why not use a whole slew of the things to get better performance and better reliability for your servers? That's always been the point of Redundant Array of Inexpensive Disks, as far as I'm concerned. I guess Sun agrees; check out this monster:

sun-x4500-top.jpg

That's right, 48 commodity SATA drives in a massive array, courtesy of the Sun Fire X4500. It also uses a new RAID system dubbed RAID-Z:

RAID-Z is a data/parity scheme like RAID-5, but it uses dynamic stripe width. Every block is its own RAID-Z stripe, regardless of blocksize. This means that every RAID-Z write is a full-stripe write. This, when combined with the copy-on-write transactional semantics of ZFS, completely eliminates the RAID write hole. RAID-Z is also faster than traditional RAID because it never has to do read-modify-write.

But far more important, going through the metadata means that ZFS can validate every block against its 256-bit checksum as it goes. Traditional RAID products can't do this; they simply XOR the data together blindly.

Which brings us to the coolest thing about RAID-Z: self-healing data. In addition to handling whole-disk failure, RAID-Z can also detect and correct silent data corruption. Whenever you read a RAID-Z block, ZFS compares it against its checksum. If the data disks didn't return the right answer, ZFS reads the parity and then does combinatorial reconstruction to figure out which disk returned bad data. It then repairs the damaged disk and returns good data to the application. ZFS also reports the incident through Solaris FMA so that the system administrator knows that one of the disks is silently failing.

Finally, note that RAID-Z doesn't require any special hardware. It doesn't need NVRAM for correctness, and it doesn't need write buffering for good performance. With RAID-Z, ZFS makes good on the original RAID promise: it provides fast, reliable storage using cheap, commodity disks.
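The self-healing read described above can be sketched in a few lines. This is a toy model, not ZFS code. It uses one stripe of equal-sized data blocks plus a single XOR parity block, and a SHA-256 digest stands in for the 256-bit checksum that ZFS keeps in its metadata, separately from the data. The function names are hypothetical.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def checksum(blocks):
    return hashlib.sha256(b"".join(blocks)).digest()


def write_stripe(data_blocks):
    """Return the data blocks, their XOR parity, and a checksum stored elsewhere."""
    return list(data_blocks), xor_blocks(data_blocks), checksum(data_blocks)


def self_healing_read(disks, parity, expected):
    """Return (data, index of the repaired disk or None).

    If the checksum fails, suppose each disk in turn returned bad data,
    rebuild it from parity, and keep the candidate whose checksum matches.
    """
    if checksum(disks) == expected:
        return b"".join(disks), None
    for suspect in range(len(disks)):
        others = [blk for i, blk in enumerate(disks) if i != suspect]
        candidate = disks[:suspect] + [xor_blocks(others + [parity])] + disks[suspect + 1:]
        if checksum(candidate) == expected:
            disks[suspect] = candidate[suspect]  # write the good copy back
            return b"".join(candidate), suspect
    raise OSError("data cannot be reconstructed from a single parity block")


disks, parity, expected = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
disks[1] = b"BxBB"  # silent corruption: the disk reports success but returns bad data
print(self_healing_read(disks, parity, expected))  # (b'AAAABBBBCCCC', 1)
```

A plain XOR-parity array that skips the checksum step would have handed back `b"BxBB"` without complaint.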

Pardon the pun, but I'm not sure if it makes traditional hardware RAID redundant, necessarily. Even so, there are certainly fantastic, truly next-generation ideas in ZFS. There's a great ACM interview with the creators of ZFS that drills down into much more detail. Hard drives may be (mostly) dumb hunks of spinning rust, but it's downright amazing what you can do when you get a whole bunch of them working together.

Posted by Jeff Atwood    89 Comments

May 25, 2009

Penny Auctions: They're Gambling

Late last year, I encountered what may be nearly perfect evil in business plan form: Swoopo. What is Swoopo? It's a class of penny auction, where bidders pay for the privilege of bidding:

[Penny auctions] offer new televisions, computers, game consoles, appliances, handbags, gold bars and more for starting prices of a penny to 15 cents, depending on the site.

To "win" a product, shoppers must first buy a bundle of 10 to 700 bids for 60 cents to $1 each. Shoppers use one each time they place a virtual bid on a product. Each bid raises the price of the item by a penny to 15 cents, depending on the site. Some have automatic bidding functions similar to eBay.

Doing the math and not getting carried away is important: The final price of a product that retails for $100 might be $29, but the total price paid could be much more, depending upon the number of bids used. If a shopper bids 10 times at $1 a bid, for instance, the total price paid would jump to $39. And, there is the real possibility of using all your bids without getting the product.

Auction winners generally get their item for about 65 percent off retail but could save as much as 98 percent if there are few bidders.

Since the sites make the bulk of their revenue from the purchase of bids, they profit most when they feature a product that elicits a bidding war.
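The "doing the math" advice above is easy to follow through. Here is a minimal sketch that works in whole cents to sidestep floating-point rounding. The one-cent start, one-cent increment, and $1 bid price are assumptions picked from the ranges quoted, and the function names are illustrative.

```python
def winner_total_cents(final_price, bids_used, bid_cost):
    """What the winner actually pays: the final price plus every bid they used."""
    return final_price + bids_used * bid_cost


def house_take_cents(final_price, start_price, increment, bid_cost):
    """What the site collects: the final price plus the fee on every bid, by anyone."""
    total_bids = (final_price - start_price) // increment
    return final_price + total_bids * bid_cost


# The quoted example: a $29 final price, plus 10 bids at $1 each.
print(winner_total_cents(2900, 10, 100) / 100)  # 39.0

# Starting at one cent and rising one cent per bid, that $29 final price
# took 2,899 paid bids.
print(house_take_cents(2900, 1, 1, 100) / 100)  # 2928.0
```

Under those assumptions the site collects about $2,928 for an item that retails for $100. Every losing bidder's bids count toward that total.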

One of Swoopo's investors recently contacted me via email, and I had to marvel at the size of the cojones you'd need to associate yourself with this kind of nastiness. Swoopo is evil beyond the likes of Saddam Hussein, The Balrog, OSB, Darth Vader, and Barbra Streisand -- combined.

He wanted to talk to me on the phone about positioning, and staunchly maintained that there was no element of chance in a Swoopo "auction". Once I stopped laughing, I told him these were my terms:

If you believe in Swoopo, then data speaks much louder than words.

Let's conduct an experiment.

Doesn't have to be you, personally. Take n dollars, and use those n dollars in whatever strategy it takes to win items (of MSRP $399 or higher) on Swoopo.

If Swoopo isn't a game of chance or lottery, a skilled player should be able to win at least one item in this experiment, yes?

I'd be happy to run this experiment and write about it on my blog. Just let me know what terms you think make sense.

I haven't heard from him since. (Now I'm curious if anyone is willing to take on this experiment, under the same terms.)

Because Swoopo is, at its heart, thinly veiled gambling. The companies backing Swoopo and other Penny Auction sites are hoping unsophisticated regulatory agencies will buy the "It's not a game of chance" argument if it's wrapped in a lot of technical intarweb mumbo-jumbo they can't fully comprehend.

But we're no government flacks. We're programmers, and many of us develop websites for a living. It's a bit tougher to pull the wool over our eyes. In Trying to Game Swoopo, Joshua Stein pulled out everything in his programmer's bag of tricks to win a Swoopo auction -- and, predictably, failed.

With all of this data available, I concluded that there is no way to reliably win an auction on swoopo.com without using their bidbutler service. There are delays on their network/servers in processing manual bids, whether intentional or just due to bad design, that cause manual bids placed with 1 or 2 seconds remaining not to be cast. Users of their bidbutler service have an unfair advantage in that their bids are placed on the server side and are not subject to these delays.

Since it is not possible to reliably place manual bids, the only way to guarantee that an auction can be won (while still coming out ahead) is to use the site's bidbutler service with high ceilings on the number of bids and amount that one will let it bid up to. Those ceilings have to take into account the item's current price, and will be lower the longer an item is being bid on.

As Joshua's data shows, there is no way to win a Swoopo auction other than through sheer random chance -- that is, your client-side bid happens to wind its way through the umpteen internet routers between the server and your computer in time, ending up at the top of a queue with dozens or hundreds of other bids placed within a fraction of a second of each other. And what's worse, you may not have any chance at all, unless you place a server-side bet through their exploitatively expensive "bidbutler" service.

As I said in my original post, the only winning strategy at Swoopo, or any other penny auction site, is not to play. On Swoopo, there are nothing but millions of losers -- and by that I mean they are gambling and losing millions of real dollars to the house. Which would be OK, I guess, if it was properly regulated as gambling. Swoopo and all these other penny auction sites should be regulated and classified as the online gambling sites in sheep's clothing they really are.

Let's see what we can do to hasten this process along. Warn your friends and family. Complain to the Better Business Bureau and other regulatory agencies. And if you feel as strongly as I do about this, please write your congressmen/women and urge them to regulate these exploitative penny auctions.

Posted by Jeff Atwood    192 Comments

May 22, 2009

How to Motivate Programmers

There's an inherent paradox in motivating programmers. I think this Geek Hero Comic illustrates it perfectly:

geek-hero-panel-1.png

geek-hero-panel-2.png

It's a phenomenon I've noticed even in myself. Nothing motivates like having another programmer tell you they're rewriting your code because it sucks. Dave Thomas has talked about this for years in his classic Developing Expertise presentation, supported by the following quote:

Interestingly enough, a friend of mine (who is a quality control manager in a hospital) often makes identical statements in reference to doctors: Polite requests, coercion, etc. are useless at best and often detrimental. Peer pressure and competition are the key.

Don't try to race sheep,
Don't try to herd race horses

Yes, the use of the term sheep is mildly derogatory, but the general principle is sound: use motivational techniques that are appropriate to the level of developers you're working with. If you have neophyte developers, herd them with maxims, guidelines and static rules. If you have experienced developers, rules are less useful. Instead, encourage them to race: engage in a little friendly competition and show off how good they are to their peers.

Posted by Jeff Atwood    111 Comments

May 19, 2009

The Bathroom Wall of Code

In Why Isn't My Encryption.. Encrypting?, many were up in arms about the flawed function I posted. And rightfully so, as there was a huge mistake in that code that just about invalidates any so-called "encryption" it performs. But there's one small problem: I didn't write that function.

Now, I am certainly responsible for that function, in the sense that it magically appeared in our codebase one day -- and the entire project is the sum of all the code contributed by every programmer working on it. I invoke the First Rule of Programming: It's Always Your Fault. And by "your", I don't mean the particular programmer who contributed this code, who shall remain blissfully nameless. I mean us -- the entire team. The onus is on us, as a team, to vet every line of code at the time it is contributed, and constantly peer review each other's code. It's a responsibility we shoulder together. Nobody owns the code, because everybody owns the code.

Yes, I failed. Because the team failed.

Geoff Weinhold left this prophetic comment on the post:

The irony in this is that someone will inevitably end up here for sample encryption code and blindly copy/paste your flawed code.

Indeed. Heaven forbid someone copy and paste flawed code from the internet into their project! In fact, a quick search on some of the unique strings in that original Encrypt() function turns up a few ... interesting ... search results.

01/2006: C# Shiznit - Library Encrypt and Decrypt Methods using TripleDES and MD5
05/2006: Code Project - Encrypt and Decrypt Data with C#
04/2007: Bytes - String Encryption Help
06/2008: Egghead Cafe - invalid length while decrypting TripleDESCryptoServiceProvider
09/2008: ASP.NET Forums - Need help on password-encrypted key used for signing
11/2008: code:keep Encryption
12/2008: Encrypt/Decrypt the password in C# .net
05/2009: My Own Stupid Blog Post

That's just a sampling of the 131 web hits I got. To paraphrase Austin Powers, this Encrypt() function is like the village bicycle: everybody's had a ride. It's a shame this particular bicycle happens to have a crippling lack of brakes that makes it dangerous to ride, but what can you do.

Scott Hanselman coined a nice phrase for this: the internet as the bathroom wall of code.

bathroom wall graffiti

It's true. People, being people, have gone and scrawled a bunch of random code graffiti all over the damn internet. Some of it is vanity tagging. Some of it is borderline vandalism. And some of it is art. How do we tell the difference?

That's the very reason I put forth a modest proposal for the copy and paste school of code reuse. Not that it would have helped in this case, but it sure would be nice if someone could perform a global search and replace ...

s/Mode = CipherMode.ECB/Mode = CipherMode.CBC/g

... on, like, the entire internet. So other projects don't absorb this critically flawed code sample.
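For the record, that substitution is easy enough to express with Python's `re` module. This is only a sketch of the joke. A blind textual swap like this would probably also break the snippet: in .NET, CBC decryption needs the initialization vector (IV) used during encryption, and the original function never stores or returns the random IV the provider generates.

```python
import re

# The flawed initializer from the original Encrypt() snippet.
snippet = "{ Key = keyArray, Mode = CipherMode.ECB, Padding = PaddingMode.PKCS7 };"

# Equivalent of s/Mode = CipherMode.ECB/Mode = CipherMode.CBC/g, tolerant of spacing.
fixed, count = re.subn(r"Mode\s*=\s*CipherMode\.ECB", "Mode = CipherMode.CBC", snippet)

print(count)  # 1
print(fixed)
```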

In the meantime, until that tool is developed, I recommend that you apply extra-strength peer review to any code snippets you absorb into your project from the bathroom wall of code. That internet code snippet you're looking at, the one that appears to be just what you're looking for, could also be random graffiti scrawled on a bathroom wall.

It's true that some bathrooms are nicer than others. But as we've learned, it pays to be especially careful when cribbing code from the internet.

Posted by Jeff Atwood    110 Comments

May 17, 2009

Why Isn't My Encryption.. Encrypting?

It's as true in life as it is in client-server programming: the only secret that can't be compromised is the one you never revealed.

But sometimes, it's unavoidable. If you must send a secret down to the client, you can encrypt it. The most common form of encryption is symmetric encryption, where the same key is used to both encrypt and decrypt. Most languages have relatively easy-to-use libraries in place for symmetric encryption. Here's how we were doing it in .NET:

using System;
using System.Security.Cryptography;
using System.Text;

public static string Encrypt(string toEncrypt, string key, bool useHashing)
{
    byte[] keyArray = UTF8Encoding.UTF8.GetBytes(key);
    byte[] toEncryptArray = UTF8Encoding.UTF8.GetBytes(toEncrypt);

    if (useHashing)
        keyArray = new MD5CryptoServiceProvider().ComputeHash(keyArray);

    var tdes = new TripleDESCryptoServiceProvider() 
        { Key = keyArray, Mode = CipherMode.ECB, Padding = PaddingMode.PKCS7 };

    ICryptoTransform cTransform = tdes.CreateEncryptor();
    byte[] resultArray = cTransform.TransformFinalBlock(
        toEncryptArray, 0, toEncryptArray.Length);

    return Convert.ToBase64String(resultArray, 0, resultArray.Length);
}

This is how our symmetric encryption function works:

  1. We start with a secret string we want to protect. Let's say it is "password123".
  2. We pick a key. Let's use the key "key-m4st3r".
  3. Before encrypting, we'll prefix our secret with a salt to prevent dictionary attacks. Let's call our salt "NaCl".

We'd call the function like so:

Encrypt("NaCl" + "password123", "key-m4st3r", true);

The output is a base64 encoded string of the TripleDES encrypted byte data. This encrypted data can now be sent to the client without any reasonable risk that the secret string will be revealed. There's always unreasonable risk, of the silent black government helicopter sort, but for all practical purposes there's no way someone could discover that your password is "password123" unless your key is revealed.

In our case, we were using this Encrypt() method to experiment with storing some state data in web pages related to the login process. We thought it was secure, because the data was encrypted. Sure it's encrypted! It says Encrypt() right there in the method name, right?

Wrong.

There's a bug in that code. A bug that makes our encrypted state data vulnerable. Do you see it? My coding mistakes, let me show you them!

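string key = "SuperSecretKey";

// Base64ToHex() is a helper extension method (not part of .NET) that
// prints the decoded bytes as hex, 8 bytes (one TripleDES block) per line.
Debug.WriteLine(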
    Encrypt("try some different" +
    "00000000000000000000000000000000",
    key, true).Base64ToHex());

Debug.WriteLine(
    Encrypt("salts" +
    "00000000000000000000000000000000",
    key, true).Base64ToHex());
        
3908024fc33b55c3
4e885c8946b80735
704cbe2a41d25f21
81bb6d726bd35152
81bb6d726bd35152
81bb6d726bd35152
1367f10f2584ace3

4ae7661295a98e46
81bb6d726bd35152
81bb6d726bd35152
81bb6d726bd35152
4ee5d23b3b5e3eb4    

(I'm using strings with multiples of 8 here to make the Base64 conversions easier.)

Do you see the mistake now? It's a short trip from here to unlimited data tampering, particularly since the state data from the login process contained user entered strings. An attacker could simply submit the form as many times as she likes, chop out the encrypted attack values from the middle, and insert them into the next encrypted request -- which will happily decrypt and be processed as if our code had sent it down!

The culprit is this line of code:

{ Key = keyArray, Mode = CipherMode.ECB, Padding = PaddingMode.PKCS7 }

Which, much to our embarrassment, is an incredibly stupid parameter to use in symmetric encryption:

The Electronic Codebook (ECB) mode encrypts each block individually. This means that any blocks of plain text that are identical and are in the same message, or in a different message encrypted with the same key, will be transformed into identical cipher text blocks. If the plain text to be encrypted contains substantial repetition, it is feasible for the cipher text to be broken one block at a time. Also, it is possible for an active adversary to substitute and exchange individual blocks without detection.
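Python's standard library has no TripleDES, so this sketch swaps in a toy 8-byte Feistel block cipher built from HMAC-SHA256. It is not secure and is for illustration only. It shows both ECB problems end to end: repeated plaintext blocks produce repeated ciphertext blocks, and blocks from one message can be spliced into another. The `name=...;admin=0;ok` state format is hypothetical, standing in for login state that embeds a user-entered string.

```python
import hashlib
import hmac

BLOCK = 8  # bytes, the same block size TripleDES uses


def _f(key, round_no, half):
    """Feistel round function: HMAC-SHA256 truncated to half a block."""
    return hmac.new(key, bytes([round_no]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key, block):
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in range(4):
        left, right = right, _xor(left, _f(key, r, right))
    return left + right


def decrypt_block(key, block):
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in reversed(range(4)):
        left, right = _xor(right, _f(key, r, left)), left
    return left + right


def ecb_encrypt(key, plaintext):
    n = BLOCK - len(plaintext) % BLOCK  # PKCS7 padding
    data = plaintext + bytes([n]) * n
    # ECB: every block is encrypted independently with the same key.
    return b"".join(encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK))


def ecb_decrypt(key, ciphertext):
    data = b"".join(decrypt_block(key, ciphertext[i:i + BLOCK])
                    for i in range(0, len(ciphertext), BLOCK))
    return data[:-data[-1]]  # strip PKCS7 padding


def server_state(key, name):
    """Hypothetical login state the server encrypts and hands to the client."""
    return ecb_encrypt(key, b"name=" + name + b";admin=0;ok")


key = b"SuperSecretKey"

# As in the output above: identical plaintext blocks, identical ciphertext blocks.
ct = ecb_encrypt(key, b"salts" + b"0" * 32)
print([ct[i:i + BLOCK].hex() for i in range(0, len(ct), BLOCK)])

legit = server_state(key, b"bob")           # blocks: name=bob | ;admin=0 | ;ok+pad
attack = server_state(key, b"bob;admin=1")  # blocks: name=bob | ;admin=1 | ;admin=0 | ;ok+pad

# Splice the attacker's second ciphertext block into the legitimate state.
forged = legit[:BLOCK] + attack[BLOCK:2 * BLOCK] + legit[2 * BLOCK:]
print(ecb_decrypt(key, forged))  # b'name=bob;admin=1;ok'
```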

It's fairly standard for block cipher modes to use feedback from the previous block to seed the next block. I honestly did not realize that it was possible to pick a cipher mode that did not do some kind of block chaining! CipherMode.ECB? More like CipherMode.Fail!
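Block chaining is easy to see in a sketch. As before, the block cipher is a toy 8-byte Feistel construction standing in for TripleDES, and it is not secure. The point is the mode. CBC XORs each plaintext block with the previous ciphertext block, using a random IV for the first block, before encrypting. The runs of identical blocks that gave ECB away therefore encrypt to different ciphertext.

```python
import hashlib
import hmac
import os

BLOCK = 8


def _f(key, round_no, half):
    return hmac.new(key, bytes([round_no]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key, block):
    """Toy 4-round Feistel cipher on 8-byte blocks (illustration only)."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in range(4):
        left, right = right, _xor(left, _f(key, r, right))
    return left + right


def decrypt_block(key, block):
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in reversed(range(4)):
        left, right = _xor(right, _f(key, r, left)), left
    return left + right


def cbc_encrypt(key, iv, plaintext):
    n = BLOCK - len(plaintext) % BLOCK  # PKCS7 padding
    data = plaintext + bytes([n]) * n
    previous, out = iv, []
    for i in range(0, len(data), BLOCK):
        # Chain: mix in the previous ciphertext block before encrypting.
        previous = encrypt_block(key, _xor(data[i:i + BLOCK], previous))
        out.append(previous)
    return b"".join(out)


def cbc_decrypt(key, iv, ciphertext):
    previous, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(decrypt_block(key, block), previous))
        previous = block
    data = b"".join(out)
    return data[:-data[-1]]  # strip PKCS7 padding


key = b"SuperSecretKey"
iv = os.urandom(BLOCK)  # must be unpredictable, and sent along with the ciphertext
ct = cbc_encrypt(key, iv, b"salts" + b"0" * 32)
for i in range(0, len(ct), BLOCK):
    print(ct[i:i + BLOCK].hex())  # the three all-zero plaintext blocks no longer match
```

Chaining alone still doesn't make the data tamper-proof. Without a MAC or an authenticated mode, an attacker can still flip bits or splice in blocks, and the server has no way to notice. That's one more argument for keeping such state on the server in the first place.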

So, what have we learned?

  1. If it doesn't have to be sent to the client, then don't! Secrets sent to the client can potentially be tampered with and compromised in various ways that aren't easy to see or even predict. In our case, we can store login state on the server and avoid transmitting any of that state to the client altogether.

  2. It isn't encryption until you've taken the time to fully understand the concepts behind the encryption code. Specifically, we didn't notice that our encryption function was using a highly questionable CipherMode that allowed block level substitution of the encrypted data.

Luckily, this was a somewhat experimental page on the site, so we were able to revert to our standard server-side approach rather quickly once the exploit was discovered. I'm no Bruce Schneier, but I have a reasonable grasp of encryption concepts. And I still completely missed this problem.

So the next time you sit down to write some encryption code, consider the above two points carefully. Otherwise, like us, you'll be left wondering why your encryption isn't... encrypting.

(Thanks to Daniel LeCheminant for his assistance in discovering this issue.)

Posted by Jeff Atwood    101 Comments

May 13, 2009

Why Do Computers Suck at Math?

You've probably seen this old chestnut by now.

google-calculator-incorrect.png

Insert your own joke here. Google can't be wrong -- math is! But Google is hardly alone; this is just another example in a long and storied history of obscure little computer math errors that go way back, such as this bug report from Windows 3.0.

  1. Start Calculator.
  2. Input the largest number to subtract first (for example, 12.52).
  3. Press the MINUS SIGN (-) key on the numeric keypad.
  4. Input the smaller number that is one unit lower in the decimal portion (for example, 12.51).
  5. Press the EQUAL SIGN (=) key on the numeric keypad.

On my virtual machine, 12.52 - 12.51 on Ye Olde Windows Calculator indeed results in 0.00.

Windows 3.11 calculator incorrect

And then there was the famous Excel bug.

If you have Excel 2007 installed, try this: Multiply 850 by 77.1 in Excel.

One way to do this is to type "=850*77.1" (without the quotes) into a cell. The correct answer is 65,535. However, Excel 2007 displays a result of 100,000.

At this point, you might be a little perplexed, as computers are supposed to be pretty good at this math stuff. What gives? How is it possible to produce such blatantly incorrect results from seemingly trivial calculations? Should we even be trusting our computers to do math at all?

Well, numbers are harder to represent on computers than you might think:

A standard floating point number has roughly 16 decimal places of precision and a maximum value on the order of 10^308, a 1 followed by 308 zeros. (According to IEEE standard 754, the typical floating point implementation.)

Sixteen decimal places is a lot. Hardly any measured quantity is known to anywhere near that much precision. For example, the constant in Newton's Law of Gravity is only known to four significant figures. The charge of an electron is known to 11 significant figures, much more precision than Newton's gravitational constant, but still less than a floating point number. So when are 16 figures not enough? One problem area is subtraction. The other elementary operations -- addition, multiplication, division -- are very accurate. As long as you don't overflow or underflow, these operations often produce results that are correct to the last bit. But subtraction can be anywhere from exact to completely inaccurate. If two numbers agree to n figures, you can lose up to n figures of precision in their subtraction. This problem can show up unexpectedly in the middle of other calculations.
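You can watch that subtraction problem happen with ordinary Python floats, which are IEEE 754 doubles. The `decimal` module at the end is a swapped-in remedy for calculator-style inputs. It is not what any of the products mentioned here actually do.

```python
from decimal import Decimal

# 1e-15 is stored only approximately once it is added to 1.0; subtracting
# the 1.0 back out leaves that rounding error sitting in the result.
tiny = (1 + 1e-15) - 1
print(tiny)                        # 1.1102230246251565e-15, about 11% off
print(abs(tiny - 1e-15) / 1e-15)   # relative error

# The calculator example: two operands that agree in their leading digits.
diff = 12.52 - 12.51
print(diff == 0.01)                # False: close to 0.01, but not exactly
print(Decimal("12.52") - Decimal("12.51"))  # 0.01, exact decimal arithmetic
```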

Number precision is a funny thing; did you know that the infinitely repeating decimal 0.999... is equal to one?

In mathematics, the repeating decimal 0.999… denotes a real number equal to one. In other words: the notations 0.999… and 1 actually represent the same real number.

0.999 infinitely repeating

This equality has long been accepted by professional mathematicians and taught in textbooks. Proofs have been formulated with varying degrees of mathematical rigour, taking into account preferred development of the real numbers, background assumptions, historical context, and target audience.
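One short way to see the equality is to read the repeating decimal as a geometric series with ratio 1/10. The series converges because that ratio is less than one:

```latex
0.\overline{9} = \sum_{k=1}^{\infty} \frac{9}{10^{k}}
  = 9 \cdot \frac{1/10}{1 - 1/10}
  = 9 \cdot \frac{1}{9}
  = 1
```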

Computers are awesome, yes, but they aren't infinite... yet. So any prospects of storing an infinitely repeating number on them are dim at best. The best we can do is work with approximations at varying levels of precision that are "good enough", where "good enough" depends on what you're doing, and how you're doing it. And it's complicated to get right.

Which brings me to What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation. Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits. In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation.
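Python makes that rounding visible. The standard `fractions` and `decimal` modules can print the exact value a float actually stores:

```python
from decimal import Decimal
from fractions import Fraction

# The double closest to 0.1 is a fraction with a power-of-two denominator,
# so it cannot be exactly one tenth.
print(Fraction(0.1))     # 3602879701896397/36028797018963968
print(Decimal(0.1))      # the exact decimal expansion of that binary fraction

# The rounding errors in the inputs survive the addition.
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004
```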

What do the Google, Windows, and Excel (pdf) math errors have in common? They're all related to number precision approximation issues. Google doesn't think it's important enough to fix. They're probably right. But some mathematical rounding errors can be a bit more serious.

Interestingly, the launch failure of the Ariane 5 rocket, which exploded 37 seconds after liftoff on June 4, 1996, occurred because of a software error that resulted from converting a 64-bit floating point number to a 16-bit integer. The value of the floating point number happened to be larger than could be represented by a 16-bit integer. The overflow wasn't handled properly, and in response, the computer cleared its memory. The memory dump was interpreted by the rocket as instructions to its rocket nozzles, and an explosion resulted.
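The failure mode is easy to reproduce in miniature. Here is a sketch that assumes a 16-bit signed target. `wrap_int16` mimics one common outcome of an unchecked narrowing conversion, which keeps only the low 16 bits. `to_int16_checked` refuses out-of-range values. Both names are hypothetical. As the Ariane story shows, a check only helps if something actually handles the error it raises.

```python
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def wrap_int16(value: float) -> int:
    """Unchecked narrowing: truncate toward zero, then keep only the low 16 bits."""
    return (int(value) + 2**15) % 2**16 - 2**15


def to_int16_checked(value: float) -> int:
    """Checked narrowing: refuse values a 16-bit signed integer cannot hold."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated


reading = 40000.0  # a perfectly ordinary 64-bit float, too big for 16 bits
print(wrap_int16(reading))  # -25536: silently wrong
try:
    to_int16_checked(reading)
except OverflowError as err:
    print("caught:", err)   # the caller now has to decide what to do
```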

I'm starting to believe that it's not the computers that suck at math, but the people programming those computers. I know I'm living proof of that.

Posted by Jeff Atwood    251 Comments

May 11, 2009

The Web Browser Address Bar is the New Command Line

Google's Chrome browser passes anything you type into the address bar that isn't an obvious URI on to the default search engine.

chrome address bar onebox

While web browsers should have some built-in smarts, they can never match the collective intelligence of a worldwide search engine. For example:

weather San Francisco
CSCO
time London
san francisco 49ers
5*9+(sqrt 10)^3=
Henry+Wadsworth+Longfellow
earthquake
10.5 cm in inches
population FL
Italian food 02138
movies 94705
homes Los Angeles
150 GBP in USD
Seattle map
Patent 5123123
650
american airlines 18
036000250015
JH4NA1157MT001832
510-525-xxxx (I'm hesitant to link a listed personal phone number here, but it does work)

I like to think of the web browser address bar as the new command line.

Oh, you wanted dozens of cryptic, obscure UNIX style command line operators and parameters? No problem!

define:defenestrate
presidents 1850...1860
"plants vs. zombies" daterange:2454955-2454955
link:experts-exchange.com sucks
filetype:pdf programming language poster
allintitle:nigerian site:www.snopes.com
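The numbers in that `daterange:` query are Julian day numbers rather than calendar dates. A small Python helper can translate them. It relies on two facts: Python's ordinal for 2000-01-01 is 730120, and that day's Julian day number is 2451545, so the two always differ by 1721425. The helper names are made up. By this arithmetic, the sample query covers May 3, 2009.

```python
from datetime import date

# Python's date ordinals count days from 0001-01-01 (day 1); Julian day
# numbers count from a much earlier epoch (4713 BC). The offset is constant.
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    return d.toordinal() + JDN_OFFSET


def daterange_query(terms: str, start: date, end: date) -> str:
    return f"{terms} daterange:{julian_day(start)}-{julian_day(end)}"


print(julian_day(date(2000, 1, 1)))  # 2451545
print(daterange_query('"plants vs. zombies"', date(2009, 5, 3), date(2009, 5, 3)))
```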

Any command line worth its salt has some kind of scripting language built in, too, right? No sweat. Just try entering this in your browser's address bar.

javascript:alert('Hello, world!')

The sky's the limit from there; whatever JavaScript you can fit in the address bar is fair game. These are more commonly known as "bookmarklets".

Apparently we've spent the last 20 years reimplementing the UNIX command line in the browser. Services like yubnub make this process even more social, with collaborative group creation (and ranking!) of new commands. You can find some of the cooler ones on the golden eggs page.

gimf "carrot top"
esv Ezekiel 25:17
2g color colour
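Under the hood, a yubnub-style command is little more than a name mapped to a URL template, with the rest of the line substituted in. Here is a minimal sketch. The command table and the example.com URLs are hypothetical stand-ins, not yubnub's actual definitions.

```python
from urllib.parse import quote_plus

# Hypothetical command table: name -> URL template with a %s placeholder.
COMMANDS = {
    "gimf": "https://images.example.com/search?q=%s",
    "esv": "https://bible.example.com/passage?q=%s",
}
DEFAULT = "https://search.example.com/?q=%s"


def dispatch(line: str) -> str:
    """Turn an address-bar 'command line' into the URL to visit."""
    line = line.strip()
    name, _, args = line.partition(" ")
    template = COMMANDS.get(name)
    if template is None:
        # Not a known command: treat the whole line as a search query,
        # the way Chrome hands non-URLs to the default search engine.
        return DEFAULT.replace("%s", quote_plus(line))
    return template.replace("%s", quote_plus(args))


print(dispatch("esv Ezekiel 25:17"))
print(dispatch("weather San Francisco"))
```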

Honestly, I was never a big command-line enthusiast; even way back when on my Amiga I'd choose the GUI over the CLI whenever I could. But maybe I bet on the wrong horse. Perhaps the command prompt -- or more specifically, the search oriented, crowdsourced, world public command prompt -- really is the future.

Posted by Jeff Atwood    154 Comments

May 8, 2009

Pseudocode or Code?

Although I'm a huge fan of Code Complete -- it is my single most recommended programming book for good reason -- there are chapters in it that I haven't been able to digest, even after 16 years.

One of those chapters describes something called the Pseudocode Programming Process. And on paper, at least, it sounds quite sensible. Before writing a routine, you describe what that routine should do in plain English. So if we set out to write an error-handling lookup routine, we'd first write it in pseudocode:

set the default status to "fail"
look up the message based on the error code
if the error code is valid
    if doing interactive processing, display the error message
        interactively and declare success
    if doing command line processing, log the error message to the
        command line and declare success
if the error code isn't valid, notify the user that an
    internal error has been detected
return status information

Then, when you're satisfied that you understand what the routine should do, you turn that pseudocode into comments that describe the code you're about to write.

// set the default status to "fail"
Status errorMessageStatus = Status_Failure;

// look up the message based on the error code
Message errorMessage = LookupErrorMessage( errorToReport );

// if the error code is valid
if ( errorMessage.ValidCode() ) {

    // determine the processing method
    ProcessingMethod errorProcessingMethod = CurrentProcessingMethod();

    // if doing interactive processing, display the error message
    // interactively and declare success
    if ( errorProcessingMethod == ProcessingMethod_Interactive ) {
        DisplayInteractiveMessage( errorMessage.Text() );
        errorMessageStatus = Status_Success;
    }
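The excerpt stops partway through the routine. Here is a minimal JavaScript sketch you can actually run, which carries every remaining pseudocode line through as a comment. The message table, the `mode` parameter, and the `ui` object with `display`/`log` methods are hypothetical stand-ins for `LookupErrorMessage`, `CurrentProcessingMethod`, and the display and logging calls. This is not McConnell's code.

```javascript
const Status = Object.freeze({ Success: "success", Failure: "failure" });
const ProcessingMethod = Object.freeze({
  Interactive: "interactive",
  CommandLine: "command-line",
});

// Hypothetical message table standing in for LookupErrorMessage().
const errorMessages = new Map([
  [1, "File not found"],
  [2, "Disk full"],
]);

// ui is a hypothetical object with display(text) and log(text) methods.
function reportErrorMessage(errorToReport, mode, ui) {
  // set the default status to "fail"
  let errorMessageStatus = Status.Failure;

  // look up the message based on the error code
  const errorMessage = errorMessages.get(errorToReport);

  // if the error code is valid
  if (errorMessage !== undefined) {
    // if doing interactive processing, display the error message
    // interactively and declare success
    if (mode === ProcessingMethod.Interactive) {
      ui.display(errorMessage);
      errorMessageStatus = Status.Success;
    }
    // if doing command line processing, log the error message to the
    // command line and declare success
    else if (mode === ProcessingMethod.CommandLine) {
      ui.log(errorMessage);
      errorMessageStatus = Status.Success;
    }
  }
  // if the error code isn't valid, notify the user that an
  // internal error has been detected
  else {
    ui.display(`Internal error: invalid error code ${errorToReport}`);
  }

  // return status information
  return errorMessageStatus;
}
```

Each comment is a line of the original pseudocode. That's the point of PPP: the design survives as the routine's documentation.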

Pseudocode is sort of like the Tang of programming languages -- you hydrate the code around it.


But why pseudocode? Steve offers some rationales:

  • Pseudocode makes reviews easier. You can review detailed designs without examining source code. Pseudocode makes low-level design reviews easier and reduces the need to review the code itself.
  • Pseudocode supports the idea of iterative refinement. You start with a high-level design, refine the design to pseudocode, and then refine the pseudocode to source code. This successive refinement in small steps allows you to check your design as you drive it to lower levels of detail. The result is that you catch high-level errors at the highest level, mid-level errors at the middle level, and low-level errors at the lowest level -- before any of them becomes a problem or contaminates work at more detailed levels.
  • Pseudocode makes changes easier. A few lines of pseudocode are easier to change than a page of code. Would you rather change a line on a blueprint or rip out a wall and nail in the two-by-fours somewhere else? The effects aren't as physically dramatic in software, but the principle of changing the product when it's most malleable is the same. One of the keys to the success of a project is to catch errors at the "least-value stage," the stage at which the least effort has been invested. Much less has been invested at the pseudocode stage than after full coding, testing, and debugging, so it makes economic sense to catch the errors early.
  • Pseudocode minimizes commenting effort. In the typical coding scenario, you write the code and add comments afterward. In the PPP, the pseudocode statements become the comments, so it actually takes more work to remove the comments than to leave them in.
  • Pseudocode is easier to maintain than other forms of design documentation. With other approaches, design is separated from the code, and when one changes, the two fall out of agreement. With the PPP, the pseudocode statements become comments in the code. As long as the inline comments are maintained, the pseudocode's documentation of the design will be accurate.

All compelling arguments. As an acolyte of McConnell, I'm pained to admit this, but every time I've tried the Pseudocode Programming Process, I've abandoned it almost immediately as impractical.

Why? Two reasons:

  1. code > pseudocode. I find it easier to think about code in code. While I'm all for describing the overall purpose of the routine in plain English before you write it -- this helps name it, which is incredibly difficult -- extending that inside the routine doesn't work well for me. There's something fundamentally.. unrealistic.. about attempting to use precise English to describe the nuts and bolts of code.
  2. Starting with the goal of adding comments to your code seems backwards. I prefer coding without comments, in that I want the code to be as self-explanatory as humanly possible. Don't get me wrong; comments do occur regularly in my code, but only because the code could not be made any clearer without them. Comments should be a method of last resort, not something you start with.
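For contrast, here is a minimal sketch of the comment-light style described above, applied to the kind of error-reporting routine shown earlier. The intent is carried by names and early returns rather than by pseudocode comments. All names are hypothetical: `KNOWN_ERRORS`, `reportError`, the `ui` object with `display`/`log` methods, and the mode strings.

```javascript
// Hypothetical message table; an unknown code yields undefined.
const KNOWN_ERRORS = new Map([
  [1, "File not found"],
  [2, "Disk full"],
]);

// Returns true if the error was reported to the user.
function reportError(errorCode, mode, ui) {
  const message = KNOWN_ERRORS.get(errorCode);
  if (message === undefined) {
    ui.display(`Internal error: invalid error code ${errorCode}`);
    return false;
  }

  switch (mode) {
    case "interactive":
      ui.display(message);
      return true;
    case "command-line":
      ui.log(message);
      return true;
    default:
      return false;
  }
}
```

Whether this reads more clearly than a pseudocode-commented version is exactly the judgment call in question.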

Of course, PPP is just one proposed way to code, not the perfect or ideal way. McConnell has no illusions about this, and acknowledges that refactoring, TDD, design by contract, and even plain old "hacking" are valid alternative ways to construct code.

But still -- I have a hard time seeing pseudocode as useful in anything other than possibly job interviews. And even then, I'd prefer to sit down in front of a computer and write real code to solve whatever problem is being posed. What's your take? Is pseudocode a useful tool in your programming? Do you write pseudocode before writing code?

Posted by Jeff Atwood    290 Comments

May 5, 2009

I Just Logged In As You: How It Happened

In my previous post I Just Logged In As You, I disclosed that someone was logging in as me -- specifically because they discovered my password. But how?

If I wanted to discover someone's password, there are a few ways I'd go about it:

  1. Educated guess. If you know someone's birthday, their pets, their children's names, favorite movies, and so on -- these are all potential passwords in various forms. This is classic social engineering, and it can work; that's essentially how Sarah Palin's email was hacked. While my password was weak, it wasn't anything you could reasonably guess based on public information available about me.

  2. Brute force dictionary attack. If login attempts aren't meaningfully rate limited, then you can attempt a dictionary attack and pray the target password is a simple dictionary word. That's how one Twitter administrator's account was compromised. But failing to rate limit password attempts is strictly amateur hour stuff (and I'd argue borderline incompetence); no OpenID provider of any consequence would make this mistake.

  3. Interception. Eavesdrop on the user in any way you can to discover their password: install a hardware keylogger, software keylogger, or perform network sniffing of unencrypted traffic. If you have physical access to the user, low-tech analog methods such as watching over someone's shoulder as they type in their password are effectively the same thing. While I can't rule out paranoid fantasies of keyloggers, if my machine was so thoroughly 0wnz0red, I think my OpenID password would have been the least of my worries at that point.

  4. Impersonation. Commonly known as phishing. You present the user with a plausible-looking login page for a service they already use, and hope they enter their credentials. Alternately, in the depressingly common Web 2.0 style, you can just demand that users give up their credentials for some trivial integration feature with the target website. I consider both of these phishing, and I call it the forever hack for good reason.
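Rate limiting, the defense against the dictionary attack described above, doesn't have to be elaborate. Here is a minimal in-memory sketch. It assumes per-account tracking with a lockout that doubles after a few free failures. The thresholds, the `LoginThrottle` name, and the injectable clock are illustrative assumptions. A real service would also limit by source address and persist this state.

```javascript
// Illustrative policy: the first MAX_FREE_FAILURES failures cost nothing;
// each further failure doubles the lockout, capped at MAX_LOCKOUT_MS.
const MAX_FREE_FAILURES = 3;
const BASE_LOCKOUT_MS = 1000;
const MAX_LOCKOUT_MS = 15 * 60 * 1000;

class LoginThrottle {
  constructor(now = () => Date.now()) {
    this.now = now;            // injectable clock, handy for testing
    this.accounts = new Map(); // username -> { failures, lockedUntil }
  }

  // True if a login attempt for this user may be checked right now.
  canAttempt(username) {
    const state = this.accounts.get(username);
    return state === undefined || this.now() >= state.lockedUntil;
  }

  recordFailure(username) {
    const state = this.accounts.get(username) ?? { failures: 0, lockedUntil: 0 };
    state.failures += 1;
    const excess = state.failures - MAX_FREE_FAILURES;
    if (excess > 0) {
      const lockout = Math.min(BASE_LOCKOUT_MS * 2 ** (excess - 1), MAX_LOCKOUT_MS);
      state.lockedUntil = this.now() + lockout;
    }
    this.accounts.set(username, state);
  }

  recordSuccess(username) {
    this.accounts.delete(username);
  }
}
```

The caller checks `canAttempt` before even looking at the password. A few typos cost a legitimate user seconds, while working through a dictionary of thousands of words becomes impractically slow.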

So which of these methods did this person use to obtain my password? None of them.

It wasn't a guess and it wasn't brute force.

I guess I can tell you, so you don't fall into this trap again. There's a site I help out with that doesn't salt their passwords. They're MD5 encrypted, but if you've got a dictionary password, it's very easy to use a reverse-MD5 site to get the original. I was able to figure out you were a user on the site some time back, and realized I could do this, if only I knew your openid provider...

(As an aside, I complained to the head of the site months ago that he ought to start salting passwords for this exact reason. I also run my passwords I need to be secure through a few reverse-hash websites, just to ensure that it's not stored somewhere.)

So, the unethical part was actually looking up this information in the first place. I apologize. But like I said, better than someone else getting into this data.
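The attack above works because an unsalted MD5 hash of a given password is identical everywhere. A table of precomputed hashes of common words, which is essentially what a reverse-MD5 site is, inverts it instantly. Here is a minimal Node.js sketch of both the attack and the usual fix: a unique random salt per user, fed into a deliberately slow key-derivation function. PBKDF2 from Node's built-in `crypto` module is one reasonable choice; bcrypt and scrypt are common alternatives. The tiny word list and the iteration count are illustrative assumptions.

```javascript
const crypto = require("crypto");

// Unsalted MD5: the same password always yields the same hash,
// on every site, for every user.
function md5Hex(text) {
  return crypto.createHash("md5").update(text).digest("hex");
}

// A "reverse-MD5" lookup is essentially a precomputed hash -> password
// table. This hypothetical one covers just a few common passwords.
const commonPasswords = ["password", "letmein", "123456", "monkey"];
const reverseTable = new Map(commonPasswords.map((p) => [md5Hex(p), p]));

function reverseLookup(hash) {
  return reverseTable.get(hash); // undefined if the hash isn't in the table
}

// Salted, slow hashing with PBKDF2. Parameters are illustrative.
const ITERATIONS = 100000;
const KEY_LENGTH = 32;
const DIGEST = "sha256";

function hashPassword(password) {
  const salt = crypto.randomBytes(16); // fresh random salt per user
  const hash = crypto.pbkdf2Sync(password, salt, ITERATIONS, KEY_LENGTH, DIGEST);
  return { salt: salt.toString("hex"), hash: hash.toString("hex") };
}

function verifyPassword(password, stored) {
  const salt = Buffer.from(stored.salt, "hex");
  const expected = Buffer.from(stored.hash, "hex");
  const actual = crypto.pbkdf2Sync(password, salt, ITERATIONS, KEY_LENGTH, DIGEST);
  // Constant-time comparison avoids leaking how many bytes matched.
  return crypto.timingSafeEqual(actual, expected);
}
```

Because each user gets a fresh salt, two accounts with the same password end up with different stored hashes. A lookup table built without knowing the salt is useless against them.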

Hey, it looks like you're storing passwords incorrectly!

we have met the enemy and he is us

We have met the enemy, and he is.. programmers just like us. Seriously, go read that blog entry. It is exactly, exactly what just happened to me.

When I say programmers like us, I mean me, too. I acknowledge that I am also at fault here, for...

  • using the same low-value credential password in two places.
  • picking a particularly weak password.
  • not using a high-value credential for something that clearly deserved it, namely, my moderator login to Stack Overflow.

All of this is true, and I shoulder the blame for that. Perhaps I should take my own advice. A moment of weakness, I suppose.

The important thing to take away from this, if you're a programmer working on an application that stores user credentials, is to get the hell out of the business of storing user credentials! As we've seen today, the world is full of stupid users like me who do incredibly stupid things. Are you equipped and willing to do everything necessary to protect idiots like me from myself? That's a key part of the promise of OpenID, and one of the reasons we chose it as the authentication system for Stack Overflow. As one commenter noted on Reddit:

I, for one, think that my OpenID provider is more secure than the average guy running a forum.

Exactly. We outsourced our user credential system to people who are much better at it than us (well, depending on which OpenID provider you pick). And also because we didn't think the world needed yet another username and password. You're welcome. I think.

So, what have we learned?

  1. Programmers are the enemy.
  2. Hey .. wait a second, I'm a programmer!
  3. GOTO 1

(Oh, and credit to Malte, the first commenter to correctly identify what the likely password vulnerability was -- less than an hour after the entry was posted!)

Posted by Jeff Atwood    166 Comments
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.