Sorting for Humans : Natural Sort Order

December 12, 2007

The default sort functions in almost every programming language are poorly suited for human consumption. What do I mean by that? Well, consider the difference between sorting filenames in Windows explorer, and sorting those very same filenames via Array.Sort() code:

Explorer shell sort Array.Sort()
alphabetical sort ASCIIbetical sort

Quite a difference.

I can say without the slightest hint of exaggeration that this exact sorting problem has been a sore point on every single project I've ever worked on. Users will inevitably complain that their items aren't sorting properly, and file bugs on these "errors". Being card-carrying members of the homo logicus club, we programmers produce a weary sigh, and try to keep any obvious eye-rolling in check as we patiently inform our users that this isn't an error. Items are sorting in proper order. Proper ASCII order, that is. As we're walking away, hopefully you won't hear us mutter under our breath what we're actually thinking-- "Stupid users! They don't even understand how sorting works!"

I always felt a pang of regret when rejecting these requests. Honestly, look at those two lists-- what sane person would want ASCII order? It's a completely nonsensical ordering to anyone who doesn't have the ASCII chart committed to memory (and by the way, uppercase A is decimal 65). I never really understood that there was another way to sort, even though natural sort has been right in front of us all along in the form of Mac Finder and Windows Explorer file listings. I had language-induced blinders on. If our built-in sort returns in ASCII order, then that must be correct. It was bequeathed upon us by the Language Gods. Can there be any other way?

Kate Rhodes is a bit up in arms about our collective ignorance of ASCIIbetical vs. Alphabetical. Can't say I blame her. I'm as guilty as anyone. Turns out the users weren't the stupid ones after all -- I was.

Silly me, I just figured that alphabetical sorting was such a common need (judging by the number of people asking how to do it I'm not wrong either) that I wouldn't have to write the damn thing. But I didn't count on the stupid factor. Jesus Christ people. You're programmers. You're almost all college graduates and none of you know what the f**k "Alphabetical" means. You should all be ashamed. If any of you are using your language's default sort algorithm, which is almost guaranteed to be ASCIIbetical (for good reason) to get alphabetical sorting you proceed to the nearest mirror and slap yourself repeatedly before returning to your desks and fixing your unit tests that didn't catch this problem.

It isn't called "Alphabetical sort"; it's collectively known as natural sort. But she's right about one thing: it's hard to find information on natural sorting, and many programmers are completely ignorant of it. None of the common computer languages (that I know of) implement anything other than ASCIIbetical sorts. There are a few places you can find natural sort algorithms, however:

Don't let Ned's clever Python ten-liner fool you. Implementing a natural sort is more complex than it seems, and not just for the gnarly i18n issues I've hinted at, above. But the Python implementations are impressively succinct. One of Ned's commenters posted this version, which is even shorter:

import re 

def sort_nicely( l ): 
  """ Sort the given list in the way that humans expect. 
  """ 
  convert = lambda text: int(text) if text.isdigit() else text 
  alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
  l.sort( key=alphanum_key ) 

I tried to come up with a clever, similarly succinct C# 3.0 natural sort implementation, but I failed. I'm not interested in a one-liner contest, necessarily, but it does seem to me that a basic natural sort shouldn't require the 40+ lines of code it takes in most languages.

As programmers, we'd do well to keep Kate's lesson in mind: ASCIIbetical does not equal alphabetical. ASCII sorting serves the needs of the computer and the compiler, but what about us human beings? Perhaps a more human-friendly natural sort option should be built into mainstream programming languages, too.

Posted by Jeff Atwood
205 Comments

How should these filenames be sorted?

example1.txt
example2.txt
example3.txt
example4.txt
example5.txt
example-1.txt
example-2.txt
example-3.txt
example-4.txt
example-5.txt

If the numbers are being evaluated correctly, shouldn't they be listed in this order:


example-5.txt
example-4.txt
example-3.txt
example-2.txt
example-1.txt
example1.txt
example2.txt
example3.txt
example4.txt
example5.txt

Hmm.

Simon Wright on February 6, 2010 10:14 PM

Any thoughts on implementing this in SQL? It seems like for web applications this is where this logic needs to exist. (Typically a SQL query both sorts and paginates the data before returning it to the app.)

Landon on February 6, 2010 10:14 PM

Just to nitpick, instead of try/catching, I'd use the TryParse alternative:

  int i;
return int.TryParse(str, out i) ? (object)i : str;

Dries Van Hansewijck on August 6, 2010 8:12 AM

No one seems to recognise that z10.txt is NOT 'zed' 'one' 'zero' it is in fact 'zed' 'ten', which is why humans prefer that sort of sort (lol pun unintentional).

Perhaps to be even more clear, .txt is a method of delineation, z10 would really be z.10 if represented to how humans read it, rather than how code displays it.

Joshua D'Alton on September 5, 2010 12:34 AM

PHP has built-in support for natural sorting (natsort) and Unicode sorting with the intl extension and "root" locale (ICU Collation Algorithm) (http://www.php.net/manual/en/collator.sort.php / http://www.php.net/manual/en/collator.asort.php), I've also managed to implement a hack that mimics the Unicode and natural sorting (see https://github.com/alixaxel/phunction/blob/bdeb2541b7b3599a487a6defd2a197e2e813b1e9/_.php#L470 to check the code out).

Alix Axel on May 2, 2011 7:11 AM

«Back

The comments to this entry are closed.