Coding Horror
programming and human factors
by Jeff Atwood

March 13, 2008

What's Wrong With Turkey?

Software internationalization is difficult under the best of circumstances, but it always amazed me how often one particular country came up in discussions of internationalization problems: Turkey.

For example, this Rick Strahl post from mid-2005 is one of many examples I've encountered:

I've been tracking a really funky bug in my West Wind Web Store application that seems to crop up only very infrequently in my error logs. In a previous post I mentioned that I had instituted some additional logging features – specifically making sure that I would also log the locale of the user accessing the application.

Well, three bug reports later I noticed that all errors occurred with a Turkish (tr) browser. So I changed my browser's default language to Turkish and sure enough I could see the error occur.

Or, say, this 2005 post from Scott Hanselman:

I had blogged earlier about a bug in dasBlog that affected Turkish users. When a Turkish browser reported an HTTP Accept-Language header indicating Turkish as the preferred language, no blog posts would show up. As fix, I suggested that users change their blog templates, but I knew that wasn't an appropriate fix.

I understand that Turkish prisons are not to be trifled with, but the question remains: why do Turkish people take such cruel and perverse delight in breaking our fine software? What's wrong with Turkey?

As with so many other problems in software development, the question shouldn't be what's wrong with Turkey, but rather, what the hell is wrong with software developers? Some of this is sort of obvious if you have any cultural awareness whatsoever.

  • In the United States, we would typically format today's date as 3/14/2008. In Turkey, they format it as 14.3.2008.

  • In the United States, we use commas to group digits, like so: 32,768. In Turkey, they group digits using a period, so the same number would be entered as 32.768.

These minor formatting differences are usually not a big deal for output and display purposes, but it's a whole different ballgame when you're parsing input. You'd naturally expect people to input dates and numbers in the format they're used to. If your code assumes that input will be in typical American English format, there will be.. trouble.

Most languages have this covered; there are functions that allow you to read or write dates and numbers appropriately for various cultures. In .NET, for example, it's the difference between these two calls:

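// NumberStyles.AllowThousands is required; the default style rejects group separators.
int.Parse("32.768", System.Globalization.NumberStyles.AllowThousands);
int.Parse("32,768", System.Globalization.NumberStyles.AllowThousands, System.Globalization.NumberFormatInfo.InvariantInfo);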

Because no culture is specified, the first call will parse the number according to the rules of the default culture that code is running under. Let's hope it's running under a Turkish version of Windows, so it can parse the number correctly. The second call, however, explicitly specifies a culture. The "invariant" culture is every American programmer's secret dream realized: we merely close our eyes and wish away all those confusing languages and cultures and their crazy, bug-inducing date and number formatting schemes in favor of our own. A nice enough dream while it lasts, but instead of rudely asking your users to "speak American" through the invariant culture, you could politely ask them to enter data in ISO international standard format instead.
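
Here is a minimal sketch of the same idea, in Python rather than .NET: say which convention you are parsing with instead of inheriting one from the machine. `NumberConvention` and `parse_grouped_int` are hypothetical names made up for illustration, and the two conventions are hard-coded where real code would pull them from locale data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NumberConvention:
    """Hypothetical stand-in for a culture's number formatting rules."""
    group_separator: str


# Illustrative conventions only; real code would take these from locale data.
US = NumberConvention(group_separator=",")
TURKISH = NumberConvention(group_separator=".")


def parse_grouped_int(text: str, convention: NumberConvention) -> int:
    """Parse an integer whose digits are grouped with the convention's separator."""
    digits = text.strip().replace(convention.group_separator, "")
    if not digits.lstrip("+-").isdigit():
        raise ValueError(f"not an integer under this convention: {text!r}")
    return int(digits)


print(parse_grouped_int("32,768", US))       # 32768
print(parse_grouped_int("32.768", TURKISH))  # 32768

# Reading Turkish input with US rules fails loudly instead of guessing.
try:
    parse_grouped_int("32.768", US)
except ValueError as error:
    print(error)

# An ISO 8601 date means the same thing regardless of the reader's culture.
print(date.fromisoformat("2008-03-14"))  # 2008-03-14
```

The point of the explicit `convention` argument is that the caller has to decide whose rules apply. Nothing is read from the environment behind your back.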

Anyway, point being, this kind of culture support is baked into most modern programming languages, so all you need to do is make sure your developers are aware of it-- and more importantly, that they're thinking about situations when they might need to use it.

But all that date and number formatting stuff is easy. Or about as easy as i18n ever gets, anyway. Strings are where it really starts to get hairy. Guess where this code fails?

switch (myType.ToLower())
{
   // Never matches "INTEGER" when the current culture is Turkish.
   case "integer": break;
}

If you guessed Turkey, you're wrong! Just kidding. Of course it fails in Turkey. When we convert the string "integer" to upper and lower case in the Turkish locale, we get some strange characters back:

"INTEGER".ToLower() = "ınteger"
"integer".ToUpper() = "İNTEGER"

It's sort of hard to see the subtle differences here unless we ratchet up the font size:

I → lowercase → ı
i → uppercase → İ

There's obviously no way these strings are going to match "integer" or "INTEGER" respectively. This is known as the Turkish I problem, and the solution should feel awfully familiar by now:

"INTEGER".ToLower(System.Globalization.CultureInfo.InvariantCulture)

That will produce the expected output, or at least, the output that matches the comparison in the original code snippet.
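
The mapping is small enough to sketch by hand. Python's `str.lower()` and `str.upper()` ignore the locale entirely, so the hypothetical helpers below (`turkish_lower`, `turkish_upper`, and `matches_keyword` are names invented for this example) apply the Turkish rules explicitly to show what a Turkish-culture conversion produces, next to a culture-independent comparison that sidesteps the problem.

```python
# Hypothetical helpers: Python's str.lower() and str.upper() ignore the locale,
# so the Turkish rules for dotted and dotless I are applied by hand first.
_TURKISH_LOWER = str.maketrans({"I": "ı", "İ": "i"})
_TURKISH_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def turkish_lower(text: str) -> str:
    """Lowercase the way a Turkish-culture conversion would."""
    return text.translate(_TURKISH_LOWER).lower()


def turkish_upper(text: str) -> str:
    """Uppercase the way a Turkish-culture conversion would."""
    return text.translate(_TURKISH_UPPER).upper()


def matches_keyword(user_text: str, keyword: str) -> bool:
    """Culture-independent comparison for identifiers and keywords."""
    return user_text.casefold() == keyword.casefold()


print(turkish_lower("INTEGER"))               # ınteger
print(turkish_upper("integer"))               # İNTEGER
print(turkish_lower("INTEGER") == "integer")  # False: the bug in the switch
print(matches_keyword("INTEGER", "integer"))  # True
```

Keywords, file extensions, and protocol tokens belong in the culture-independent camp. Text a Turkish user will actually read belongs in the other one.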

This is, of course, only the tip of the iceberg when it comes to internationalization. We haven't even touched on the truly difficult locales like Hebrew and Arabic. But I do agree with Jeff Moser-- if your code can pass the Turkey test, you're doing quite well. Certainly better than most.

Passed 'The Turkey Test'

If you care a whit about localization or internationalization, force your code to run under the Turkish locale as soon as reasonably possible. It's a strong bellwether for your code running in most-- but by no means all-- cultures and locales.

Posted by Jeff Atwood

 

Comments

Why simply Turkey? It's not like they're unique.

Most of continental Europe (and East of there) uses a period to separate number groups, and a comma for 'decimal' numbers. Also, very few places in the world write the date mm/dd/yyyy like you Americans do.

To say that Turkey is the odd-one out is to either be grossly misinformed, or largely ignorant of the rest of Europe. Not picking on you, as such, but rather questioning why this is about Turkey, not just about general internationalisation.

James on March 14, 2008 03:37 AM

It's convenient because:

1) Turkey is similar enough to other Latin alphabets that it's not a giant engineering nightmare to get it to work (see: Arabic or Hebrew).

2) The Turkish-I problem ( http://en.wikipedia.org/wiki/Turkish_dotted_and_dotless_I ) causes failures in naive string comparisons, whereas other Latin alphabets don't.

3) The Turkey Test gets you 90% of the way to the goal of internationalizing most apps.* We know French and Spanish are going to work. Why not test with the most difficult (but realistically difficult) locale first?

* The other 10% is excruciatingly difficult -- again, think of Arabic (bi-directional, shaped letters) or Hebrew (right-to-left).

Jeff Atwood on March 14, 2008 03:44 AM

I like the above answering post; its reasoning wouldn't seem out of place in the main article, though.

eryn on March 14, 2008 04:35 AM

@Jeff:
they're quite unique, if they have a character where lowercase(char) != (char >= 65 && char <= 90 ? char + 97 - 65 : char);

sorry, I just love to smite aggressive people

Aleback on March 14, 2008 04:35 AM

As a Turkish software developer, I really appreciate this post but I am sorry to see that the allegory is easy to miss. Get ready to be flamed by angry Turks.

And I wish you'd known better about Midnight Express before mentioning it even in a joke. It is simply a Hollywood exaggeration/mud.

Bahri Gençsoy on March 14, 2008 04:37 AM

I'll add Turkey to my test cases. Thanks from Sweden with ÅÄÖ posted 14-03-2008, 14-03-08 or 08-03-14 as some put it. Looking at the best before stamp on food wraps and it says 090308 scratching my head.

Jimmy Bergmark on March 14, 2008 04:38 AM

I am sure that you read,

http://www.moserware.com/, blog post:
What Does It Take To Become A Grandmaster Developer?

Nikos on March 14, 2008 04:45 AM

Typical American ignorance. More languages use dd.mm.yyyy than mm/dd/yyyy. Just think about writing the hour after the minute and you know why. ;-)

With number formats it's the same thing. If you use Roman characters, use the appropriate format too.

offler on March 14, 2008 04:49 AM

OMG.
In addition to java, I should learn Turkish too?

Niyaz PK on March 14, 2008 04:53 AM

Great article. As a Turkish developer, I think you explained the problem better than I could.

Hus on March 14, 2008 04:56 AM

The only really useful date format is YYYY/MM/DD because it is the only one that sorts properly as text. Go South Africa!

Adam on March 14, 2008 04:57 AM
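
Adam's sorting claim is easy to check. A minimal sketch in Python, with three made-up sample dates: format them year-first and US-style, sort both lists as plain text, and compare against the true chronological order.

```python
from datetime import date

# Made-up sample dates, deliberately out of order.
dates = [date(2008, 3, 14), date(2007, 12, 1), date(2008, 1, 31)]

year_first = [d.strftime("%Y/%m/%d") for d in dates]
us_style = [d.strftime("%m/%d/%Y") for d in dates]

# Plain text sort, no date parsing involved.
print(sorted(year_first))  # ['2007/12/01', '2008/01/31', '2008/03/14']
print(sorted(us_style))    # ['01/31/2008', '03/14/2008', '12/01/2007']

# Chronological order, formatted both ways, for comparison.
chronological = sorted(dates)
print(sorted(year_first) == [d.strftime("%Y/%m/%d") for d in chronological])  # True
print(sorted(us_style) == [d.strftime("%m/%d/%Y") for d in chronological])    # False
```

This only works because the fields are zero-padded and ordered from most to least significant.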

I gave a software demo in France one time only to have mass confusion five minutes into my talk because the audience could not follow what I was doing. I had typed a decimal number, and an i18n bug would not let them do the same. To make matters worse, there was a combination of Microsoft code and custom code validating the input. The Microsoft code (correctly) disallowed periods as decimal separators on a French computer, and the custom code (incorrectly) disallowed commas. So the audience couldn't use either the French or American convention for decimals. The solution was for everyone to change the locale on their OS. They said they routinely had to do this to use American software. What an embarrassment.

John on March 14, 2008 05:00 AM

"In the United States, we would typically format today's date as 3/14/2008. In Turkey, they format it as 14.3.2008.
In the United States, we use commas to group digits, like so: 32,768. In Turkey, they group digits using a period, so the same number would be entered as 32.768."

Wow, THAT's really strange! I bet they even have the metric system!
(Like every single country except for Myanmar and the US and A).

The examples you gave above are used in many countries in Europe, so that should not and cannot be the reason.

Chris on March 14, 2008 05:05 AM

Heh I did not know about the dotted i problem.

At least one of the problems above seems to be, though, that the code was executing in the user's locale, i.e. a string the developer put in his code was being interpreted in the Turkish locale. This seems to me an ASP.Net oddity, where as a convenience you set the entire thread to a specific locale and thus suddenly the entire semantics of your original program state (without any input or output) can change completely.

For me, the proper way to deal with this would be to keep the culture of the thread set to a "sane default" (invariant culture, perhaps?) and keep an eye on converting/parsing as necessary per the user culture wherever input/output takes place. I'm not sure why ASP.Net was chosen to behave the way it does, since it seems an easy source of bugs.

wpp on March 14, 2008 05:11 AM

You just opened a can of worms :)

boz on March 14, 2008 05:13 AM

"3) The Turkey Test gets you 90% of the way to the goal of internationalizing most apps.* We know French and Spanish are going to work. Why not test with the most difficult (but realistically difficult) locale first?"

I fail to see how Turkish gets you 90% of the way, unless you can easily say that the character-set limitations and formatting rules for dates, etc. encompass 90% of the possibilities for internationalisation. I think it's naive (no pun intended) to assume that just because your program is compatible with Turkish formatting, it's going to be compatible in 90% of circumstances.

The approach to take is to not try to shortcut to internationalisation, but to actually *do* internationalisation properly. How would you say that Turkey will help you internationalise Norwegian, for example, where you can have circles above letters, etc.... or Greek, where there are plenty of other symbols available. Sure, there are common characters - but that's besides the point... that's more like specialisation rather than internationalisation.

I know you're talking about Latin-based character sets, but how on Earth is that true internationalisation?

James on March 14, 2008 05:13 AM

As a Hungarian developer, I also have fun with internationalization issues in software. Some aspects every developer could add to their checklist:

1. Some Hungarian accents (őű) do not fit into the ISO-8859-1 codepage used by most Americans, so in most cases you need Unicode to support Hungarian - at least we fit into the Basic Multilingual Plane opposed to some unfortunate cultures.

2. Academic collation order cannot be implemented without using some heuristics about the semantics. Opposed to the lucky Czechs, we historically decided to represent some of our phonemes with multiple graphemes (for example "sz" and "zs" is one sound). If you write "egészség", you need to know that it is a "sz" followed by an "s", and not an "s" followed by a "zs" to put it in the right collation order. The solution was to introduce a so-called technical collation order, which is still more complicated than an ASCII-ordering, but at least it does not need the semantic analysis.

3. The thing that gives us the most fun is that we write family and given names in reverse order compared to sane countries. My family name is "Vágvölgyi", the given (first!) name is "Attila", so my full name is "Vágvölgyi Attila". Although I am used to reverse my names in foreign cultures, for a software used in Hungary, you need to translate the format string you use to glue the parts of the name together. Even gmail fails to do this. By the way, instead putting "Mr", "Mrs" and "Ms" before the full name, we put "úr", "asszony" and "kisasszony" after it, but only in a very formal salutation.

4. We glue prepositions to the words they refer to, and sometimes we assimilate the word or the preposition. So if you thought you could translate "with" "hand" separately, you will not deliver your software to Hungary. "kézzel" is the right form, which is made of "kéz" (hand) and "-vel" (with) assimilated. And yes, we also make plurals in a bit strange way, but leave that for some other time.

My point is, that proper internationalization is a linguistic and cultural issue, and you need to get rid of a lot of assumptions if you would like to develop cross-culturally. If you bought in to the lie that simply changing an environment variable will help you making your software "speak" a given language, you will be surprised in real life.

Attila Vágvölgyi on March 14, 2008 05:16 AM

The Reason is the upper/lower case problem....The worst bug in the computer science world

Nick on March 14, 2008 05:16 AM

Only somewhat related, but a university I went to got a new library website, and it was AWFUL. It was impossible to find anything. You'd find the record you'd want and the button to display it would be exactly opposite of where you expected it to be.

I later met someone who was a developer, and found that they had outsourced the entire thing to Israel. Which wouldn't be a problem, other than that Israelis read from right-to-left, not left-to-right. Which made the design weirdness look a little less random -- the left-right orientation was pretty much exactly opposite of what I would have intuitively expected it to be.

Culture is a bitch.

Shmork on March 14, 2008 05:28 AM

Oh Man. Are you ever gonna get hammered for mentioning Turkey and Midnight Express in the same column.

That's kind of like mentioning America and...

well there's nothing bad enough to be an equivalent.

Kieran on March 14, 2008 05:35 AM

Oh I figured an analogy out.

It's like mentioning America and Abu Ghraib and saying Abu Ghraib is representative of American morality.

...

Actually, no, it's worse than that because in the case of America, Abu Ghraib is real, but Midnight Express is just a movie.

Kieran on March 14, 2008 05:37 AM

Hey Now Jeff,
I've seen the Turkey test since two good friends are Turkish Tulih & Volkan, who love soccer & pc's. I really liked this post.
Coding Horror Fan,
Catto

Catto on March 14, 2008 05:41 AM

"Or there will be.. trouble."

You had to sneak in a Robocop reference, didn't you?

LSnK on March 14, 2008 05:42 AM

"When he visited Turkey in 2004, screenwriter Oliver Stone, who won an Academy Award for his adaptation, apologized for the film, expressing regret that 'many hearts were broken in Turkey' due to the film."

"Midnight Express is 'more violent, as a national hate-film than anything I can remember', 'a cultural form that narrows horizons, confirming the audience’s meanest fears and prejudices and resentments'." John Wakeman (ed.) (1988). World Film Directors. New York: T.H. W. Wilson Co.

Kieran on March 14, 2008 05:43 AM

I'm trying to convert the world to yyyymmdd. Works in a standard numerical sort.

John Ferguson on March 14, 2008 05:46 AM

1. I agree with the first point...the characteristics identified as Turkish issues apply to a lot of European countries (I lived in Sweden and experienced the date, period and character issues discussed), yet some of the best software and OS work comes out of Scandinavia.

2. Stating that we know things work in France and Spain is erroneous. I did work for a large US multinational and when we were deploying world wide process control systems amongst many countries: Australia, US, Canada, Mexico, France, UK, Brazil we had real issues with the French installs, primarily due to the French version of the OS (in this case Windows on the PC clients).

The US engineers had absolutely no concept that there would be localized versions of the OS (this was in the mid 90s).

3. There was mention about using ISO standards to enter information - most of the world does, it is the US that does not: Other examples, mph instead of kph, paper size (letter versus A4, the international standard)...heck, even look to NASA and it is all imperial units, yet this is (now) part of an international consortium!!

Localization makes the software presentable; however, the underlying data storage and manipulation should be in ISO format/standards and you merely have the localized 'presentation' for user input and output (this obviously includes text).

Tim on March 14, 2008 05:47 AM

Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious.

RWW on March 14, 2008 05:49 AM

Bahri Gençsoy: "And I wish you'd known better about Midnight Express before mentioning it even in a joke. It is simply a Hollywood exaggeration/mud."

Midnight Express was based on a true story, but we knew Turkish prisons were bad before the movie.

PaulG. on March 14, 2008 05:56 AM

The reason to use YYYYMMDD is not because it helps with a "standard numerical sort", but because it puts the Most Significant portion first, just like in common number printing (there may be differences between using comma and period as a decimal separator, but the places extend to the left in increasing significance). No one prints the hundreds place to the right of the ones place, do they?

Andy on March 14, 2008 06:06 AM

PaulG, you are unbelievably ignorant.

http://en.wikipedia.org/wiki/Midnight_Express_(film)#Billy_Hayes_interviewed

boz on March 14, 2008 06:07 AM

Note that in non-Turkish locales you usually want to ignore the differences
between İ and I when changing case. These characters are normalised to
ASCII, on Linux at least, by doing upper() and then lower().
In the Turkish locale the characters are not merged, as expected.
I've used the following function on Linux to transform text before comparison:

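#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

static bool ignorecase = true; /* assumed to be set elsewhere, e.g. from an option */

void transform(wchar_t* wcs)
{
if (ignorecase) {
/* Upper-case then lower-case each character, in place.
Note this handles Turkic Case folding */
for (wchar_t* p = wcs; *p != L'\0'; p++)
*p = (wchar_t) towlower(towupper((wint_t) *p));
}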
/* Other possible transformations one could do here are:
StripDiacritics: À -> A
ConvertEnclosed: Ⓐ -> A
ConvertStylistic: A-> A
TurkicFoldCase: İ -> i
Note the above are transformations done in msort.

Note python can normalise some things also:
unicodedata.digit(u'\u2462') # ③ -> 3
*/
}

Pádraig Brady on March 14, 2008 06:07 AM

Nothing wrong with Turkey..Turkish language is different than English, that's all..
Question is; What's wrong with you?

What does Turkish prisons or Turkey as a country have to do with this problem? Who is breaking your fine software, Turkish people or the fact that your software is not localized that fine?

Cagri on March 14, 2008 06:09 AM

"Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious."

So that Americans can understand the date? ;) It is just tradition at this point, I think.

Neil on March 14, 2008 06:11 AM

>>Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious

Celebrating pi day don't make much sense when your date format is dd/mm.

Go 3/14! ;-))

Happy pi day!

Hartmut on March 14, 2008 06:12 AM

"Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious."

Convention is the reason, I'm not sure it is a good reason, but it does tend to dominate so many silly "unconventional" aspects of US standards. (I'm a US citizen if that matters.)

malachi on March 14, 2008 06:12 AM

> Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious.

The best I've heard is because it's the way you say it (look at the dates on these replies). Although that's erroneous since I hear just as many people say "the 5th of March" instead of "March [the] 5th."

The US way is kind of nonsensical. The only two ways that make any sense are dd/mm/yyyy and yyyy/mm/dd. The latter is preferable for searches (as already mentioned)

Poita_ on March 14, 2008 06:12 AM

I learned quite a bit from folks who left comments on my "Turkey Test" post. One of most interesting was that in Germany, Excel CSV files use semicolons. That is, you can't look for "," but rather CultureInfo.TextInfo.ListSeparator.

Jeff Moser on March 14, 2008 06:15 AM

"Question is; What's wrong with you?

What does Turkish prisons or Turkey as a country have to do with this problem? Who is breaking your fine software, Turkish people or the fact that your software is not localized that fine?
Cagri on March 14, 2008 06:09 AM"

Cagri, no need to be defensive, your point is what Jeff is also trying to explain here.

Yas on March 14, 2008 06:18 AM

>> One of most interesting was that in Germany, Excel CSV files use semicolons.

because the decimal separator is the comma. i.e. 1.25 -> 1,25.

Hartmut on March 14, 2008 06:19 AM

If you run your .NET projects through FxCop with the standard rules turned on, it will alert you to every string parsing or writing routine where you haven't considered the locale. Easy way to help being "Turkey compliant".....

Ritchie Swann on March 14, 2008 06:21 AM

The Turkish "i" character is a common problem. Even the CSS capitalize command does not support it.

And there is a lot of software that gets stuck in the Turkish locale, which is mostly down to lazy coding.

About the Midnight Express, I live in USA, have gone to Turkey few times (by the way they are great people in general) and the movie is a bullshit similar to USA bringing democracy to Iraq.

And to remind, USA still tortures people officially from Iraq war in Guatemala.

Brian Kelian on March 14, 2008 06:21 AM

There are a whole host of BUSINESS ISSUES that aren't readily apparent when it comes to internationalization. For example, postal codes in the UK can contain alpha characters as well as digits. Many people have more than three names so the typical American first, middle, last doesn't cover all of the possibilities. And I could go on and on.

It takes a lot more than choosing invariant cultures, formatting dates, and formatting decimals correctly for a program to be truly "internationalized". You also have to make sure that the business rules are adjusted as well.

Matt on March 14, 2008 06:39 AM

Good Lord, can't we move past the reference to Midnight Express and just discuss the i18n problem? That reference was one small piece of an important topic, but instead of reading comments on how other people handle it, I am reading numerous posts about the US still torturing people, Abu Ghraib, and basically how Americans are lazy coders with no regard for other cultures...

If I wanted to read those types of comments, I'd go to CNN or MSNBC and read their blogs...

Let's stay focused and on topic.

Wayne on March 14, 2008 06:47 AM

Convention... to keep Americans happy... Pi day...
Thanks for all your answers, but I'm still looking for a _good_ reason...

It used to be the "convention" to send children to work down mines, or to make women stay at home raising a family. To me, tradition and convention don't seem like _good_ reasons to keep doing something utterly ridiculous - just reasons that don't require further thought. Oh well... perhaps I'm in the minority on this one...

As for Pi day - the only people who care about that are mathematicians and geeks. I would imagine that most mathematicians and geeks are familiar with modulo arithmetic, so why not celebrate it on the 3rd of January?

(For what it's worth, I'm a UK citizen who uses dd/mm/yyyy to comply with the rest of the country, but would rather the world standardised itself and started using yyyy/mm/dd. I also believe that we don't need any numerical separator besides the decimal point - what is so confusing about 2000000 compared to 2,000,000 anyway?)

RWW on March 14, 2008 07:02 AM

"In the United States, we use commas to group digits, like so: 32,768. In Turkey, they group digits using a period, so the same number would be entered as 32.768."

Uh, wtf? In Spanish-speaking countries (surely more users than Turkey alone!), we also have comma and period swapped compared to English. Comma for decimals, period for thousands grouping.

Nicolas on March 14, 2008 07:13 AM

As echoed by millions here, dd.mm.yy is followed a lot in other countries. When it comes to grouping digits, remember that the comma (,) is used as the decimal separator in some European countries. There is some stuff only the US follows, but not many acknowledge that there are others who don't follow those standards.

Besides; most of us play cricket not baseball :P

Leafy on March 14, 2008 07:17 AM

@RWW

The difference in 2000000 and 2,000,000 is readability. Is the first two million, two hundred thousand, or twenty million? Hard to tell at a quick glance, but the logical grouping into common units (tens, hundreds, thousands, etc...) makes it easier to read.

As for the mm/dd/yyyy date format, convention and common use is what brings that in. As an American, I have been taught since pre-school to write dates as March 13, 2008 on everything. So that carries over into work life and programming. Old habits are hard to break. The metric system is how old, yet we still use what we call the Standard system.

It's hard to justify to the entire country that a few programmers think date format should be changed to make it easier to write code.

It all boils down to the Secret society of Stonecutters. "Who keeps the metric system down? We Do!"

Wayne on March 14, 2008 07:26 AM

Relax everyone! When the NWO takes over there will be only one language, you will have a unique number (so don't worry about how many middle names you have), and "Globalization" will have removed the need for any of this.

This brief period in time, made possible in part by the US Constitution, where freedom to do and innovate ran rampant, will finally become under control of the world leaders, and will be found in Wikipedia as an interesting footnote in history. No doubt Googled on by your grandkids with great curiosity in their State-run cooperatives.

THX-1138.

RoboShmo on March 14, 2008 07:29 AM

And to remind, USA still tortures people officially from Iraq war in Guatemala.
Brian Kelian on March 14, 2008 06:21 AM

I think you meant Guantanamo (a.k.a. Gitmo), unless I'm mistaken.

Arron on March 14, 2008 07:41 AM

This sounds arrogant, but I am not meaning for it to sound that way...

One last thing... For the past 40-50 years, where was the global epicenter of business? The United States. Where were many of the companies founded that create the computers and operating systems we still use? The United States. Apple, HP, IBM, Microsoft, SUN... these companies originated in the US, so naturally they will adopt the US standards. It's only been in the last 20 years that globalization has been a big buzz word. Many of these things originated in a closed system.

Wayne on March 14, 2008 07:44 AM

Hey funny you should mention date issues, just going through a problem with SQL and date/time.

Come to mention it, what time zone am I posting in (see bottom of post)? Shouldn't it be GMT?

Coding horror fan!

Tom on March 14, 2008 07:46 AM

In OS X you could use CFStringCompareWithOptionsAndLocale, which can ignore character differences such as the Turkish I.

Mike on March 14, 2008 07:55 AM

In Britain the 'standard' system is known as imperial.

Yet it is not standard, and the rest of the world thinks of America as imperialistic. How ironic.

Tom on March 14, 2008 07:58 AM

Hartmut wrote:
>> >> One of most interesting was that in Germany, Excel CSV files use semicolons.

>> because the decimal separator is the comma. i.e. 1.25 -> 1,25.

Actually, this is irrelevant: the CSV format provides a method to escape commas in data, so the above would be "1,25" with double quotes. And a doubled double quote escapes a single double quote.

This is a Microsoft braindeadness we are seeing, where "Comma Separated Value" files are not always separated by commas. Also, it is contrary to RFC 4180 for CSV (which admittedly hasn't existed for as long as Excel's handling of CSV files).

Anyway, your premise that it is because of commas as a decimal separator is demonstrably false.

JohnH on March 14, 2008 08:16 AM
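
Both halves of this exchange can be tried out with Python's standard `csv` module. This is a minimal sketch with made-up rows: it writes decimal-comma values to a comma-separated file (the quoting JohnH describes), reads them back, and then reads a semicolon-separated file of the kind Jeff Moser mentions by passing the delimiter explicitly.

```python
import csv
import io

# Made-up rows with decimal commas and an embedded double quote.
rows = [["item", "price"], ["tea", "1,25"], ['the "good" tea', "2,50"]]

# Writing with commas: fields containing a comma or a quote get quoted,
# and embedded quotes are doubled.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
comma_text = buffer.getvalue()
print(comma_text)
# item,price
# tea,"1,25"
# "the ""good"" tea","2,50"

# Reading it back recovers the original fields.
print(list(csv.reader(io.StringIO(comma_text))) == rows)  # True

# A semicolon-separated file just needs the delimiter stated explicitly.
semicolon_text = "item;price\ntea;1,25\n"
print(list(csv.reader(io.StringIO(semicolon_text), delimiter=";")))
# [['item', 'price'], ['tea', '1,25']]
```

Either way, the separator is something the code states or detects. It should not be an assumption baked into a `split(",")`.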

Wayne wrote:
>> The difference in 2000000 and 2,000,000 is readability. Is the first two million, two hundred thousand, or twenty million? Hard to tell at a quick glance, but the logical grouping into common units (tens, hundreds, thousands, etc...) makes it easier to read.

Not to mention India, where the readability grouping for the number 2,000,000 would be written 20,00,000.

JohnH on March 14, 2008 08:19 AM
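
The Indian grouping rule is simple to state: the last three digits form one group, and everything to the left is grouped in pairs. A minimal sketch in Python, where `group_indian` is a hypothetical helper written for this example and not a library function:

```python
def group_indian(n: int) -> str:
    """Group digits Indian style: the last three digits, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    pairs = []
    while head:
        pairs.insert(0, head[-2:])  # take two digits at a time, right to left
        head = head[:-2]
    return sign + ",".join(pairs + [tail])


print(group_indian(2000000))  # 20,00,000
print(f"{2000000:,}")         # 2,000,000 (groups of three, for comparison)
print(group_indian(100000))   # 1,00,000
```

So group size is not a constant of three, which is one more thing a hand-rolled number formatter tends to get wrong.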

What's the matter Jeff??? I have spent the last 6 months mastering the 'Works On My Computer' paradigm so I can get certified and now you change everything I believe in? jeez....... Now my boss will demand my programs to work on the company's server......

Yorch on March 14, 2008 08:24 AM

You can please all of the people some of the time and some of the people all the time, etc., etc. Work to make your applications accessible for your targeted audience, but know that in a world of such diversity you'll never get it perfect.

And for anyone who has sensitivity issues, please be reminded this is a blog in English, originating from a U.S. Citizen, living in the U.S. In-jokes, media references, etc. will probably be targeted primarily to a U.S. audience. So, accept that.

Additionally, stop being so freaking sensitive. Political correctness is starting to piss me off.

kenneth on March 14, 2008 08:26 AM

You do know that America includes Mexico, Peru, Chile, Argentina and more than 20 other countries, don't you?

BTW, only in USA a Billion is 10^9, in the rest of the world a billion is 10^12.

Eduardo Diaz on March 14, 2008 08:53 AM

Amazing coincidence: I had this problem at work last week with a database system with an ASP.NET web front end that we didn't develop. One of the web servers for some reason wasn't defaulting to British English; it was set to one of the Eastern European locales. This caused a major problem and took me quite a while to work out. The reason it was such a problem was the inversion of commas and periods in numbers: a calculation was being performed, obviously badly, and coming out with stuff like -12 million.

On the case of the illogical date format in the US: why mm/dd/yy? It has no order of precedence. In Britain it's dd/mm/yy, which is at least logical: smallest interval to largest.

Although I have to say the best from a programming perspective would be YYYY/MM/DD HH:MM:SS.mmm where the number order as you see it YYYYMMDDHHMMSSMMM is also the ordering of the dates.

Pete on March 14, 2008 08:54 AM
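The comma/decimal inversion Pete describes can be reproduced in a few lines. This is only a sketch: `parse_number` is a hypothetical helper, not anything from the system he mentions, and the sample value is made up. The point is that the same string is a valid number under both conventions, so nothing fails loudly; the result is just off by a factor of a thousand.

```python
from decimal import Decimal


def parse_number(text: str, decimal_sep: str, group_sep: str) -> Decimal:
    """Parse a formatted number using explicit separators (hypothetical helper)."""
    # Drop the grouping separator, then normalise the decimal separator to ".".
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


# The same text means different things under the two conventions.
amount = "12.500"
as_english = parse_number(amount, decimal_sep=".", group_sep=",")
as_continental = parse_number(amount, decimal_sep=",", group_sep=".")

print(as_english)      # twelve and a half
print(as_continental)  # twelve thousand five hundred
```

Passing the separators explicitly, instead of relying on whatever locale the server happens to default to, is one way to keep this kind of calculation stable across machines.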

Not for nothing.. but a forum I regularly attend is constantly under attack by crackers from Turkey.

Patrick on March 14, 2008 08:54 AM

Also on dates, the main separators that are almost universal and you should expect are the slash(/), dash(-) and period(.) they are used and switched between by various people in the same country. IIRC windows even uses all of them for the US regional settings.
As others have said, in most of Europe and Asia a period is used for digit grouping and a comma is used for decimal designation.

As for bad countries for handling sorting wait until you have to deal with surnames of Scotland. You have Mc, Mac,Mak,MC,M and M that are pronounced the same but have a specific sort order that is not your standard case insensitive latin order

will dieterich on March 14, 2008 09:06 AM

Nicolas: "Uh, wtf? In Spanish-speaking countries (surely more users than Turkey alone!), we also have comma and period swapped compared to English. Comma for decimals, period for thousands grouping."

WTF? Where did Jeff say "Only in Turkey". He was doing a specific comparison between two countries.

Don't get your shorts in a bunch. Learn to read. (And that goes for all of the other posters who had to chime in with "You don't know what you're talking about, Jeff. My country does it too!!!" He never said it didn't.)

That's like someone saying, "Look at this! That grapefruit is bigger than this orange!" And somebody else saying, "You don't know what you're talking about. This orange grown in my country is bigger than the grapefruit." So what? The comparison was between the original single grapefruit and the single original orange, not all oranges and all grapefruits ever grown.

Use your brains, people.

KenW on March 14, 2008 09:09 AM

"The other 10% is excruciatingly difficult -- again, think of Arabic (bi-directional, shaped letters) or Hebrew (right-to-left)."

Arabic is not bi-directional. It's right to left only like Hebrew.

Abdu on March 14, 2008 09:11 AM

> When the NWO takes over there will be only one language, you will have a unique number (so don't worry about how many middle names you have), and "Globalization" will have removed the need for any of this.

Good! At last we'll all be able to agree on something! (Note: I already have a unique number, my NI number. I believe in the US you have a similar system of social security numbers or something like that.)

> This brief period in time, made possible in part by the US Constitution, where freedom to do and innovate ran rampant,

Unless of course you exist outside of the US constitution, in which case be prepared for (armed) troops to arrive to introduce/enforce their version of democracy. Good old Uncle Sam.

RWW on March 14, 2008 09:18 AM

Someone should force USA to use the metric system, ddmmyyyy and ',' as a decimal separator.

Eikern on March 14, 2008 09:25 AM

@Eduardo...

I am assuming your "You do know that America includes..." was directed at my use of the term American to describe the people in the United States of America. And, yes I realize that, but common use equates the 2. Canada is in North America, but they are referred to as Canadians. Mexico is in North America, but they are referred to commonly as Mexicans. This same thing applies for all other countries in both North and South America... But the people of the USA are almost always referred to simply as Americans. Why? Probably because "United Statians" or USAians sounds pretty silly...

And with the current state of things, I doubt many outside of the USA would like to be referred to as Americans... seems to me that many people in my own country don't even want that moniker...
When I traveled abroad as a teen, I was told to tell people I was from Canada - Very hard for a Texan to do.

Wayne on March 14, 2008 09:26 AM

OK this is a stupid comment but someone has to make it:

mmmmmmm, tuuurrrkey....

Steve on March 14, 2008 09:27 AM

I guess it is mostly no longer a problem, but Turkish also uses lower-case y-umlaut which has code point 0xFF, and some code (badly written C code, mainly) would fail to handle that, interpreting it as EOF (-1).

Anaconda on March 14, 2008 09:28 AM
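The 0xFF/EOF collision Anaconda mentions is a C problem, but the arithmetic behind it can be shown in Python. A rough sketch, assuming the ISO-8859-9 (Latin-5, Turkish) code page and the usual convention that EOF is -1: when the byte is squeezed into a signed 8-bit char it becomes -1, while kept as a wider int in the 0..255 range it stays distinguishable.

```python
EOF = -1  # the value C's getchar() family conventionally returns at end of input

raw = "\u00ff".encode("iso8859_9")  # y with diaeresis in the Latin-5 code page
byte_value = raw[0]

# Reinterpret the same bit pattern the way a signed 8-bit char would.
as_signed_char = int.from_bytes(raw, byteorder="little", signed=True)

print(hex(byte_value))        # 0xff
print(as_signed_char == EOF)  # True: indistinguishable from EOF
print(byte_value == EOF)      # False: an int holding 0..255 stays distinct
```

This is why the C idiom stores the result of `getchar()` in an `int` rather than a `char` before comparing it with EOF.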

Well hello,
I am subscribed to Coding Horror for some time and I was very surprised to see a Turkish flag in my reader. I am also a software developer but I work in a different branch (VxWorks) and sometimes .Net.

I think our european friends put it right, we use the same localization system with the rest of them (except UK of course). Today is 14.03.2008 and price of gasoline is 3,25 TL (hard to imagine right, we live in the middle east!) and my car mileage is at 14.000 km. Of course you can test your code by changing your windows localization setting to Turkish TR to see what happens, but any immediate problem can be solved by the help of .net globalization libraries as Jeff mentioned.(I remember that there was a Turkish-TR example there too.)

For the letters ı - i - I - İ, well every nation has to optimize their alphabet for their language. Until 1928, we were using the Arabic alphabet (we started using it when Turks became Muslim, around the 8th century); I am now glad to have the headache of irregular letters mentioned in the post. I believe the "Turkey Test" is a great idea.

For the movie "Midnight Express", I think every Turk should remember that it is a "movie" not a documentary. I think sometimes we overreact. Actually I find Midnight Express quite entertaining: The man gets caught with dope, they put him away and he was raped in jail. (But according to the victim himself he wasn't raped, he was a homosexual! thank you MR.Stone!) If I can watch Prison Break or Shawshank Redemption for fun, I can also watch Midnight Express.

Kerem on March 14, 2008 09:31 AM

In "Midnight Express" the guy's girlfriend paid him a visit and opened her blouse. Dreamt about her for a week.

wackadoo on March 14, 2008 10:13 AM

About the US date format, it's all a matter of ordering to place emphasis on what we think is important. When we look at a calendar, we need to know the month first, then the day... but we just assume the calendar is for the current year.

In my experience, in the US, phrases like "the 3rd of March" aren't often used. When they are, it's usually in something formal. In common practice, we refer to it as March 3rd or March 3, 2008.

I personally tend to write dates as 2008-03-14, but that confuses some people, too.

Powerlord on March 14, 2008 10:14 AM

"Arabic is not bi-directional. It's right to left only like Hebrew."

It's my understanding that numbers are left-to-right in Arabic, whereas the rest of the text is right-to-left, so it is in fact bi-directional. This makes for very interesting text-selection behaviour...

Martin Cooper on March 14, 2008 10:22 AM
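Martin's point about digits can be checked against the Unicode character database, which assigns every character a bidirectional class. A small sketch using Python's standard `unicodedata` module; the four sample characters are my own picks for illustration.

```python
import unicodedata

samples = {
    "Arabic letter alef": "\u0627",
    "Hebrew letter alef": "\u05d0",
    "European digit 3": "3",
    "Arabic-Indic digit 3": "\u0663",
}

# Letters report a right-to-left class; digits report a number class,
# and runs of numbers are laid out left-to-right inside right-to-left text.
for label, ch in samples.items():
    print(f"{label}: {unicodedata.bidirectional(ch)}")
```

The letters come back as right-to-left classes (`AL` for Arabic, `R` for Hebrew) while the digits come back as number classes (`EN`, `AN`), which is why a line mixing Arabic or Hebrew text with numbers ends up with runs in both directions.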

James,

>To say that Turkey is the odd-one out is to
>either be grossly misinformed, or largely
>ignorant of the rest of Europe.

I hope you're not implying that Turkey is a European country, and that Turks are Europeans.

Chris on March 14, 2008 10:27 AM

Chris,
> I hope you're not implying that Turkey is a European country, and that Turks are Europeans.

No, sorry... the inference was in relation to my earlier post where I thought I'd asserted the fact that I was British - but with hindsight, I hadn't :)

Though... with the EU, that's all set to change, anyway...

James on March 14, 2008 10:38 AM

You should link the Turkey Test Passed image back to Moser and watermark it or something to make it clear where it came from.

Also, with all due respect, I didn't really dig the title...seemed culturally insensitive as did the Midnight Express reference.

Scott Hanselman on March 14, 2008 10:45 AM

"About the US date format, it's all a matter of ordering to place emphasis on what we think is important. When we look at a calendar, we need to know the month first, then the day... but we just assume the calendar is for the current year."

Who's this "we" that you mention?... I'm not one of them! Kindly exclude me from any similar statements in the future ;)

RWW on March 14, 2008 11:02 AM

> Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious.

Because in English, one says May 20, 1998 -- that is, the spoken word convention puts the month first, then the day, then the year. So month day, year naturally becomes mm/dd/yyyy.

Why English ended up *saying* the month before the day I assume results from it sounding nicer or some such.

austinjp on March 14, 2008 11:08 AM

I got bit in the ass by this last week.

I was writing some calculations for a website and for the life of me I couldn't figure out why it was returning odd errors.

About two hours of debugging later, I realized that my culture code was set to traditional Chinese which uses the . as a , in number formatting.

At least I learned a valuable lesson there.

Ryan Smith on March 14, 2008 11:23 AM

Scott H. -- cut him some slack, it's the first day of his new career.

As for sensitivity -- people could write all day about San Quentin or movies made about it and I could care less. Chill out.

I'm just pissed because he's got me thinking about all that hashish...

wackadoo on March 14, 2008 11:33 AM

> Because in English, one says May 20, 1998 -- that is, the spoken word convention puts the month first, then the day, then the year. So month day, year naturally becomes mm/dd/yyyy.

Rubbish. I'm English and I've never said "May 20, 1998", but always "20th of May 1998". In your mutilated version of our language, you may say it that way - but that's it.

James on March 14, 2008 11:37 AM

Someone said "... opened a can of worms" and another said "...seemed culturally insensitive..." Isn't this a standard blog technique? Make potentially inflaming statements and you will draw people in. Just yesterday there was an article somewhere that indicated that people are very attracted to blog _comments_ simply because people can be misinformed, misunderstood, misunderstanding, and write out of passion rather than out of consideration.
I don't believe Jeff anywhere said a negative thing about Turkey specifically. The movie reference is just a _movie_, this is just a _blog_. But it is easy for someone to take as a slight once one culture is compared to another.
How many British or British English programmers hate that they have to spell colour as "color" for 'merikan programming languages?

Seraphim on March 14, 2008 11:43 AM

This is not specific to Turkey. Me, as a Romanian developer, I can confirm that all these cultural differences are present in our culture too.

US is simply, like Great Britain, an island of non-standards such as using a thousand of measure units for length and so on. We (Europe) use one.

Andrei Rinea on March 14, 2008 11:46 AM

What is it with country names and food?
Turkey? Chili? Hungry?

KG on March 14, 2008 11:56 AM

@Wayne, as a Chilean, I was taught that I am American, too.
Also in Peru, Peruvians consider themselves American, and so do people in Argentina, etc.

In general we refer to us as "americanos", we say "soy americano" (I'm american) or "somos americanos" (we are american).

I suggest to read: http://www.dcc.uchile.cl/~rbaeza/inf/american.html

This is a cultural issue, of course, so you must consider it when doing internationalization.

It is also a political issue: some people see the use of the word "American" as a sign of arrogance (and sometimes imperialism) on the part of US citizens...

"...And with the current state of things, I doubt many outside of the USA would like to be referred to as Americans... seems to me that many people in my own country don't even want that moniker"

Well, for me "American" is not an ugly moniker, because when I read "American" I don't think of a US citizen. You are still thinking of American as US citizen only; think globally!

In fact, people in the rest of america eventually could call you "gringo" or "yanqui", rather than american (think about it).

I like this blog, and don't want to start a flame war (or a political war), just to establish that if you want to talk about internationalization you must think about political and cultural issues too.

Have a nice weekend.

Eduardo Diaz on March 14, 2008 12:03 PM

As a Scandinavian, the thing that bothers me most is web sites that ask for input in UTF-8, but clearly can't handle it properly.

My local time is 20:00
My local date is 14.03.08

And I write ten million as 10 000 000,00 or 1.000.000,00

N on March 14, 2008 12:05 PM

Sorry, I made a mistake in my previous post

I write ten million as

10 000 000,00 or 10.000.000,00 (Local time: 20:07)

N on March 14, 2008 12:07 PM
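Both of N's spellings of ten million can be produced from the same value by swapping separators. A minimal sketch; `format_grouped` is a hypothetical helper that starts from Python's built-in English-style grouping and replaces the separators afterwards.

```python
def format_grouped(value: float, group_sep: str, decimal_sep: str) -> str:
    """Format with two decimals and custom separators (hypothetical helper)."""
    english = f"{value:,.2f}"  # e.g. "10,000,000.00"
    # Swap via a placeholder so the two separators do not clobber each other.
    return (
        english.replace(",", "\x00")
        .replace(".", decimal_sep)
        .replace("\x00", group_sep)
    )


print(format_grouped(10_000_000, " ", ","))  # 10 000 000,00
print(format_grouped(10_000_000, ".", ","))  # 10.000.000,00
```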

As an ancient Roman, I am highly offended by your callous reference to "movies about gladiators".

Anonymous Cowherdius on March 14, 2008 12:21 PM

"Thanks for all your answers, but I'm still looking for a _good_ reason..."

Because there's no reasonable conversion path from mm/dd/yyyy to dd/mm/yyyy. Given a piece of data, you can't tell whether it is pre-conversion or post-conversion. You would need to convert from mm/dd/yyyy to something like yyyy/mm/dd. And get the Europeans to convert too.

(Of course, the real reason is because the gains from switching do not outweigh the costs.)

Scott on March 14, 2008 12:42 PM
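Scott's point that you can't tell pre-conversion from post-conversion data is easy to demonstrate. A sketch, with `possible_readings` as a hypothetical helper: any date whose day is 12 or lower parses cleanly under both conventions and yields two different dates.

```python
from datetime import date, datetime


def possible_readings(text: str) -> set:
    """Return every date the text could mean under mm/dd and dd/mm (hypothetical helper)."""
    readings = set()
    for pattern in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            readings.add(datetime.strptime(text, pattern).date())
        except ValueError:
            pass  # not a valid date under this convention
    return readings


print(sorted(possible_readings("03/04/2008")))  # two candidate dates
print(sorted(possible_readings("14/03/2008")))  # only one reading is valid
```

A format with the year first, such as yyyy/mm/dd, sidesteps this because a four-digit year cannot be mistaken for a day or a month.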

@Eduardo et. al

It's a matter of hierarchy, one looks for the most granular/sensical description of your nationality...I for instance am a Solarian/Earthling,Northern Hemispherian,Northern American,American,Californian,Orange Countyian,Costa Mesaian,Fairview Streetian...I should just call myself a Fairview Streetian but you'd most likely have to be a Costa Mesaian to understand that, and then an Orange Countyian to understand *that*. If you walk back up the tree to American, that name would have the greatest chance of understanding regardless of your own locale. So while culturally you may consider yourself an American, you under this easily understood naming convention would be a Chilean. It's by no means an attempt at minimizing your American'ness, it's simply a matter of facilitating communications.

This whole globalization thing is a fad anyway...it's all going to implode and we'll have regional standards only again. This is the internet's biggest downfall, global voice but local context.

Ryan Smith on March 14, 2008 01:38 PM

BTW...this is the OTHER Ryan Smith...sigh see naming is such a pain...

Ryan Smith on March 14, 2008 01:39 PM

Most of the cases discussed in this article are ordinary internationalization issues, but the Turkish I problem is something different and interesting. It has especially plagued Java applications. Whenever the String.toUpperCase() or String.toLowerCase() methods are used, code that contains "id" fails. If the string being converted is not for the user but is part of the application logic, you must call these methods with the English locale. There is a detailed article about this at http://java.sys-con.com/read/46241.htm .

Also, whenever a search query is made against multi-language content (e.g. a music archive that contains both Turkish and English albums), you must assume that all the versions of ı, i, I and İ are to be treated as the same character.

Bilgehan Maras on March 14, 2008 01:44 PM
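Bilgehan's search suggestion, treating ı, i, I and İ as one character, could look something like the following. This is a sketch only: `fold_for_search` is a hypothetical helper, and a real search index would likely need more folding rules than this one.

```python
# Map every variant of the letter onto plain "i" before lower-casing the rest.
_I_VARIANTS = str.maketrans({
    "\u0131": "i",  # dotless lower-case i
    "I": "i",       # dotless upper-case I
    "\u0130": "i",  # dotted upper-case I
})


def fold_for_search(text: str) -> str:
    """Fold text so all four i variants compare equal (hypothetical helper)."""
    return text.translate(_I_VARIANTS).lower()


queries = ["ISTANBUL", "\u0130stanbul", "istanbul", "\u0131stanbul"]
print({fold_for_search(q) for q in queries})  # a single folded key
```

The translation runs before `lower()` on purpose: in Python, lower-casing U+0130 on its own produces an `i` followed by a combining dot, which would not compare equal to a plain `i`.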

Over here we use DD/MM/YYYY, and we use commas for decimals and we usually don't group digits, but when we do, we use dots. And yes, we use the metric system. So, Turkey isn't THAT unique. But that wasn't the point of this.

BTW, in Argentina we call US people "yankis" informally and "estadounidenses" or "norteamericanos" formally.

Alejandro on March 14, 2008 01:46 PM

Well, you see, the Turkish flag appears to be a C and a star, meaning their country's programming language is C*, whose last revision was 14 years ago. This is a matter of old, deprecated code!

Vaaal on March 14, 2008 03:46 PM

This post was a Turkish Delight until I read all the America bashing in the comments.

Matt on March 14, 2008 03:56 PM


Rationalizing mm/dd/yyyy ...

This makes sense when you are primarily talking mm/dd. As in, "March 3". The most important thing is the month. It should go first. People who say "3 March" are just plain silly and inefficient :)

The "/yyyy" at the end instead of at the start is obviously a historical error, presumably as logical as Intel putting the most significant byte in the bass-ackwards position in x86.

Tom Dibble on March 14, 2008 05:09 PM

Haaa. Jeff, you really like to stir the pot and get your fingers burned! I do take issue with your characterization of Midnight Express as a credible source for your opinion--and many commenters here pointed out why (it's basically wrong as a historical document) -- but I laugh about it because I sense that you are being facetious there. No one in a right state of mind understands anything about Turkey from that one movie. You are right about using Turkish as a test case for internationalization, and it's good to see my country's flag here.

picardo on March 14, 2008 05:58 PM

I guess I'm atypical of Americans (US Citizens, sorry)

Today is 15MAR2008. Unless, of course, I need to do a string sort on it, then its 2008MAR15.

I write 2,000,000 as 2,000,000, 2.000.000 and 2 000 000, depending on my mood.

Binarycow on March 14, 2008 10:26 PM

In the USA dates are written mm/dd/yyyy because of the way they speak. That one American stated that this is not his case does not make his usage the most common one; it is just his own. Depending on which state in the USA you go to, they all speak in different ways (just like in any other country).

The way this makes sense can be understood by observing other expressions like "he is a city person" instead "a person of the city". Both sentences are similar but they can have different meanings. "a city person" is someone who grew up in the city, in contrast to "a person of the city" who could be a country person currently living in a city.

While the meaning in the dates does not change using either form, it is the form of speech that mandates its written form.

As an exercise let me compare these sentences to Spanish:
"racing car" Vs. "auto de carreras" (car of racing)
"electric pump" Vs. "bomba electrica"
"May 2nd" Vs "2 de Mayo"

Americans keep telling me "You people speak backwards!"

Then I came to learn that a million and North America have a different meaning in this country and many more (http://en.wikipedia.org/wiki/Continents).

USA has done efforts to change. Every commercial product in USA state the contents on Imperial AND Metric numeric systems. It is just hard to change old habits. Not to forget the economical impact in the industry that such changes represent.

The point of the article is simple: You go global, then be aware that each country is different. Plan ahead!

Ricardo on March 14, 2008 10:34 PM

@Vágvölgyi: "My family name is "Vágvölgyi"" Sounds funny, a typical Finn could pronounce it "Vakvölkky" or even "Völkky" (which leads to "Pölkky" that means a piece of tree trunk), because our language is pretty straightforward, not filled with intricacies like your language.

"with" "hand": Also we don't set different words after each other, but we bend the words where the base of the word also changes a bit. "Käsi" means "Hand", but "Kädessä" means "At hand". But there are exceptions: "Car - In car" is "Auto - Autossa" not "Audessa". That makes it somewhat more difficult to build translators and other logical text manipulators.

@The pitfalls of software development: Line break character, date time formats, and comma/period problem. These things are very small issues, but they cause really much headache and hassle. Someone should do something about them... And we need a line break key in the keyboard that represents a line break without breaking the line before the text is presented to the user. And the am/pm? Why can't you use 20:00 for 8:00 pm? That would be really easy.

Don on March 15, 2008 02:20 AM

The reason to use YYYYMMDD is not because it helps with a "standard numerical sort", but because it puts the Most Significant portion first, just like in common number printing (there may be differences between using comma and period as a decimal separator, but the places extend to the left in increasing significance. No one prints the hundreds place to the right of the ones place, do they?
Andy on March 14, 2008 06:06 AM

Oooh, swing and a miss. In normal numbering, the larger digits are more significant, because you don't normally care about the smallers.
If you were doing a post count, it wouldn't be vital that there were 106 rather than 100, for example.
In dates, it's the other way round. You rarely need to know the year, because by that time the data's obsolete, but you constantly need to know the day or month.
Imagine if the only detail you had was that these were all posted in 2008. That's not valuable information.

Tom on March 15, 2008 02:33 AM

Except RTL setups, Turkey Test IMHO will cover most of the internationalization issues in your code. The big and the priceless one is ıİ problem, however. Here are two quick examples in two well-known software I have encountered in time:

Visual Studio (SDK Actually)
Here is the bug I submitted just three days ago:
VSCT compiler incorrectly changes the case of the path of additional include files when the regional options set to Turkish, resulting in compile time errors
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=332672

The thing is, a freshly created DSL Tools project does not even compile on VS if the locale is Turkish :)

Resolution? Create C:\Program Fİles\... and copy some files to make sure they're found.

SubText (and SQL Server)
I was evaluating blog software, and was really interested in SubText. I wasn't able to execute the setup, however, since all the DB schema creation scripts were broken on my setup. If you define a column named "id" and refer to it as "Id" in a query (or an SP schema etc.), SQL Server chokes big time.

The ıİ problem is the litmus paper of the internationalization quality of your software, if you're interested. Yes, other languages have other special characters, and so does Turkish. We also have "ğĞüÜşŞçÇ", but, AFAIK, only in Turkish does a standard English letter have a different upper-case and lower-case representation.

The thing is, we have spent millions of millions of dollars worth of time here in Turkey during the last 10-20 years to solve simple issues like this one. And it would be great if you, the fellow developer, spend a little more time to write international-aware software, if it's applicable.

Jeff,
Thanks a lot for mentioning these issues. Will help in the awareness front.

The movie is a thing of the past, actually. It might be a true story or not, I don't know or care. The thing was, It was the only movie about Turkey back in the time, so it was almost impossible for us to explain ourselves to outside world when there is such a movie from a well-known director.

For years, it was (and it appears that it still is) always one of the first topics on the table when you met a foreigner and said "I'm from Turkey". It's even the only non-technical reference you can come up with to cheer up the post :) In a blog post titled "What's Wrong With Brazil?", I'm guessing that anyone's non-tech references would be Adriana Lima, samba, carnival etc., and I'm sure bad things also happen there, as in everywhere. It appears that the movie in question will haunt us for a long time, which is very sad.

Gökhan on March 15, 2008 04:07 AM
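The "id" versus "Id" failure Gökhan ran into comes down to two casing rules. Python's `str.upper()` and `str.lower()` ignore the locale, so the sketch below emulates the Turkish rules with two hypothetical helpers, `turkish_upper` and `turkish_lower`; it is an illustration of the mismatch, not of how SQL Server or .NET implement it.

```python
DOTTED_CAPITAL_I = "\u0130"  # upper-case partner of "i" in Turkish
DOTLESS_SMALL_I = "\u0131"   # lower-case partner of "I" in Turkish


def turkish_upper(text: str) -> str:
    """Upper-case using Turkish rules for the two i letters (hypothetical helper)."""
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def turkish_lower(text: str) -> str:
    """Lower-case using Turkish rules for the two i letters (hypothetical helper)."""
    return text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i").lower()


# Culture-independent casing: "id" and "Id" meet in the middle.
print("id".upper() == "ID", "Id".lower() == "id")

# Turkish casing: the same identifiers no longer match.
print(turkish_upper("id") == "ID", turkish_lower("Id") == "id")
```

Any case-insensitive comparison built on upper- or lower-casing under Turkish rules inherits this mismatch, which is consistent with a column declared as "id" not being found as "Id".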

> Imagine if the only detail you had was that these were all posted in 2008. That's not valuable information.

It sure beats only knowing they were written in March?

--

There's a difference between internal dates and presentational dates. Internally dates *should* (imho) be stored according to the ISO standard: YYYYMMDDHHMMSS. At work I fetch files from an American business, they name their files 03152008.zip - thank you very much for making it easy to sort them with ls. When presenting dates it should be unambiguous, preferably with the month presented in text and the year with four digits.

As a swede todays date could be written as 15/3-08 or 15 march 2008 or 2008-03-15. I usually group big numbers with space: 20 000 000.

Berserk on March 15, 2008 04:10 AM
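Berserk's complaint about `03152008.zip` can be illustrated with a plain string sort. A sketch: apart from the one file name he quotes, the names are invented, and `to_iso_name` is a hypothetical helper that rewrites MMDDYYYY stems as YYYYMMDD.

```python
from datetime import datetime

# One real example from the comment plus two made-up neighbours.
us_names = ["03152008.zip", "01022009.zip", "12312007.zip"]


def to_iso_name(name: str) -> str:
    """Rename an MMDDYYYY file to YYYYMMDD, keeping the extension (hypothetical helper)."""
    stem, ext = name.rsplit(".", 1)
    parsed = datetime.strptime(stem, "%m%d%Y")
    return f"{parsed:%Y%m%d}.{ext}"


print(sorted(us_names))                          # sorted by month, not by time
print(sorted(to_iso_name(n) for n in us_names))  # sorted chronologically
```

With the year first, lexicographic order and chronological order coincide, so `ls` or any other plain sort lists the files in date order for free.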

> presumably as logical as Intel putting the most significant byte in the bass-ackwards position in x86.

Uh, that one makes perfect sense, there are (or maybe more were, with older less powerful x86 CPUs) cases (esp. extreme optimizations) where you might want to use the same variable as 8, 16, 32 or even 64 bit variable - e.g. because you know at that stage it fits into 8 bits, or because you need the value modulus 256 or.... With little-endian they all have the same address, with big endian you must adjust the address each time.

Reimar on March 15, 2008 04:30 AM
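Reimar's little-endian argument can be mimicked with a byte buffer standing in for memory. A sketch using the standard `struct` module, with offset 0 playing the role of the variable's address and the value 42 chosen arbitrarily.

```python
import struct

value = 42
little = struct.pack("<I", value)  # 32-bit little-endian layout
big = struct.pack(">I", value)     # 32-bit big-endian layout

widths = (1, 2, 4)  # read the variable as 8, 16 and 32 bits

# Little-endian: every width can be read starting at the same offset.
little_reads = [int.from_bytes(little[:w], "little") for w in widths]

# Big-endian: the same offset only works for the full width ...
big_same_offset = [int.from_bytes(big[:w], "big") for w in widths]

# ... narrower reads need the start offset moved to 4 - width.
big_adjusted = [int.from_bytes(big[4 - w:], "big") for w in widths]

print(little_reads, big_same_offset, big_adjusted)
```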

Oh, and on the topic, I still can not understand why all languages default to the internationalized string functions.
Most of the code will be internal data manipulation that is more likely to need something ASCII-compatible, exactly that internal code is _much_ more likely to break very seriously by the string functions changing their behaviour than the display code (there e.g. using ASCII instead of Turkish localized ones would only make things look weird, the user would not have any real problems, and in the worst case the real problems would be noticed very quickly and easily).
And lastly, just converting those few basic string functions will not get you really closer to making your program work internationally.
So why the hell could someone consider making the localized, non-ASCII string functions the default a good idea? Even more when creating a completely new language, they could have easily made .ToUpper have a well defined, never changing behaviour that could be used for e.g. config files that never change and any other logic, and have a e.g. .ToUpperL (L for localized) for anything that deals with user stuff.

Reimar on March 15, 2008 04:40 AM
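The "well defined, never changing" upper-casing Reimar asks for might look like the following in Python. A sketch only: `ascii_upper` is a hypothetical helper that touches nothing outside a-z, contrasted with the built-in Unicode-aware `str.upper()`.

```python
import string

# Translation table covering only the 26 ASCII lower-case letters.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_upper(text: str) -> str:
    """Upper-case a-z only; every other character passes through unchanged."""
    return text.translate(_ASCII_UPPER)


key = "stra\u00dfe"  # contains the German sharp s
print(ascii_upper(key))  # same length, sharp s untouched
print(key.upper())       # Unicode rules expand the sharp s to "SS"
```

For config keys and other internal identifiers, the fixed-table version has the property he wants: its output never depends on a locale and never changes the length of the string.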

John Ferguson: "I'm trying to convert the world to yyyymmdd. Works in a standard numerical sort."

But this will break in less than 8000 years. Why are developers always so short-sighted?

Vinzent Hoefler on March 15, 2008 04:54 AM

Just a thought - the 14.3.2008 format is used in places other than Turkey too. For example, um, in India. Really, this is a case of USA-majority programmers thinking that theirs is the only way things are done.

Sumesh on March 15, 2008 09:05 AM

Wow, if ever a US programmer wanted an excuse not to care about internationalization, the whiny replies to this post certainly provide one. It seems that instead of being helpful, most people prefer being indignant and insulting, especially if they can tie that in with some good ol' fashioned Merika bashing.

I like Windows because I can set the Internationalization to the US, but still set the date to YYYY-MM-DD. I think this is some ISO standard. I've used it since Y2K. But I've found that it is also completely unambiguous. I've never encountered anyone who has confused the order of MM-DD in that string.

I stay away from YYYY/MM/DD because the slashes seem to either sometimes confuse people, or tempts people to shorten it to YY/MM/DD, which sucks because that makes three different orders with slashes in them. I've also seen software that sees the dashes and gets YYYY-MM-DD right, but sees the slashes and gets YYYY/MM/DD wrong.

I've also found that when you put YYYYMMDD in filenames without dashes or slashes, somehow people can figure that out for the most part, especially if you are dealing with years in the 1970-2010 range. But no matter what the separator, you get free file sorting by date if you lead your files that way.

mpbk on March 15, 2008 09:24 AM

> Really, this is a case of USA-majority programmers thinking that theirs is the only way things are done

I don't think it's quite so insidious-- it's another case of Works On My Machine, really

http://www.codinghorror.com/blog/archives/000818.html

Jeff Atwood on March 15, 2008 12:07 PM

There also is the String.ToLowerInvariant() Method.
From MSDN:
If your application depends on the case of a string changing in a predictable way that is unaffected by the current culture, use the ToLowerInvariant method. The ToLowerInvariant method is equivalent to ToLower(CultureInfo.InvariantCulture).


Yacine on March 15, 2008 02:19 PM

C'mon computer people, we can do better than that. Let's just use a system like IP with the most significant part on the right.

seconds.minutes.hours(24hr).day.month.year

BTW: it is 0.0.22.12.3.2008, do you know where your kids are?


Of course we could always use the Integer based seconds since 1/1/1980 system, I am sure we could convince the technophobes to switch.

JohnFx on March 15, 2008 02:56 PM

"Of course we could always use the Integer based seconds since 1/1/1980 system"
I thought that was 1970. (At least in Java, it is, and I think, and Hope, that it s the same in all programming languages)
But then you have this (From Microsoft Excel:Mac Help):
Excel supports two date systems, 1904 and 1900. The default date system for Microsoft Excel for the Macintosh is 1904. The default date system for Microsoft Excel for Windows is 1900.

And I wonder why it isn't 1000.

Nivas on March 15, 2008 04:17 PM
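The different starting points Nivas lists can be lined up side by side. A sketch under stated assumptions: the Unix epoch is 1970-01-01 UTC, and the base dates used for the two Excel systems (1899-12-30 for the 1900 system, which only holds for serials from March 1900 onward, and 1904-01-01 for the 1904 system) are the commonly used conversion constants, not something taken from the help text he quotes.

```python
from datetime import date, datetime, timedelta, timezone

# Second zero of the Unix clock.
unix_epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# Assumed base dates for converting Excel day serials to calendar dates.
EXCEL_1900_BASE = date(1899, 12, 30)
EXCEL_1904_BASE = date(1904, 1, 1)


def from_excel_serial(serial: int, base: date) -> date:
    """Convert an Excel day serial to a date for the given base (hypothetical helper)."""
    return base + timedelta(days=serial)


offset_days = (EXCEL_1904_BASE - EXCEL_1900_BASE).days

print(unix_epoch.isoformat())
print(offset_days)  # how far apart the two Excel systems are
print(from_excel_serial(39448, EXCEL_1900_BASE))
print(from_excel_serial(39448 - offset_days, EXCEL_1904_BASE))
```

The same calendar day therefore carries a different serial number in each system, so a workbook's date system has to be known before its serials can be interpreted.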

Joel Spolsky has mentioned the reasons behind the 1904/1900 distinction in Excel a couple of times:

http://www.joelonsoftware.com/items/2006/06/16.html
http://www.joelonsoftware.com/items/2008/02/19.html

Dave W. on March 15, 2008 05:19 PM

boz: "PaulG, you are unbelievably ignorant."

Thanks for the Wiki. I stand informed and corrected (but never "unbelievably ignorant"), and only regarding the movie. Turkish prisons are still bad now, and were bad before the movie.

Stay on topic and save your ignorant insults for someone who deserves it.

PaulG. on March 15, 2008 08:33 PM

I'm sick and tired of all this Euro-centric old-world cultural imperialism! In America, we use the Standard (or "Imperial", for you limey bastards) System of measurement because it's part of our cultural heritage (thanks to the aforementioned limey bastards - long live the British Pound!). What right does anyone have to try and tell us that we're "wrong" when all we're doing is upholding the traditions and cultural values of our ancestors? Just because our history doesn't go back for centuries and centuries of inbreeding, religious wars, and the divine rights of kings, doesn't mean we don't know a thing or two about the price of rice in China. There's a reason we kicked you a**holes out of the New World and that reason is called The Metric System, which is really just a crutch for people too stupid or lazy to count past ten. As for dates, everybody knows that the only accurate way to store date values is as an integer representing the number of seconds since January 1st, 1970; so all of your so-called localization issues are really just UI skins for the data layer, which any half-competent programmer can implement for the intended audience. My point is this: if you're going to pirate American software, the least you can do is learn the proper American string-formatting for date values.

As for all the Turkey Apologists: regardless of the relative merits (or lack thereof) of Turkish Prisons, let's not forget that you can be remanded to one for simply "Insulting Turkishness". I won't pretend to know the precise definition of "Turkishness", but if you want a shortcut that will allow you to evaluate the conditions inside of Turkish prisons firsthand, all you have to do is mention the Armenian Genocide of 1915 (certain historical facts are apparently contrary to the concept of Turkishness). Or you can just ask Elif Shafak, or Orhan Pamuk, or Hrant Dink (no, wait, you can't ask Dink because he was assassinated by a self-proclaimed Turkish nationalist). At least in America, despite our mm/dd/yyyy ways, we can tell the difference between fiction (Midnight Express) and wholesale slaughter (the Armenians). I can't decide who deserves each other more: Turkey, or the EU.

Proud Yanqui on March 16, 2008 12:11 AM

"you could politely ask them to enter data in ISO international standard format instead."

I don't do much web development, but I was under the impression people could specify their preferred culture in most modern browsers, and you can use that to parse locale specific information. I'm sure ASP.NET has things to help you do that.
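The browser preference described above reaches the server as the HTTP Accept-Language header, the same header mentioned in the Scott Hanselman quote in the post. As a rough sketch of how a server might read it, the Python below orders the language tags by their q-values; the function name `parse_accept_language` is made up for illustration, and real frameworks ship their own, more thorough parsers.

```python
def parse_accept_language(header):
    """Return language tags ordered by descending q-value (default q is 1.0).

    Hypothetical helper for illustration; not part of any framework.
    """
    prefs = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # assumption: treat a malformed weight as "not wanted"
        # Sort key: highest quality first, then original order for ties.
        prefs.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(prefs)]


print(parse_accept_language("tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7"))
```

Knowing the preferred tag is only the first step; the server still has to decide which operations should follow the user's culture (display) and which should not (parsing keywords, file names, identifiers).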

""INTEGER".ToLower(System.Globalization.CultureInfo.InvariantCulture)"
Or the shorter "INTEGER".ToLowerInvariant() in the specific case of strings. But you can pass InvariantCulture as an argument to lots of objects' ToString methods.

I'm moving over to YYYY/MM/DD because I find it less ambiguous. No one (afaik) uses YYYY/DD/MM, so you can't get muddled up. And I find days and months to be equally important: sometimes I want to know the day and sometimes the month. In a case of equality, surely it makes sense to place them in order of magnitude instead ;)
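One practical consequence of ordering the fields by magnitude can be sketched in a few lines of Python: year-first strings sort chronologically as plain text, while month-first strings do not. The sample dates are arbitrary, and the slash separator follows the comment above (ISO 8601 itself uses hyphens).

```python
from datetime import date

# Arbitrary sample dates, deliberately out of order.
days = [date(2008, 3, 16), date(2007, 12, 1), date(2008, 1, 31)]

year_first = [d.strftime("%Y/%m/%d") for d in days]
month_first = [d.strftime("%m/%d/%Y") for d in days]

# Plain string sorting: chronological for year-first, scrambled for month-first.
print(sorted(year_first))
print(sorted(month_first))
```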

As useful as it is, it irritates me that the InvariantCulture is so American centric (see dates).

[ICR] on March 16, 2008 03:42 AM

I think you slipped here.
You sound like an American who thinks he is the center of the world.
Many European countries use a different date format than America.
And the dot vs. comma is the same... this is rather pointless whining. I could say the same thing about Americans, "why do they write their dates like that", and that is why I always put the date format behind the field (YYYY.MM.DD for example)

mardicas on March 16, 2008 07:03 AM

What an ignorant post.

Nic on March 16, 2008 08:00 AM

Jeff's not whining, he's pointing out that Americans have to take this into consideration if they expect their software to work outside of the US. Guess what? Most Americans don't know a whole lot about the myriad of ways in which other countries do their date formats and things like that. Guess what? I bet most Europeans only have a limited knowledge of the same thing. Ease up.

Shmork on March 16, 2008 09:13 AM

You're all lucky not to have to deal with Asian internationalization.
Japan and China quite frequently have more than one locale within a single program.

boredguy on March 16, 2008 09:13 AM

What nonsense, @Yanqui

Which traditions?
Take Washington, for example:
20,9% of the residents are of German descent, making them the largest group. Next come those of English (12,9%) and Irish (12,6%) descent. Hispanics, at 9,1%, and Asians, at 6,6% of the population, are more numerous than those of Norwegian descent (6,3%). The share of African Americans, at 3,4%, is well below the national average. 1,5% are Native Americans and 0,4% of the population descend from inhabitants of the Pacific islands (such as Hawaii). In short: Redmond is 30 % German.

That means the tradition of just under 13 % would have to justify the botched data formats.

The thing about seconds is the biggest nonsense of all, and only justifiable for Unix fans. It clearly shows how hopelessly backward-looking that post must be.

And as for values:

1 kilometer = 1000 meters
1 decimeter = 0,1 meters
1 centimeter = 0,1 decimeters, 0,01 meters
1 millimeter = 0,001 meters
1 nanometer = 10^-9 meters

You will know the smaller ones. The rest, like miles, is a mishmash that isn't even uniform.

1 tonne = 1000 kilograms
1 kilogram = 1000 grams
1 Pfund (metric pound) = 500 grams

That shouldn't be so hard either.

Time and numbers (in German notation):

Sonntag, 16. März 2006 17:29 Uhr

1 Eins
10 Zehn
100 Hundert
1.000 Tausend
1.000.000 Million
1.000.000.000 Milliarde
1.000.000.000.000 Billion
1.000.000.000.000.000 Billiarde
1.000.000.000.000.000.000 Trillion
1.000.000.000.000.000.000.000 Trilliarde

1,489 = price in euros for one liter of premium gasoline.

Have fun thinking it over.

Weißenberg on March 16, 2008 09:31 AM

Jeff, please accept my apologies for being a troll on this forum. What seemed like the height of biting satire last night (after a few too many beers, admittedly) just seems like buffoonery this morning. You could even delete these comments if you like, as they contribute nothing useful to the discussion. Thanks,
Sheepish Yanqui

Proud Yanqui on March 16, 2008 11:43 AM

very interesting issue. i'm from switzerland and we have even stranger formats. e.g. although we are in europe (not EU), we write
16.03.2008
and
2'000'000.00

if you set your osx to the swiss locale, the built-in calculator will produce wrong results when calculating with floats. it won't let you use a comma as the decimal separator, but it will fail if you use a dot (it will produce wrong results!). the only solution to this is to delete the swiss localisation from the calculator.app bundle.
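What went wrong inside that calculator is not stated here, but the general hazard is easy to sketch: the same digits mean different things depending on which character groups and which one marks the decimals. The Python below is a minimal illustration that refuses to guess and takes both separators as explicit arguments; `parse_number` is a made-up helper, not an existing library function.

```python
from decimal import Decimal


def parse_number(text, group_sep, decimal_sep):
    """Parse text using explicitly stated separators instead of guessing.

    Hypothetical helper: the caller supplies the locale's conventions.
    """
    cleaned = text.strip().replace(group_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


print(parse_number("2'000'000.00", "'", "."))  # Swiss style from the comment above
print(parse_number("32.768", ".", ","))        # Turkish style from the post
print(parse_number("32,768", ",", "."))        # US style from the post
```

The last two calls receive different strings and return the same value, which is exactly why a parser that assumes one convention silently misreads the other.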

as for all the discussion about turkish prisons and america-bashing... c'mon, aren't we over that? don't we all know that nobody, neither turkey nor the rest of europe nor the us, has a clean record when it comes to human rights and democracy? and, as i experienced it when i worked with us-americans, it isn't considered polite or common to speak about politics in professional environments. i guess there's a good reason for this!

extrapixel on March 16, 2008 01:03 PM

Turkish people can easily say what's wrong with America according to this blog... I read the whole thing, but not even one sentence made sense to me. The answer is simple... it's because you are comparing the US with Turkey... which means you are comparing two countries that maybe you shouldn't; instead you could just accept them the way they are.

btw Midnight Express is just Hollywood. Even the guy himself has explained this many, many times... go to youtube and search for his own video.

ahmet turan donmez on March 16, 2008 10:14 PM

America is the biggest player in the software industry (as well as many industries), so that it should be natural to accept its ways as standards, at least when culture doesn't impose anything.

For instance, I would sure drop my French habits of writing numbers 123.922,023...

Unfortunately, on some topics American standards are so bad (e.g. dates and units) that other countries (you probably now know that you were totally wrong to talk about a Turkish exception) will never accept them.

poule on March 16, 2008 10:24 PM

heh, all the imperial/metric talk reminds me of the leading story on this satirical website.

http://theothernews.co.nz/archives/issue1/page1

Chris on March 16, 2008 10:25 PM


It is interesting to see the Turkish flag on a blog I often read.
These localization things occasionally drive us Turkish developers crazy. Workarounds, troubles, amazingly bad results, etc. etc.

Anyway..

BTW, I think Jeff mentioned the Midnight Express thing to get more attention, especially from Turkish developers.

It's a funny movie, as Kerem says, but what's even funnier is that many people imagine Turkey according to that movie.

Cheers,

Mehmet.

afsinbey on March 17, 2008 12:51 AM

@poule: no, it isn't. These days more software and hardware is produced in India, China and the Tiger states. The US is the biggest player in taking on new debt. Software for clinical use, except for GE's, is mostly not made in the US. Maybe they don't trust US software?

offler on March 17, 2008 01:08 AM

Guys, I dunno what the problem is, but here in Russia we format dates the same way they do in Turkey, only we use a space as the group delimiter. We also have a habit of setting our browsers to our preferred (Russian) locale, like most normal people do. So what? None of the good blogs or web apps rejects or neglects our locale, so the problem is not Turkey or any other country, their locale, or their uppercase peculiarities. The problem is programmers who do this strange uppercasing of variable type names (why would one need it, and how can it matter?).

rimmer333 on March 17, 2008 01:54 AM

@Proud Yanqui
"What right does anyone have to try and tell us that we're "wrong" when all we're doing is upholding the traditions and cultural values of our ancestors?"

That's a good point for me to remember when mocking you guys :D
That's true though. I think it's ridiculous for cultures to standardise; it's a culture after all. That's why we do internationalisation in the first place. So keep writing your dates in confusing formats :D I understand

TrXtR on March 17, 2008 02:57 AM

Where are the good old days of spawning code and sites and gizmos without a care in the world. And now all at once we have to consider that someone wants to use our stuff!
And what is worse: 'the' user doesn't exist. And he certainly isn't the power-user he should be to understand our reasoning.
Now our sites have to be Turkish-proof (great country, loved to work there, but hated typing reports on a Turkish qwerty keyboard) Fool-proof, Granny-proof (like that one as an idea), accessible, usable, useful ...

Someone mentioned postal codes? I have given up hope on those ever getting standardized (you know like one number meaning one community and vice-versa) in my own country (Belgium), let alone worldwide.

A tradition is an answer to a question that has been forgotten. That's why RWW will never get a serious answer.

Klara on March 17, 2008 03:21 AM

A lot of people blame America, lazy programmers, bad libraries, etc. Blame blame blame.

Internationalisation and globalisation are difficult. There are some terrific tools out there, and some things are dead easy. However, like all programming, it's also far too easy to have 1 line of code destroy all the other hard work in a program. Not supporting my video card is annoying, but that 1 stupid defect will enrage people because, of all things, it neglects their culture.

And we all know that people are very touchy.

hmmm on March 17, 2008 03:25 AM

As I read through the postings it occurs to me that there is an even distribution of idiots from every culture. How is that for internationalization?

Great posting, Jeff.

Vadim on March 17, 2008 05:12 AM

Interesting and informative post, however from a programming point of view (and not from a political point of view - that would be an entirely different debate, more suited for many other blogs, and unfortunately found in probably over half the comments in this blog), there is nothing WRONG with Turkey.

There IS something DIFFERENT with Turkey, something that some (not everybody develops software for the international -let alone national- market) software developers need to be aware of.

punkALARM on March 17, 2008 06:14 AM

re:Hartmut

We, Turks, approximately celebrate Pi day on July 22.

Turkish-Bostonian on March 17, 2008 08:51 PM

This thing haunted us for a week until we figured it out.
We learned something from it. Never use Java reflection if you want to sell to Turkey. Especially don't use properties whose names start with a lowercase "i"...

The funny thing is, this behaviour is not only coupled with the language setting of your machine, but also with the timezone!!! So if you have an English Windows XP in the Turkish timezone, then you'll get this trouble.
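The comment does not say exactly what broke. One plausible mechanism, assumed here, is code that builds an accessor name by uppercasing the first letter of a property name under the machine's default locale. The Python sketch below imitates that: `turkish_upper` is a hand-written stand-in for a Turkish-locale uppercase, and `getter_name` is a made-up helper, while Python's own `str.upper` is locale-independent and plays the role of the invariant culture.

```python
def turkish_upper(text):
    """Stand-in for a Turkish-locale uppercase: i -> İ, ı -> I (illustrative only)."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def getter_name(prop, upper=str.upper):
    """Build a JavaBeans-style getter name such as 'getId' from 'id' (hypothetical helper)."""
    return "get" + upper(prop[0]) + prop[1:]


# Pretend this is the set of methods a class actually declares.
declared = {"getId", "getName"}

for prop in ("id", "name"):
    invariant = getter_name(prop)               # locale-independent uppercase
    turkish = getter_name(prop, turkish_upper)  # what a Turkish default locale would build
    print(prop, invariant in declared, turkish in declared, turkish)
```

Only the property starting with "i" goes missing under the Turkish rule, which matches the symptom described above; the usual fix is to name the culture explicitly whenever the string is an identifier and not text meant for a human.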

Greetings from Germany,
the home of the ÄÜÖäüöß's and beer,

Lars

Lars on March 18, 2008 05:34 AM

It seems it's the first time Jeff Atwood has encountered any language other than English. If he had before, he would have known that each language has its own unique alphabet letters and each culture can have its own way of doing things, just like the US.

Thus the problem mentioned here does not have anything to do with Turkey. Thus it is wrong to ask what's wrong with Turkey. Instead what we should ask is why every country does not follow the only great US standards.

me on March 18, 2008 05:52 AM

One of these days the W3C will get around to ratifying the tag.

Ryan C Smith on March 18, 2008 01:14 PM

Doh...that would be the tag

Ryan C Smith on March 18, 2008 01:15 PM

Silly html filter... nevermind...joke is verging on the ridiculous now.

Ryan C Smith on March 18, 2008 01:16 PM

I once stumped a friend with this riddle:

You speak only English, but you know three words in Turkish. What are they?

I finally had to tell her:


Towel

Bath

Border.

Keep up the good work.

Tomato Queen on March 19, 2008 07:47 AM

These people actually program? I thought they just pray all day :)

hxr on March 20, 2008 11:50 AM

hxr: I guess you're just another ignorant then :)

me on March 20, 2008 02:37 PM

So you have realized that other countries exist besides the United States? Wow! That's nice.

deadcabbit on March 21, 2008 04:14 AM

I'm not from US, but from some pretty small country. And yeah - it was a joke ;)

hxr on March 21, 2008 05:58 AM

A nice i18n topic. I wish I had seen it before all the unnecessary flaming comments.

Gürkan Yeniçeri on March 25, 2008 02:51 PM

"BTW, only in USA a Billion is 10^9, in the rest of the world a billion is 10^12."

Well, not only in USA. In Brazil a billion is 10^9.

Daniel on March 26, 2008 03:36 PM

A thought provoking article and comments.

Leads me to think that it is impossible to target the whole world with an application, there will always be something missed. For example, I live in Adelaide, a city 9.5 (nine and a half) or 10.5 hours away from UTC/GMT. There are heaps of programs and hardware gizmos (eg my ADSL modem) which don't allow for half-hour time offset.
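The half-hour case is a good test of whether a program stores offsets as whole hours. A minimal Python sketch, using fixed offsets for illustration only (a real application would use a time zone database so that daylight saving changes are handled), shows that nothing special is needed if the offset is kept in minutes; `offset_minutes` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the comment above: 9.5 and 10.5 hours ahead of UTC.
adelaide_standard = timezone(timedelta(hours=9, minutes=30))
adelaide_daylight = timezone(timedelta(hours=10, minutes=30))

noon_utc = datetime(2008, 3, 27, 12, 0, tzinfo=timezone.utc)
print(noon_utc.astimezone(adelaide_standard).isoformat())
print(noon_utc.astimezone(adelaide_daylight).isoformat())


def offset_minutes(tz, moment):
    """Return the UTC offset in minutes; whole hours would lose the half hour."""
    return int(tz.utcoffset(moment).total_seconds() // 60)


print(offset_minutes(adelaide_standard, noon_utc))
```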

A comment on the idea of the American date format evolving from "tradition": rubbish. It has always been a bugbear of mine that so many US standards have departed from tradition or existing standards for no good reason. Why is there a US gallon? Why did early modems use Bell standard tones instead of CCITT? Truth is, Americans revel in doing things differently for the sake of it, to hell with standards. (And Microsoft make an art form of it.)
The US date format makes for lots of confusion when working for an American company in Australia, that's for sure.

As for the German post, the metric measure of distance is "metre", not "meter". Good thing you didn't give examples of volume, where you might have made the same mistake. Shame on you if you are European.

:)

Ken on March 27, 2008 04:12 PM

@Aleback:

"they're quite unique, if they have a character where lowercase(char) != char>65&&char<90?char+97-65:char;"

Unique? Yes. Odd? No.

Modern Turkey adopted the Latin alphabet in 1928, way before ASCII was born (1963) and before ordering the Turkish alphabet such that the uppercase/lowercase 'i' would be exactly 0x20 characters apart could even be considered :)

I'll try to grossly(*) explain the logic behind this seemingly backwards mapping between the uppercase and lowercase 'i':

In Turkish, the 8 vowels can be grouped in pairs of "thick" and "thin" sounds:
- A (thick), E (thin)
- I (thick), İ (thin)
- O (thick), Ö (thin)
- U (thick), Ü (thin)

With the exception of 'A' and 'E':
1. When you take a "thick" vowel and adorn it with dots, it becomes "thin".
2. The alphabet is ordered such that each "thick" vowel is followed by its "thin" version:

A, B, C, Ç, D, E, F, G, Ğ, H, I, İ, J, K, L, M, N, O, Ö, P, R, S, Ş, T, U, Ü, V, Y, Z

a, b, c, ç, d, e, f, g, ğ, h, ı, i, j, k, l, m, n, o, ö, p, r, s, ş, t, u, ü, v, y, z

The lowercase of 'I' being 'ı' and of 'İ' being 'i' is perfectly consistent in terms of the phonetic makeup of the Turkish language -- and not necessarily consistent with spacing uppercase/lowercase letters 32 bytes apart in ASCII (*AMERICAN* Standard Code for Information Interchange) ;)
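The mapping described above can be written out as a small Python sketch. Python's built-in `str.lower` and `str.upper` are locale-independent, so the two Turkish pairs are swapped by hand before the default mapping handles the rest; `turkish_lower` and `turkish_upper` are illustrative names, not standard library functions.

```python
UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"
LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"


def turkish_lower(text):
    """Lowercase with the Turkish pairs: I -> ı and İ -> i (illustrative helper)."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def turkish_upper(text):
    """Uppercase with the Turkish pairs: ı -> I and i -> İ (illustrative helper)."""
    return text.replace("i", "İ").replace("ı", "I").upper()


# The two alphabets from the comment map onto each other letter by letter.
print(turkish_lower(UPPER) == LOWER, turkish_upper(LOWER) == UPPER)

# The keyword from the post: the default mapping and the Turkish one disagree.
print("INTEGER".lower(), turkish_lower("INTEGER"))
```

Any comparison against the literal "integer" succeeds with the first result and fails with the second, which is the whole bug in miniature.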

(*) I have no recollection (from school) of the correct terms I should be using instead of "thick" and "thin", but I hope I made my point.

Ates Goral on March 31, 2008 03:31 PM

This American developer has been encouraged to note that all the anti-American/Waah-You-Guys-Are-Imperialists negative comments have been written in English.
I would think the irony there would prevent bitter developers from posting comments.

Mike on April 5, 2008 10:59 PM

The dot of the i is very important for us and it makes a big difference. If you say "sıkıldım" it means "i am bored", if you say "sikildim" it means "i am f**ked". It is also important for software development. You must be very careful about that letter. I always prefer to use "I", the uppercase of this letter, so there is not a problem.

Someone from Turkey on April 9, 2008 12:43 PM


(For what it's worth, I'm a UK citizen who uses dd/mm/yyyy to comply with the rest of the country, but would rather the world standardised itself and started using yyyy/mm/dd. I also believe that we don't need any numerical separator besides the decimal point - what is so confusing about 2000000 compared to 2,000,000 anyway?)


I agree with you on the date thing. But please come to Zimbabwe. I just got $ 18750000000.00 put into my account. I then transferred $ 15000000000 to my mother. Paid my golf subs, $ 1840650000.00 and my security company $775000000.00. By the way a coke now costs $ 85000000.00. Now with separators
$18,750,000,000.00
$15,000,000,000.00
$ 1,840,650,000.00
$   775,000,000.00
$    85,000,000.00

Yah I agree with you entirely. Numerical separators are completely pointless and we should get rid of them.
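For the record, the grouped column above takes only a few lines to produce. This Python sketch formats the same amounts with the built-in `,` format specifier and then swaps in whichever separators a locale calls for; `grouped` is a made-up helper, and the separator choices are passed in explicitly instead of being read from the system locale.

```python
amounts = [18750000000, 15000000000, 1840650000, 775000000, 85000000]


def grouped(amount, group_sep=",", decimal_sep="."):
    """Format with two decimals and the given separators (hypothetical helper)."""
    text = f"{amount:,.2f}"
    # Use a placeholder so that swapping "," and "." cannot collide.
    return text.replace(",", "\0").replace(".", decimal_sep).replace("\0", group_sep)


width = max(len(grouped(a)) for a in amounts)
for a in amounts:
    print("$" + grouped(a).rjust(width))

# The same value in two other conventions mentioned in this thread.
print(grouped(85000000, "'", "."))
print(grouped(85000000, ".", ","))
```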

(btw, the 18 billion dollars is about £75)

John on April 17, 2008 02:39 AM

Who would have thought localization issues could turn deadly?

http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail

Mike Powell on April 21, 2008 10:54 AM

The same incident Mike is referring to was also written up quite well at the Language Log:

http://languagelog.ldc.upenn.edu/nll/?p=73

Jeff Atwood on April 22, 2008 02:11 PM

I'm not the only one who immediately thought of this post after reading the Gizmodo article, huh? For those who aren't willing to click the links Mike and Jeff left above, it is about a separating Turkish couple who were arguing over SMS. The guy sent a message where a SINGLE CHARACTER got mangled and caused the meaning to change from (this is a translation obviously) "you change the topic every time you run out of arguments" to "you change the topic every time they are f****ng you". She told her father, who became enraged, and when the boyfriend went to apologize, a stabbing match ensued.

Neil C. Obremski on April 22, 2008 02:23 PM

Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.