Software internationalization is difficult under the best of circumstances, but it always amazed me how often one particular country came up in discussions of internationalization problems: Turkey.
This Rick Strahl post from mid-2005 is one of many examples I've encountered:
I've been tracking a really funky bug in my West Wind Web Store application that seems to crop up only very infrequently in my error logs. In a previous post I mentioned that I had instituted some additional logging features, specifically making sure that I would also log the locale of the user accessing the application.
Well, three bug reports later I noticed that all errors occurred with a Turkish (tr) browser. So I changed my browser's default language to Turkish and sure enough I could see the error occur.
Or, say, this 2005 post from Scott Hanselman:
I had blogged earlier about a bug in dasBlog that affected Turkish users. When a Turkish browser reported an HTTP Accept-Language header indicating Turkish as the preferred language, no blog posts would show up. As a fix, I suggested that users change their blog templates, but I knew that wasn't an appropriate fix.
I understand that Turkish prisons are not to be trifled with, but the question remains: why do Turkish people take such cruel and perverse delight in breaking our fine software? What's wrong with Turkey?
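As with so many other problems in software development, the question shouldn't be what's wrong with Turkey, but rather, what the hell is wrong with software developers? Some of this is sort of obvious if you have any cultural awareness whatsoever.
In the United States, we would typically format today's date as 3/14/2008. In Turkey, they format it as 14.3.2008.
In the United States, we use commas to group digits, like so: 32,768. In Turkey, they group digits using a period, so the same number would be entered as 32.768.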
These minor formatting differences are usually not a big deal for output and display purposes, but it's a whole different ballgame when you're parsing input. You'd naturally expect people to input dates and numbers in the format they're used to. If your code assumes that input will be in typical American English format, there will be… trouble.
Most languages have this covered; there are functions that allow you to read or write dates and numbers appropriately for various cultures. In .NET, for example, it's the difference between these two calls:
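// Culture-sensitive: uses the current culture's group separator ("." under Turkish).
int.Parse("32.768", System.Globalization.NumberStyles.AllowThousands);
// Culture-independent: always expects "," as the group separator.
int.Parse("32,768", System.Globalization.NumberStyles.AllowThousands, System.Globalization.NumberFormatInfo.InvariantInfo);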
Because no culture is specified, the first call will parse the number according to the rules of the default culture that code is running under. Let's hope it's running under a Turkish version of Windows, so it can parse the number correctly. The second call, however, explicitly specifies a culture. The "invariant" culture is every American programmer's secret dream realized: we merely close our eyes and wish away all those confusing languages and cultures and their crazy, bug-inducing date and number formatting schemes in favor of our own. A nice enough dream while it lasts, but instead of rudely asking your users to "speak American" through the invariant culture, you could politely ask them to enter data in ISO international standard format instead.
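To make the difference concrete, here is a minimal sketch of parsing the same quantity under two explicit formats. The `NumberParsingSketch` class and its `DotGrouping` format are hypothetical names introduced for illustration; `DotGrouping` is a hand-built stand-in for a Turkish-style format, so the example does not depend on which culture data is installed on the machine.

```csharp
using System;
using System.Globalization;

public static class NumberParsingSketch
{
    // Hypothetical stand-in for a Turkish-style format:
    // '.' groups digits and ',' marks the decimal point.
    public static readonly NumberFormatInfo DotGrouping = new NumberFormatInfo
    {
        NumberGroupSeparator = ".",
        NumberDecimalSeparator = ","
    };

    // Parses an integer that may contain group separators,
    // using whatever format the caller passes in explicitly.
    public static int ParseGrouped(string text, IFormatProvider format)
    {
        return int.Parse(text, NumberStyles.Integer | NumberStyles.AllowThousands, format);
    }
}
```

Calling `ParseGrouped("32.768", NumberParsingSketch.DotGrouping)` and `ParseGrouped("32,768", NumberFormatInfo.InvariantInfo)` should both yield 32768. The point of the design is that the format is always an explicit argument, never an accident of the machine the code happens to run on.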
Anyway, point being, this kind of culture support is baked into most modern programming languages, so all you need to do is make sure your developers are aware of it – and more importantly, that they're thinking about situations when they might need to use it.
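For dates, one way to sidestep the ambiguity is to accept and emit only the ISO 8601 calendar format. The sketch below assumes a hypothetical `IsoDateSketch` helper; the format string `yyyy-MM-dd` and the use of the invariant culture are the only moving parts.

```csharp
using System;
using System.Globalization;

public static class IsoDateSketch
{
    // Accepts only ISO 8601 calendar dates such as "2008-03-14".
    public static bool TryParseIsoDate(string text, out DateTime date)
    {
        return DateTime.TryParseExact(
            text, "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }

    // Formats a date the same way regardless of the current culture.
    public static string ToIsoDate(DateTime date)
    {
        return date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}
```

With this in place, an input like "14.3.2008" is rejected outright instead of being silently misread under some other culture's rules.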
But all that date and time formatting stuff is easy. Or about as easy as i18n ever gets, anyway. Strings are where it really starts to get hairy. Guess where this code fails?
switch (myType.ToLower())
{
    case "integer":
        // handle the integer type
        break;
}
If you guessed Turkey, you're wrong! Just kidding. Of course it fails in Turkey. When we convert the string "integer" to upper and lower case in the Turkish locale, we get some strange characters back:
"INTEGER".ToLower() = "ınteger"
"integer".ToUpper() = "İNTEGER"
It's sort of hard to see the subtle differences here unless we ratchet up the font size:
| I → lowercase → ı |
| i → uppercase → İ |
There's obviously no way these strings are going to match "integer" or "INTEGER" respectively. This is known as the Turkish I problem, and the solution should feel awfully familiar by now:
"INTEGER".ToLower(System.Globalization.CultureInfo.InvariantCulture)
That will produce the expected output, or at least, the output that matches the comparison in the original code snippet.
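An alternative that avoids case conversion altogether is an ordinal, case-insensitive comparison. This is a sketch, not the only fix: `TypeNameMatcher` is a hypothetical name, and it assumes the string being compared is an internal keyword rather than text meant for display to the user.

```csharp
using System;

public static class TypeNameMatcher
{
    // Compares without consulting the current culture,
    // so the result does not change under a Turkish locale.
    public static bool IsIntegerType(string typeName)
    {
        return string.Equals(typeName, "integer", StringComparison.OrdinalIgnoreCase);
    }

    // Culture-independent lowercasing, for code that still wants a switch statement.
    public static string NormalizeKeyword(string keyword)
    {
        return keyword.ToLowerInvariant();
    }
}
```

The switch from the earlier snippet could then be written as `switch (TypeNameMatcher.NormalizeKeyword(myType))`, which behaves the same no matter which culture the thread is running under.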
This is, of course, only the tip of the iceberg when it comes to internationalization. We haven't even touched on the truly difficult locales like Hebrew and Arabic. But I do agree with Jeff Moser – if your code can pass the Turkey test, you're doing quite well. Certainly better than most.
If you care a whit about localization or internationalization, force your code to run under the Turkish locale as soon as reasonably possible. It's a strong bellwether for your code running in most – but by no means all – cultures and locales.
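One way to do that inside a test suite, without changing the operating system's settings, is to switch the thread's culture around the code under test. The `CultureScope` helper below is a hypothetical sketch; it relies only on the settable `CultureInfo.CurrentCulture` and `CultureInfo.CurrentUICulture` properties.

```csharp
using System;
using System.Globalization;

public static class CultureScope
{
    // Runs the action with the current thread's culture temporarily switched,
    // then restores the previous culture even if the action throws.
    public static void RunUnder(CultureInfo culture, Action action)
    {
        CultureInfo previousCulture = CultureInfo.CurrentCulture;
        CultureInfo previousUiCulture = CultureInfo.CurrentUICulture;
        try
        {
            CultureInfo.CurrentCulture = culture;
            CultureInfo.CurrentUICulture = culture;
            action();
        }
        finally
        {
            CultureInfo.CurrentCulture = previousCulture;
            CultureInfo.CurrentUICulture = previousUiCulture;
        }
    }
}
```

A Turkey test would then look something like `CultureScope.RunUnder(new CultureInfo("tr-TR"), RunMyTests)`, where `RunMyTests` stands for whatever entry point your tests have. Creating the `tr-TR` culture assumes the runtime has Turkish culture data available, which may not be true in stripped-down environments.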
@Eduardo et al.
It's a matter of hierarchy, one looks for the most granular/sensical description of your nationality...I for instance am a Solarian/Earthling,Northern Hemispherian,Northern American,American,Californian,Orange Countyian,Costa Mesaian,Fairview Streetian...I should just call myself a Fairview Streetian but you'd most likely have to be a Costa Mesaian to understand that, and then an Orange Countian to understand *that*. If you walk back up the tree to American, that name would have the greatest chance of understanding regardless of your own locale. So while culturally you may consider yourself an American, you under this easily understood naming convention would be a Chilean. It's by no means an attempt at minimizing your American'ness, it's simply a matter of facilitating communications.
This whole globalization thing is a fad anyway...it's all going to implode and we'll have regional standards only again. This is the internet's biggest downfall, global voice but local context.
Ryan Smith on March 14, 2008 2:38 AM
BTW...this is the OTHER Ryan Smith...sigh see naming is such a pain...
Ryan Smith on March 14, 2008 2:39 AM
Most of the cases discussed in this article are ordinary internationalization issues, but the Turkish I problem is something different and interesting. This especially plagued Java applications. Whenever the String.toUpperCase() or String.toLowerCase() methods are used, code that contains "id" fails. If the string that is converted is not for the user and is related to application logic, you must use these methods according to the English locale. There is a detailed article about this at http://java.sys-con.com/read/46241.htm .
Also, whenever a search query is made in multi-language content (e.g. a music archive that contains both Turkish and English albums), you must assume that all the versions of ı, i, I and İ must be treated as the same character.
Bilgehan Maras on March 14, 2008 2:44 AM
Over here we use DD/MM/YYYY, and we use commas for decimals and we usually don't group digits, but when we do, we use dots. And yes, we use the metric system. So, Turkey isn't THAT unique. But that wasn't the point of this.
BTW, in Argentina we call US people "yankis" informally and "estadounidenses" or "norteamericanos" formally.
Alejandro on March 14, 2008 2:46 AM
Why simply Turkey? It's not like they're unique.
Most of continental Europe (and East of there) uses a period to separate number groups, and a comma for 'decimal' numbers. Also, very few places in the world write the date mm/dd/yyyy like you Americans do.
To say that Turkey is the odd-one out is to either be grossly misinformed, or largely ignorant of the rest of Europe. Not picking on you, as such, but rather questioning why this is about Turkey, not just about general internationalisation.
James on March 14, 2008 4:37 AM
It's convenient because:
1) Turkish is similar enough to other Latin-alphabet languages that it's not a giant engineering nightmare to get it to work (see: Arabic or Hebrew).
2) The Turkish-I problem ( http://en.wikipedia.org/wiki/Turkish_dotted_and_dotless_I ) causes failures in naive string comparisons, whereas other Latin alphabets don't.
3) The Turkey Test gets you 90% of the way to the goal of internationalizing most apps.* We know French and Spanish are going to work. Why not test with the most difficult (but realistically difficult) locale first?
* The other 10% is excruciatingly difficult -- again, think of Arabic (bi-directional, shaped letters) or Hebrew (right-to-left).
Jeff Atwood on March 14, 2008 4:44 AM
Well, you see, the Turkish flag appears to be a C and a star, meaning their country's programming language is C*, whose last revision was 14 years ago. This is a matter of old, deprecated code!
Vaaal on March 14, 2008 4:46 AM
This post was a Turkish Delight until I read all the America bashing in the comments.
Matt on March 14, 2008 4:56 AM
I like the above answering post, reasoning doesn't seem out of place in the main article though.
eryn on March 14, 2008 5:35 AM
@Jeff:
they're quite unique, if they have a character where lowercase(char) != (char >= 65 && char <= 90) ? char+97-65 : char;
sorry, I just love to smite aggressive people
Aleback on March 14, 2008 5:35 AM
As a Turkish software developer, I really appreciate this post but I am sorry to see that the allegory is easy to miss. Get ready to be flamed by angry Turks.
And I wish you'd known better about Midnight Express before mentioning it even in a joke. It is simply a Hollywood exaggeration/mud.
Bahri Gensoy on March 14, 2008 5:37 AM
I'll add Turkey to my test cases. Thanks from Sweden with posted 14-03-2008, 14-03-08 or 08-03-14 as some put it. Looking at the best before stamp on food wraps and it says 090308 scratching my head.
Jimmy Bergmark on March 14, 2008 5:38 AM
I am sure that you read,
http://www.moserware.com/, blog post:
What Does It Take To Become A Grandmaster Developer?
Typical American ignorance. More languages use dd.mm.yyyy than mm/dd/yyyy. Just think about writing an hour after the minute and you know why. ;-)
With number format it's the same thing. If you use roman characters, use the appr. format too.
offler on March 14, 2008 5:49 AM
OMG.
In addition to java, I should learn Turkish too?
Great article, as a Turkish developer you explained the problem better than me.
Hus on March 14, 2008 5:56 AM
The only really useful date format is YYYY/MM/DD because it is the only one that sorts properly as text. Go South Africa!
Adam on March 14, 2008 5:57 AM
I gave a software demo in France one time only to have mass confusion five minutes into my talk because the audience could not follow what I was doing. I had typed a decimal number, and an i18n bug would not let them do the same. To make matters worse, there was a combination of Microsoft code and custom code validating the input. The Microsoft code (correctly) disallowed periods as decimal separators on a French computer, and the custom code (incorrectly) disallowed commas. So the audience couldn't use either the French or American convention for decimals. The solution was for everyone to change the locale on their OS. They said they routinely had to do this to use American software. What an embarrassment.
John on March 14, 2008 6:00 AM
"In the United States, we would typically format today's date as 3/14/2008. In Turkey, they format it as 14.3.2008.
In the United States, we use commas to group digits, like so: 32,768. In Turkey, they group digits using a period, so the same number would be entered as 32.768."
Wow, THAT's really strange! I bet they even have the metric system!
(Like every single country except for Myanmar and the US and A).
The examples you gave above are used in many countries in Europe, so that should not and cannot be the reason.
Rationalizing mm/dd/yyyy ...
This makes sense when you are primarily talking mm/dd. As in, "March 3". The most important thing is the month. It should go first. People who say "3 March" are just plain silly and inefficient :)
The "/yyyy" at the end instead of at the start is obviously a historical error, presumably as logical as Intel putting the most significant byte in the bass-ackwards position in x86.
Tom Dibble on March 14, 2008 6:09 AM
Heh I did not know about the dotted i problem.
At least one of the problems above seems to be, though, that the code was executing in the user's locale, i.e. a string the developer put in his code was being interpreted in the Turkish locale. This seems to me an ASP.Net oddity, where as a convenience you set the entire thread to a specific locale and thus suddenly the entire semantics of your original program state (without any input or output) can change completely.
For me, the proper way to deal with this would be to keep the culture of the thread set to a "sane default" (invariant culture, perhaps?) and keep an eye on converting/parsing as necessary per the user culture wherever input/output takes place. I'm not sure why ASP.Net was chosen to behave the way it does, since it seems an easy source of bugs.
wpp on March 14, 2008 6:11 AM
You just opened a can of worms :)
boz on March 14, 2008 6:13 AM
"3) The Turkey Test gets you 90% of the way to the goal of internationalizing most apps.* We know French and Spanish are going to work. Why not test with the most difficult (but realistically difficult) locale first?"
I fail to see how Turkish gets you 90% of the way, unless you can easily say that the character-set limitations and formatting rules for dates, etc. encompass 90% of the possibilities for internationalisation. I think it's naive (no pun intended) to assume that just because your program is compatible with Turkish formatting, it's going to be compatible in 90% of circumstances.
The approach to take is to not try to shortcut to internationalisation, but to actually *do* internationalisation properly. How would you say that Turkey will help you internationalise Norwegian, for example, where you can have circles above letters, etc.... or Greek, where there are plenty of other symbols available. Sure, there are common characters - but that's besides the point... that's more like specialisation rather than internationalisation.
I know you're talking about Latin-based character sets, but how on Earth is that true internationalisation?
James on March 14, 2008 6:13 AM
As a Hungarian developer, I also have fun with internationalization issues in software. Some aspects every developer could add to their checklist:
1. Some Hungarian accents (ő, ű) do not fit into the ISO-8859-1 codepage used by most Americans, so in most cases you need Unicode to support Hungarian - at least we fit into the Basic Multilingual Plane, as opposed to some unfortunate cultures.
2. Academic collation order cannot be implemented without using some heuristics about the semantics. As opposed to the lucky Czechs, we historically decided to represent some of our phonemes with multiple graphemes (for example "sz" and "zs" are each one sound). If you write "egészség", you need to know that it is a "sz" followed by an "s", and not an "s" followed by a "zs" to put it in the right collation order. The solution was to introduce a so-called technical collation order, which is still more complicated than an ASCII-ordering, but at least it does not need the semantic analysis.
3. The thing that gives us the most fun is that we write family and given names in reverse order compared to sane countries. My family name is "Végvölgyi", the given (first!) name is "Attila", so my full name is "Végvölgyi Attila". Although I am used to reversing my names in foreign cultures, for software used in Hungary, you need to translate the format string you use to glue the parts of the name together. Even gmail fails to do this. By the way, instead of putting "Mr", "Mrs" and "Ms" before the full name, we put "úr", "asszony" and "kisasszony" after it, but only in a very formal salutation.
4. We glue prepositions to the words they refer to, and sometimes we assimilate the word or the preposition. So if you thought you could translate "with" and "hand" separately, you will not deliver your software to Hungary. "kézzel" is the right form, which is made of "kéz" (hand) and "-vel" (with) assimilated. And yes, we also make plurals in a bit of a strange way, but leave that for some other time.
My point is, that proper internationalization is a linguistic and cultural issue, and you need to get rid of a lot of assumptions if you would like to develop cross-culturally. If you bought in to the lie that simply changing an environment variable will help you making your software "speak" a given language, you will be surprised in real life.
The Reason is the upper/lower case problem....The worst bug in the computer science world
Nick on March 14, 2008 6:16 AM
Only somewhat related, but a university I went to got a new library website, and it was AWFUL. It was impossible to find anything. You'd find the record you'd want and the button to display it would be exactly opposite of where you expected it to be.
I later met someone who was a developer, and found that they had outsourced the entire thing to Israel. Which wouldn't be a problem, other than Israelis read from right-to-left, not left-to-right. Which made the design weirdness look a little less random -- the left-right orientation was pretty much exactly opposite of what I would have intuitively expected it to be.
Culture is a bitch.
Shmork on March 14, 2008 6:28 AM
Oh Man. Are you ever gonna get hammered for mentioning Turkey and Midnight Express in the same column.
That's kind of like mentioning America and...
well there's nothing bad enough to be an equivalent.
Kieran on March 14, 2008 6:35 AM
Oh I figured an analogy out.
It's like mentioning America and Abu Ghraib and saying Abu Ghraib is representative of American morality.
...
Actually, no, it's worse than that because in the case of America, Abu Ghraib is real, but Midnight Express is just a movie.
Kieran on March 14, 2008 6:37 AM
Hey Now Jeff,
I've seen the Turkey test since two good friends are Turkish, Tulih & Volkan, who love soccer & PCs. I really liked this post.
Coding Horror Fan,
Catto
"Or there will be.. trouble."
You had to sneak in a Robocop reference, didn't you?
LSnK on March 14, 2008 6:42 AM
"When he visited Turkey in 2004, screenwriter Oliver Stone, who won an Academy Award for his adaptation, apologized for the film, expressing regret that 'many hearts were broken in Turkey' due to the film."
Midnight Express is 'more violent, as a national hate-film than anything I can remember', 'a cultural form that narrows horizons, confirming the audience’s meanest fears and prejudices and resentments'". John Wakeman(ed) (1988). World Film Directors. New York: T.H. W. Wilson Co.
Kieran on March 14, 2008 6:43 AM
1. I agree with the first point...the characteristics identified as Turkish issues apply to a lot of European countries (I lived in Sweden and experienced the date and period and character issues discussed), yet some of the best software and OS work comes out of Scandinavia.
2. Stating that we know things work in France and Spain is erroneous. I did work for a large US multinational and when we were deploying world wide process control systems amongst many countries: Australia, US, Canada, Mexico, France, UK, Brazil we had real issues with the French installs, primarily due to the French version of the OS (in this case Windows on the PC clients).
The US engineers had absolutely no concept that there would be localized versions of the OS (this was in the mid 90s).
3. There was mention about using ISO standards to enter information - most of the world does, it is the US that does not: Other examples, mph instead of kph, paper size (letter versus A4, the international standard)...heck, even look to NASA and it is all imperial units, yet this is (now) part of an international consortium!!
Localization makes the software presentable, however, the underlying data storage and manipulation should be in ISO format/standards and you merely have the localized 'presentation' for user input and output (this obviously includes text).
Tim on March 14, 2008 6:47 AM
Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious.
RWW on March 14, 2008 6:49 AM
Bahri Gensoy: "And I wish you'd known better about Midnight Express before mentioning it even in a joke. It is simply a Hollywood exaggeration/mud."
Midnight Express was based on a true story, but we knew Turkish prisons were bad before the movie.
PaulG. on March 14, 2008 6:56 AM
Haaa. Jeff, you really like stirring the pot and getting your fingers burned! I do take issue with your characterization of Midnight Express as a credible source for your opinion--and many commenters here pointed out why (it's basically wrong as a historical document) -- but I laugh about it because I sense that you are being facetious there. No one in a right state of mind understands anything about Turkey from that one movie. You are right about using Turkish as a test case for internationalization, and it's good to see my country's flag here.
picardo on March 14, 2008 6:58 AM
The reason to use YYYYMMDD is not because it helps with a "standard numerical sort", but because it puts the Most Significant portion first, just like in common number printing (there may be differences between using comma and period as a decimal separator, but the places extend to the left in increasing significance). No one prints the hundreds place to the right of the ones place, do they?
Andy on March 14, 2008 7:06 AM
PaulG, you are unbelievably ignorant.
http://en.wikipedia.org/wiki/Midnight_Express_(film)#Billy_Hayes_interviewed
boz on March 14, 2008 7:07 AM
Note that in non-Turkish locales you want to ignore the differences between İ and I when changing case. These characters are normalised to ASCII, on Linux at least, by doing upper() then lower(). In the Turkish locale the characters are not merged, as expected.
I've used the following function on Linux to transform text before comparison:
/* ignorecase, wcsupper() and wcslower() are defined elsewhere in the commenter's code */
void transform(wchar_t* wcs)
{
    if (ignorecase) {
        /* Note this handles Turkic case folding */
        (void) wcsupper(wcs);
        (void) wcslower(wcs);
    }
    /* Other possible transformations one could do here are:
         StripDiacritics: - A
         ConvertEnclosed: Ⓐ -> A
         ConvertStylistic: Ａ -> A
         TurkicFoldCase: İ -> i
       Note the above are transformations done in msort.
       Note python can normalise some things also:
         unicodedata.digit(u'\u2462') # ③ -> 3
    */
}
Nothing wrong with Turkey..Turkish language is different than English, that's all..
Question is; What's wrong with you?
What does Turkish prisons or Turkey as a country have to do with this problem? Who is breaking your fine software, Turkish people or the fact that your software is not localized that fine?
Cagri on March 14, 2008 7:09 AM
"Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious."
So that Americans can understand the date? ;) It is just tradition at this point, I think.
Neil on March 14, 2008 7:11 AM
Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious
Celebrating pi day don't make much sense when your date format is dd/mm.
Go 3/14! ;-))
Happy pi day!
Hartmut on March 14, 2008 7:12 AM
"Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious."
Convention is the reason, I'm not sure it is a good reason, but it does tend to dominate so many silly "unconventional" aspects of US standards. (I'm a US citizen if that matters.)
malachi on March 14, 2008 7:12 AM
Has anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious.
The best I've heard is because it's the way you say it (look at the dates on these replies). Although that's erroneous since I hear just as many people say "the 5th of March" instead of "March [the] 5th."
The US way is kind of nonsensical. The only two ways that make any sense are dd/mm/yyyy and yyyy/mm/dd. The latter is preferable for searches (as already mentioned)
Poita_ on March 14, 2008 7:12 AM
I learned quite a bit from folks who left comments on my "Turkey Test" post. One of the most interesting was that in Germany, Excel CSV files use semicolons. That is, you can't look for "," but rather CultureInfo.TextInfo.ListSeparator.
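As a small illustration of that property, the sketch below joins fields with whatever list separator a given culture reports. `ListSeparatorSketch` is a hypothetical helper name; `TextInfo.ListSeparator` is the real .NET property mentioned in the comment.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ListSeparatorSketch
{
    // Joins fields using the list separator of the supplied culture
    // instead of assuming that it is always a comma.
    public static string JoinForCulture(IEnumerable<string> fields, CultureInfo culture)
    {
        return string.Join(culture.TextInfo.ListSeparator, fields);
    }
}
```

Under the invariant culture the separator is a comma. Under a German culture, assuming the runtime has that culture's data, it would be expected to be a semicolon, which matches the Excel behaviour described above.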
"Question is; What's wrong with you?
What does Turkish prisons or Turkey as a country have to do with this problem? Who is breaking your fine software, Turkish people or the fact that your software is not localized that fine?
Cagri on March 14, 2008 06:09 AM"
Cagri, no need to be defensive, your point is what Jeff is also trying to explain here.
Yas on March 14, 2008 7:18 AM
One of most interesting was that in Germany, Excel CSV files use semicolons.
because the decimal separator is the comma. i.e. 1.25 -> 1,25.
Hartmut on March 14, 2008 7:19 AM
If you run your .NET projects through FxCop with the standard rules turned on, it will alert you to every string parsing or writing routine where you haven't considered the locale. Easy way to help being "Turkey compliant".....
Ritchie Swann on March 14, 2008 7:21 AM
The Turkish "i" character is a common problem. Even the CSS capitalize command does not support it.
And there is a lot of software that gets stuck in the Turkish locale, mostly because of lazy coding.
About the Midnight Express, I live in USA, have gone to Turkey few times (by the way they are great people in general) and the movie is a bullshit similar to USA bringing democracy to Iraq.
And to remind, USA still tortures people officially from Iraq war in Guatemala.
Brian Kelian on March 14, 2008 7:21 AM
There are a whole host of BUSINESS ISSUES that aren't readily apparent when it comes to internationalization. For example, postal codes in the UK can contain alpha characters as well as digits. Many people have more than three names so the typical American first, middle, last doesn't cover all of the possibilities. And I could go on and on.
It takes a lot more than choosing invariant cultures, formatting dates, and formatting decimals correctly for a program to be truly "internationalized". You also have to make sure that the business rules are adjusted as well.
Matt on March 14, 2008 7:39 AM
Good Lord, can't we move past the reference to Midnight Express and just discuss the i18n problem? That reference was one small piece of an important topic, but instead of reading comments on how other people handle it, I am reading numerous posts about the US still torturing people, Abu Ghraib, and basically how Americans are lazy coders with no regard for other cultures...
If I wanted to read those types of comments, I'd go to CNN or MSNBC and read their blogs...
Let's stay focused and on topic.
Wayne on March 14, 2008 7:47 AM
Convention... to keep Americans happy... Pi day...
Thanks for all your answers, but I'm still looking for a _good_ reason...
It used to be the "convention" to send children to work down mines, or to make women stay at home raising a family. To me, tradition and convention don't seem like _good_ reasons to keep doing something utterly ridiculous - just reasons that don't require further thought. Oh well... perhaps I'm in the minority on this one...
As for Pi day - the only people who care about that are mathematicians and geeks. I would imagine that most mathematicians and geeks are familiar with modulo arithmetic, so why not celebrate it on the 3rd of January?
(For what it's worth, I'm a UK citizen who uses dd/mm/yyyy to comply with the rest of the country, but would rather the world standardised itself and started using yyyy/mm/dd. I also believe that we don't need any numerical separator besides the decimal point - what is so confusing about 2000000 compared to 2,000,000 anyway?)
RWW on March 14, 2008 8:02 AM
"In the United States, we use commas to group digits, like so: 32,768. In Turkey, they group digits using a period, so the same number would be entered as 32.768."
Uh, wtf? In Spanish-speaking countries (surely more users than Turkey alone!), we also have comma and period swapped compared to English. Comma for decimals, period for thousands grouping.
Nicolas on March 14, 2008 8:13 AM
As echoed by millions here, dd.mm.yy is followed a lot in other countries. When it comes to grouping digits, remember that comma (,) is used as the decimal separator in some European countries. There is some stuff only the US follows, but not many acknowledge there are others who don't follow those standards..
Besides; most of us play cricket not baseball :P
Leafy on March 14, 2008 8:17 AM
@RWW
The difference in 2000000 and 2,000,000 is readability. Is the first two million, two hundred thousand, or twenty million? Hard to tell at a quick glance, but the logical grouping into common units (tens, hundreds, thousands, etc...) makes it easier to read.
As for the mm/dd/yyyy date format, convention and common use is what brings that in. As an American, I have been taught since pre-school to write dates as March 13, 2008 on everything. So that carries over into work life and programming. Old habits are hard to break. The metric system is how old, yet we still use what we call the Standard system.
It's hard to justify to the entire country that a few programmers think date format should be changed to make it easier to write code.
It all boils down to the Secret society of Stonecutters. "Who keeps the metric system down? We Do!"
Wayne on March 14, 2008 8:26 AM
Relax everyone! When the NWO takes over there will be only one language, you will have a unique number (so don't worry about how many middle names you have), and "Globalization" will have removed the need for any of this.
This brief period in time, made possible in part by the US Constitution, where freedom to do and innovate ran rampant, will finally become under control of the world leaders, and will be found in Wikipedia as an interesting footnote in history. No doubt Googled on by your grandkids with great curiosity in their State-run cooperatives.
THX-1138.
And to remind, USA still tortures people officially from Iraq war in Guatemala.
Brian Kelian on March 14, 2008 06:21 AM
I think you meant Guantanamo (a.k.a. gitmo) unless I'm mistaken.
Arron on March 14, 2008 8:41 AM
This sounds arrogant, but I am not meaning for it to sound that way...
One last thing... For the past 40-50 years, where was the global epicenter of business? The United States. Where were many of the companies founded that create the computers and operating systems we still use? The United States. Apple, HP, IBM, Microsoft, SUN... these companies originated in the US, so naturally they will adopt the US standards. It's only been in the last 20 years that globalization has been a big buzz word. Many of these things originated in a closed system.
Wayne on March 14, 2008 8:44 AM
Hey funny you should mention date issues, just going through a problem with SQL and date/time.
Come to mention it, what time zone am I posting in (see bottom of post)? Shouldn't it be GMT?
Coding horror fan!
Tom on March 14, 2008 8:46 AM
In OS X you could use CFStringCompareWithOptionsAndLocale, which can ignore character differences such as the Turkish I.
Mike on March 14, 2008 8:55 AM
In Britain the 'standard' system is known as imperial.
Yet it is not standard and the rest of the world thinks of america as imperialistic. How ironic
Tom on March 14, 2008 8:58 AM
Hartmut wrote:
One of most interesting was that in Germany, Excel CSV files use semicolons.
because the decimal separator is the comma. i.e. 1.25 - 1,25.
Actually, this is irrelevant, the CSV format provides a method to escape commas in data, the above would be "1,25" with double quotes. And double-double quotes escape allow a single double quote.
This is a Microsoft braindeadness we are seeing, where "Comma Separated Value" files are not always separated by commas. Also, it is contrary behavior to RFC 4180 for CSV (which admittedly hasn't existed for as long as Excel's handling of CSV files).
Anyway, your premise that it is because of commas as a decimal separator is demonstrably false.
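The quoting rule described above can be sketched in a few lines. `CsvSketch.EscapeField` is a hypothetical helper written for illustration, following the RFC 4180 convention of wrapping a field in double quotes and doubling any double quotes inside it.

```csharp
public static class CsvSketch
{
    // Quotes a field only when it contains a comma, a double quote,
    // or a line break; embedded double quotes are doubled.
    public static string EscapeField(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

With this, a decimal written as 1,25 becomes the quoted field "1,25" and survives inside a comma-separated row.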
Wayne wrote:
The difference in 2000000 and 2,000,000 is readability. Is the first two million, two hundred thousand, or twenty million? Hard to tell at a quick glance, but the logical grouping into common units (tens, hundreds, thousands, etc...) makes it easier to read.
Not to mention India, where the readability grouping for the number 2,000,000 would be written 20,00,000.
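As a rough sketch of the two grouping conventions mentioned here (the function names `group_western` and `group_indian` are hypothetical helpers, not part of any library), the difference is only in how digits left of the last three are chunked:

```python
def group_western(n: int) -> str:
    """Group digits in threes: 2000000 -> '2,000,000'."""
    return f"{n:,}"


def group_indian(n: int) -> str:
    """Group the last three digits, then pairs: 2000000 -> '20,00,000'."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    pairs = []
    while head:
        # Peel off two digits at a time from the right of the remaining head.
        pairs.insert(0, head[-2:])
        head = head[:-2]
    return sign + ",".join(pairs + [tail])


print(group_western(2000000))
print(group_indian(2000000))
```

In real code a locale-aware formatting library would normally be preferable to hand-rolled grouping.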
JohnH on March 14, 2008 9:19 AM
What's the matter, Jeff??? I have spent the last 6 months mastering the 'Works On My Computer' paradigm so I can get certified, and now you change everything I believe in? Jeez....... Now my boss will demand that my programs work on the company's server......
Yorch on March 14, 2008 9:24 AMYou can please all of the people some of the time and some of the people all the time, etc., etc. Work to make your applications accessible for your targeted audience, but know that in a world of such diversity you'll never get it perfect.
And for anyone who has sensitivity issues, please be reminded this is a blog in English, originating from a U.S. Citizen, living in the U.S. In-jokes, media references, etc. will probably be targeted primarily to a U.S. audience. So, accept that.
Additionally, stop being so freaking sensitive. Political correctness is starting to piss me off.
kenneth on March 14, 2008 9:26 AM
You do know that America includes Mexico, Peru, Chile, Argentina and more than 20 other countries, don't you?
BTW, only in the USA is a billion 10^9; in the rest of the world a billion is 10^12.
Eduardo Diaz on March 14, 2008 9:53 AM
Amazing coincidence: I had this problem at work last week with a database system with an ASP.NET web front end that we didn't develop. One of the web servers, for some reason, wasn't defaulting to British English; it was set to one of the Eastern European cultures. This caused a major problem and took me quite a while to work out. The reason it was such a problem was the inversion of commas and decimal points in numbers: a calculation was obviously being performed badly and coming out with stuff like -12 million.
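A small sketch of how this kind of failure can happen silently. The `parse_decimal` helper is hypothetical and deliberately naive; it only shows that the same text yields very different numbers depending on which separator convention the parser assumes.

```python
from decimal import Decimal


def parse_decimal(text: str, decimal_sep: str, group_sep: str) -> Decimal:
    """Parse a number, given which characters mean 'decimal' and 'grouping'."""
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


text = "1.234,56"  # written with comma as decimal separator, period for grouping

as_written = parse_decimal(text, decimal_sep=",", group_sep=".")
misread = parse_decimal(text, decimal_sep=".", group_sep=",")

print(as_written)  # the value the author of the text meant
print(misread)     # the value a parser with the opposite convention sees
```

Neither call raises an error, which is why the bug tends to surface only as absurd results further down the line.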
On the case of the illogical date format in the US: why mm/dd/yy? It has no order of precedence. In Britain it's dd/mm/yy, which is at least logical, smallest interval to largest.
Although I have to say the best from a programming perspective would be YYYY/MM/DD HH:MM:SS.mmm where the number order as you see it YYYYMMDDHHMMSSMMM is also the ordering of the dates.
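The sorting property claimed above can be checked with a short sketch; the three timestamps are arbitrary sample values. Largest-unit-first strings sort chronologically as plain text, while day-first strings do not.

```python
from datetime import datetime

# Arbitrary sample timestamps, deliberately out of order.
stamps = [
    datetime(2008, 3, 14, 9, 54),
    datetime(2007, 12, 31, 23, 59),
    datetime(2008, 1, 2, 0, 0),
]

BIG_ENDIAN = "%Y/%m/%d %H:%M:%S"     # year first
LITTLE_ENDIAN = "%d/%m/%Y %H:%M:%S"  # day first

chronological = sorted(stamps)

# Sort the formatted strings as text, with no date parsing at all.
big_sorted = sorted(d.strftime(BIG_ENDIAN) for d in stamps)
little_sorted = sorted(d.strftime(LITTLE_ENDIAN) for d in stamps)

big_matches = big_sorted == [d.strftime(BIG_ENDIAN) for d in chronological]
little_matches = little_sorted == [d.strftime(LITTLE_ENDIAN) for d in chronological]

print(big_matches, little_matches)
```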
Pete on March 14, 2008 9:54 AMNot for nothing.. but a forum I regularly attend is constantly under attack by crackers from Turkey.
Patrick on March 14, 2008 9:54 AM
Also on dates: the main separators that are almost universal, and that you should expect, are the slash (/), dash (-) and period (.). They are used and switched between by various people in the same country. IIRC Windows even uses all of them for the US regional settings.
As others have said, in most of Europe and Asia a period is used for digit grouping and a comma is used as the decimal separator.
As for bad countries for handling sorting, wait until you have to deal with the surnames of Scotland. You have Mc, Mac, Mak, MC, M-apostrophe and M-weird-circle-character, which are pronounced the same but have a specific sort order that is not your standard case-insensitive Latin order.
will dieterich on March 14, 2008 10:06 AMNicolas: "Uh, wtf? In Spanish-speaking countries (surely more users than Turkey alone!), we also have comma and period swapped compared to English. Comma for decimals, period for thousands grouping."
WTF? Where did Jeff say "Only in Turkey". He was doing a specific comparison between two countries.
Don't get your shorts in a bunch. Learn to read. (And that goes for all of the other posters who had to chime in with "You don't know what you're talking about, Jeff. My country wherever does it too!!!" He never said it didn't.)
That's like someone saying, "Look at this! That grapefruit is bigger than this orange!" And somebody else saying, "You don't know what you're talking about. This orange grown in my country is bigger than the grapefruit." So what? The comparison was between the original single grapefruit and the single original orange, not all oranges and all grapefruits ever grown.
Use your brains, people.
KenW on March 14, 2008 10:09 AM"The other 10% is excruciatingly difficult -- again, think of Arabic (bi-directional, shaped letters) or Hebrew (right-to-left)."
Arabic is not bi-directional. It's right to left only like Hebrew.
Abdu on March 14, 2008 10:11 AMWhen the NWO takes over there will be only one language, you will have a unique number (so don't worry about how many middle names you have), and "Globalization" will have removed the need for any of this.
Good! At last we'll all be able to agree on something! (Note: I already have a unique number, my NI number. I believe in the US you have a similar system of social security numbers or something like that.)
This brief period in time, made possible in part by the US Constitution, where freedom to do and innovate ran rampant,
Unless of course you exist outside of the US constitution, in which case be prepared for (armed) troops to arrive to introduce/enforce their version of democracy. Good old Uncle Sam.
RWW on March 14, 2008 10:18 AMSomeone should force USA to use the metric system, ddmmyyyy and ',' as a decimal separator.
Eikern on March 14, 2008 10:25 AM@Eduardo...
I am assuming your "You do know that America includes..." was directed at my use of the term American to describe the people in the United States of America. And yes, I realize that, but common use equates the two. Canada is in North America, but its people are referred to as Canadians. Mexico is in North America, but its people are commonly referred to as Mexicans. The same thing applies to all other countries in both North and South America... But the people of the USA are almost always referred to simply as Americans. Why? Probably because "United Statians" or USAians sounds pretty silly...
And with the current state of things, I doubt many outside of the USA would like to be referred to as Americans... seems to me that many people in my own country don't even want that moniker...
When I traveled abroad as a teen, I was told to tell people I was from Canada - Very hard for a Texan to do.
OK this is a stupid comment but someone has to make it:
mmmmmmm, tuuurrrkey....
Steve on March 14, 2008 10:27 AMI guess it is mostly no longer a problem, but Turkish also uses lower-case y-umlaut which has code point 0xFF, and some code (badly written C code, mainly) would fail to handle that, interpreting it as EOF (-1).
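The C pitfall described here can be simulated in Python. This is only a sketch under stated assumptions: `EOF` is taken to be -1, as it typically is in C implementations, and `as_signed_char` is a hypothetical helper that mimics storing a byte in a signed 8-bit `char`.

```python
EOF = -1  # assumption: the usual value of C's EOF macro

# In ISO-8859-9 (and ISO-8859-1) the letter ÿ is the single byte 0xFF.
raw = "ÿ".encode("iso8859_9")


def as_signed_char(byte_value: int) -> int:
    """Reinterpret a 0..255 byte as a signed 8-bit value, like a signed C char."""
    return byte_value - 256 if byte_value > 127 else byte_value


as_int = raw[0]                    # what getc() returns when kept in an int
as_char = as_signed_char(raw[0])   # what you get after squeezing it into a char

print(as_int == EOF)   # kept as int: distinguishable from EOF
print(as_char == EOF)  # stored in a signed char: looks exactly like EOF
```

The usual fix in C is to keep the return value of `getc` in an `int` until after the EOF comparison.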
Anaconda on March 14, 2008 10:28 AMWell hello,
I am subscribed to Coding Horror for some time and I was very surprised to see a Turkish flag in my reader. I am also a software developer but I work in a different branch (VxWorks) and sometimes .Net.
I think our European friends put it right: we use the same localization system as the rest of them (except the UK, of course). Today is 14.03.2008, the price of gasoline is 3,25 TL (hard to imagine, right? We live in the Middle East!) and my car mileage is at 14.000 km. Of course you can test your code by changing your Windows localization setting to Turkish (TR) to see what happens, but any immediate problem can be solved with the help of the .NET globalization libraries, as Jeff mentioned. (I remember that there was a Turkish-TR example there too.)
As for the letters ı - i - I - İ, well, every nation has to optimize its alphabet for its language. Until 1928 we were using the Arabic alphabet (we started using it when the Turks became Muslim, around the 8th century), so I am now glad to have the headache of the irregular letters mentioned in the post. I believe the "Turkey Test" is a great idea.
As for the movie "Midnight Express", I think every Turk should remember that it is a "movie", not a documentary. I think sometimes we overreact. Actually I find Midnight Express quite entertaining: the man gets caught with dope, they put him away and he is raped in jail. (But according to the victim himself he wasn't raped, he was a homosexual! Thank you, Mr. Stone!) If I can watch Prison Break or Shawshank Redemption for fun, I can also watch Midnight Express.
Kerem on March 14, 2008 10:31 AMIn "Midnight Express" the guy's girlfriend paid him a visit and opened her blouse. Dreamt about her for a week.
wackadoo on March 14, 2008 11:13 AMAbout the US date format, it's all a matter of ordering to place emphasis on what we think is important. When we look at a calendar, we need to know the month first, then the day... but we just assume the calendar is for the current year.
In my experience, in the US, phrases like "the 3rd of March" aren't often used. When they are, it's usually in something formal. In common practice, we refer to it as March 3rd or March 3, 2008.
I personally tend to write dates as 2008-03-14, but that confuses some people, too.
Powerlord on March 14, 2008 11:14 AM"Arabic is not bi-directional. It's right to left only like Hebrew."
It's my understanding that numbers are left-to-right in Arabic, whereas the rest of the text is right-to-left, so it is in fact bi-directional. This makes for very interesting text-selection behaviour...
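One way to see this is to look at the Unicode bidirectional classes, which Python exposes through `unicodedata.bidirectional`. This is just a sketch with a handful of sample characters: Arabic and Hebrew letters are classified as right-to-left, while digits get their own number classes, which is what makes mixed text bidirectional.

```python
import unicodedata

# A few sample characters (chosen arbitrarily for illustration).
samples = {
    "Arabic letter alef": "\u0627",
    "Hebrew letter alef": "\u05d0",
    "European digit 1": "1",
    "Arabic-Indic digit 3": "\u0663",
    "Latin letter a": "a",
}

# AL = Arabic letter (RTL), R = right-to-left, L = left-to-right,
# EN = European number, AN = Arabic number.
classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, bidi_class in classes.items():
    print(f"{name}: {bidi_class}")
```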
Martin Cooper on March 14, 2008 11:22 AMI guess I'm atypical of Americans (US Citizens, sorry)
Today is 15MAR2008. Unless, of course, I need to do a string sort on it; then it's 2008MAR15.
I write 2,000,000 as 2,000,000, 2.000.000 and 2 000 000, depending on my mood.
Binarycow on March 14, 2008 11:26 AMJames,
To say that Turkey is the odd-one out is to
either be grossly misinformed, or largely
ignorant of the rest of Europe.
I hope you're not implying that Turkey is a European country, and that Turks are Europeans.
Chris on March 14, 2008 11:27 AM
In the USA dates are written mm/dd/yyyy because of the way people speak. One American stating that this is not the case for him does not make his opinion the most common usage; it is just his own. Depending on which state in the USA you go to, people speak in different ways (just like in any other country).
The way this makes sense can be understood by observing other expressions like "he is a city person" instead of "a person of the city". Both phrases are similar but they can have different meanings. "A city person" is someone who grew up in the city, in contrast to "a person of the city", who could be a country person currently living in a city.
While the meaning of a date does not change using either form, it is the form of speech that mandates its written form.
As an exercise, let me compare these phrases to Spanish:
"racing car" Vs. "auto de carreras" (car of racing)
"electric pump" Vs. "bomba electrica"
"May 2nd" Vs "2 de Mayo"
Americans keep telling me "You people speak backwards!"
Then I came to learn that a million and North America have a different meaning in this country and many more (http://en.wikipedia.org/wiki/Continents).
The USA has made efforts to change. Every commercial product in the USA states its contents in both imperial and metric units. It is just hard to change old habits, not to mention the economic impact on industry that such changes represent.
The point of the article is simple: You go global, then be aware that each country is different. Plan ahead!
Ricardo on March 14, 2008 11:34 AMChris,
I hope you're not implying that Turkey is a European country, and that Turks are Europeans.
No, sorry... the inference was in relation to my earlier post where I thought I'd asserted the fact that I was British - but with hindsight, I hadn't :)
Though... with the EU, that's all set to change, anyway...
James on March 14, 2008 11:38 AMYou should link the Turkey Test Passed image back to Moser and watermark it or something to make it clear where it came from.
Also, with all due respect, I didn't really dig the title...seemed culturally insensitive as did the Midnight Express reference.
Scott Hanselman on March 14, 2008 11:45 AM"About the US date format, it's all a matter of ordering to place emphasis on what we think is important. When we look at a calendar, we need to know the month first, then the day... but we just assume the calendar is for the current year."
Who's this "we" that you mention?... I'm not one of them! Kindly exclude me from any similar statements in the future ;)
RWW on March 14, 2008 12:02 PMHas anyone ever heard a good reason for using the mm/dd/yyyy date format? Just curious.
Because in English, one says May 20, 1998 -- that is, the spoken word convention puts the month first, then the day, then the year. So month day, year naturally becomes month/day/year.
Why English ended up *saying* the month before the day, I assume, results from it sounding nicer or some such.
austinjp on March 14, 2008 12:08 PMI got bit in the ass by this last week.
I was writing some calculations for a website and for the life of me I couldn't figure out why it was returning odd errors.
About two hours of debugging later, I realized that my culture code was set to traditional Chinese which uses the . as a , in number formatting.
At least I learned a valuable lesson there.
Ryan Smith on March 14, 2008 12:23 PMScott H. -- cut him some slack, it's the first day of his new career.
As for sensitivity -- people could write all day about San Quentin or movies made about it and I could care less. Chill out.
I'm just pissed because he's got me thinking about all that hashish...
wackadoo on March 14, 2008 12:33 PM
Because in English, one says May 20, 1998 -- that is, the spoken word convention puts the month first, then the day, then the year. So month day, year naturally becomes month/day/year.
Rubbish. I'm English and I've never said "May 20, 1998", but always "20th of May 1998". In your mutilated version of our language, you may say it that way - but that's it.
James on March 14, 2008 12:37 PM
Someone said "... opened a can of worms" and another said "...seemed culturally insensitive..." Isn't this a standard blog technique? Make potentially inflammatory statements and you will draw people in. Just yesterday there was an article somewhere that indicated that people are very attracted to blog _comments_ simply because people can be misinformed, misunderstood, misunderstanding, and write out of passion rather than out of consideration.
I don't believe Jeff anywhere said a negative thing about Turkey specifically. The movie reference is just a _movie_, this is just a _blog_. But it is easy for someone to take as a slight once one culture is compared to another.
How many British or British English programmers hate that they have to spell colour as "color" for 'merikan programming languages?
This is not specific to Turkey. Me, as a Romanian developer, I can confirm that all these cultural differences are present in our culture too.
The US is simply, like Great Britain, an island of non-standards, such as using a thousand units of measure for length and so on. We (Europe) use one.
Andrei Rinea on March 14, 2008 12:46 PMWhat is it with country names and food?
Turkey? Chili? Hungry?
@Wayne, as a Chilean, I was taught that I am American, too.
Also in Peru, Peruvians consider themselves American, as do people in Argentina, etc.
In general we refer to ourselves as "americanos"; we say "soy americano" (I'm American) or "somos americanos" (we are American).
I suggest reading: http://www.dcc.uchile.cl/~rbaeza/inf/american.html
This is a cultural issue, of course, so you must consider it when doing internationalization.
It is also a political issue; some people see the use of the word American as a sign of arrogance (and sometimes imperialism) on the part of US citizens...
"...And with the current state of things, I doubt many outside of the USA would like to be referred to as Americans... seems to me that many people in my own country don't even want that moniker"
Well, for me "American" is not an ugly moniker, because when I read American I don't think of a US citizen. You are still thinking of American as US citizen only; think globally!
In fact, people in the rest of America might eventually call you "gringo" or "yanqui" rather than American (think about it).
I like this blog, and don't want to start a flame war (or a political war), just to establish that if you want to talk about internationalization you must think about political and cultural issues too.
Have a nice weekend.
Eduardo Diaz on March 14, 2008 1:03 PMAs a Scandinavian, the thing that bothers me most is web sites that ask for input in UTF-8, but clearly can't handle it properly.
My local time is 20:00
My local date is 14.03.08
And I write ten million as 10 000 000,00 or 1.000.000,00
N on March 14, 2008 1:05 PMSorry, I made a mistake in my previous post
I write ten million as
10 000 000,00 or 10.000.000,00 (Local time: 20:07)
N on March 14, 2008 1:07 PMAs an ancient Roman, I am highly offended by your callous reference to "movies about gladiators".
Anonymous Cowherdius on March 14, 2008 1:21 PM"Thanks for all your answers, but I'm still looking for a _good_ reason..."
Because there's no reasonable conversion path from mm/dd/yyyy to dd/mm/yyyy. Given a piece of data, you can't tell whether it is pre-conversion or post-conversion. You would need to convert from mm/dd/yyyy to something like yyyy/mm/dd. And get the Europeans to convert too.
(Of course, the real reason is because the gains from switching do not outweigh the costs.)
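The ambiguity behind this argument is easy to demonstrate. In the sketch below, `possible_dates` is a hypothetical helper that tries both conventions with `datetime.strptime`; any date whose day is 12 or lower parses successfully either way, so the data alone cannot tell you which format it was written in.

```python
from datetime import date, datetime


def possible_dates(text: str) -> dict:
    """Return every reading of `text` that is a valid calendar date."""
    readings = {}
    for label, fmt in (("mm/dd/yyyy", "%m/%d/%Y"), ("dd/mm/yyyy", "%d/%m/%Y")):
        try:
            readings[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            # Not a valid date under this convention (e.g. month 14).
            pass
    return readings


ambiguous = possible_dates("03/04/2008")    # valid under both conventions
unambiguous = possible_dates("14/03/2008")  # only valid as dd/mm/yyyy

print(ambiguous)
print(unambiguous)
```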
Scott on March 14, 2008 1:42 PM
There is also the String.ToLowerInvariant() method.
From MSDN:
If your application depends on the case of a string changing in a predictable way that is unaffected by the current culture, use the ToLowerInvariant method. The ToLowerInvariant method is equivalent to ToLower(CultureInfo.InvariantCulture).
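The same distinction can be sketched outside .NET. Python's built-in `str.lower()` and `str.upper()` do not consult the current locale, so they behave like an invariant conversion; the Turkish rules have to be applied explicitly. The helpers `lower_turkish` and `upper_turkish` below are hypothetical, simplified illustrations, not a complete Turkish casing implementation.

```python
def lower_turkish(text: str) -> str:
    """Lowercase using the Turkish dotted/dotless i rules: I -> ı, İ -> i."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def upper_turkish(text: str) -> str:
    """Uppercase using the Turkish dotted/dotless i rules: i -> İ, ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


invariant = "FILE".lower()        # locale-independent result
turkish = lower_turkish("FILE")   # what a Turkish-culture lowercase produces

print(invariant, turkish)
print(invariant == "file", turkish == "file")
```

A case-insensitive comparison such as `s.lower() == "file"` therefore stays stable only when the conversion is culture-independent, which is the point of the MSDN advice quoted above.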
The reason to use YYYYMMDD is not because it helps with a "standard numerical sort", but because it puts the Most Significant portion first, just like in common number printing (there may be differences between using comma and period as a decimal separator, but the places extend to the left in increasing significance). No one prints the hundreds place to the right of the ones place, do they?
Andy on March 14, 2008 06:06 AM
Oooh, swing and a miss. In normal numbering, the larger digits are more significant, because you don't normally care about the smallers.
If you were doing a post count, it wouldn't be vital that there were 106 rather than 100, for example.
In dates, it's the other way round. You rarely need to know the year, because by that time the data's obsolete, but you constantly need to know the day or month.
Imagine if the only detail you had was that these were all posted in 2008. That's not valuable information.
C'mon computer people, we can do better than that. Let's just use a system like IP with the most significant part on the right.
seconds.minutes.hours(24hr).day.month.year
BTW: it is 0.0.22.12.3.2008, do you know where your kids are?
Of course we could always use the Integer based seconds since 1/1/1980 system, I am sure we could convince the technophobes to switch.
The comments to this entry are closed.