Software Internationalization, SIMS Style

March 9, 2007

Internationalization of software is incredibly challenging. Consider this Wikipedia sandbox page in Arabic, which is a right-to-left (RTL) language:

Wikipedia sandbox in Arabic

Compare that layout with the Wikipedia page on internationalization and localization in English. Now consider how you'd implement switching between English and Arabic in MediaWiki, the software that powers Wikipedia:

  • Every bit of static text on the page has to come out of a Unicode string resource file, indexed per-culture.
  • Images that happen to contain text, or are otherwise culture-specific, must also be placed in a resource file and indexed per-culture.
  • Numbers, currency, and dates must be displayed (and validated) differently depending on what country your audience lives in.
  • You could detect the country your users are in, and automatically assume which language they're using. But this is obviously problematic in countries where multiple languages are spoken. Or, you can allow users to manually choose a language the first time they access your application. This is slightly easier in web applications, because you can absorb the ambient language setting from the browser's HTTP headers.
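To make the list above a little more concrete, here is a minimal sketch in Python of two of those steps: reading the browser's Accept-Language header to guess a culture, and pulling static text from a per-culture resource table with a fallback. Everything named here (`RESOURCES`, `pick_culture`, `translate`, `text_direction`, the sample strings, and the RTL language set) is hypothetical and chosen for illustration; it is not how MediaWiki does it, and a real application would lean on an established i18n library rather than hand-rolled parsing.

```python
# Hypothetical per-culture string resources; keys and cultures are illustrative only.
RESOURCES = {
    "en": {"search": "Search", "edit": "Edit this page"},
    "ar": {"search": "بحث"},  # "edit" deliberately missing to show fallback
}
# Assumed (incomplete) set of right-to-left languages, by primary subtag.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}
DEFAULT_CULTURE = "en"


def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q value counts as q=1
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, tag))
    # Sort by descending quality, then by original position in the header.
    return [tag for _, _, tag in sorted(weighted)]


def pick_culture(header, supported=RESOURCES):
    """Choose the first supported culture, trying 'ar-eg' and then plain 'ar'."""
    for tag in parse_accept_language(header):
        for candidate in (tag, tag.split("-")[0]):
            if candidate in supported:
                return candidate
    return DEFAULT_CULTURE


def translate(key, culture):
    """Look up a string for the culture, then the default culture, then the key."""
    for candidate in (culture, DEFAULT_CULTURE):
        text = RESOURCES.get(candidate, {}).get(key)
        if text is not None:
            return text
    return key


def text_direction(culture):
    """Return 'rtl' or 'ltr' so the page layout can be mirrored when needed."""
    return "rtl" if culture.split("-")[0] in RTL_LANGUAGES else "ltr"
```

A request carrying `Accept-Language: ar-EG,ar;q=0.9,en;q=0.5` would resolve to the `ar` culture here, render right-to-left, and fall back to the English string for any key the Arabic table lacks. The header is only a hint, which is why letting users override the choice manually still matters.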

It's a lot of work.

Beyond the purely mechanical grunt work of translation, there are deeper cultural issues to consider, such as avoiding offensive images, colors, or concepts for certain cultures – and how the concepts you're trying to express in the software will map to a given culture. As noted in a related Larry Osterman post, these deeper cultural considerations are collectively known as localization:

[localization] is a step past translation, taking the certain communication code associated with a certain culture. There are so many aspects you have to think about such as their moral values, working styles, social structures, etc... in order to get desired (or non-desired) outputs. This is one of the big reasons that automated translation tools leave so much to be desired - humans know about the cultural issues involved in a language, computers don't.

The Sims has a unique solution that sidesteps the software internationalization problem. They invented an entirely new, completely artificial language: Simlish. Simlish renders your cultural background irrelevant. When you redefine language as gibberish, it's equally meaningless to everyone. Or is it? Somehow, The Sims is playable without a lick of translation or localization, without any comprehensible language of any sort.

Signs in The Sims games often do not contain text; they consist entirely of graphics. For instance, the stop sign in The Sims is a red octagon with a flat, white hand. In The Sims 2 it becomes a white bar instead. The sign for a grocery store depicts a cornucopia, and that of a restaurant shows a hamburger or a place setting.

In The Sims, most text is only distinguishable at very close zooms. On book covers, newspapers and Nightlife's "Sims Must Wash Hands" sign, the lettering is all nonsense characters that bear about as much resemblance to Latin characters as they do to Cyrillic. Almost no actual characters from any known alphabet are used. The game uses the Simoleon sign (closely resembling §) as the currency symbol.

The Simlish alphabet

When Sims are writing novels or term papers, dingbats from the Wingdings font appear as text on the screen. The notebooks used for homework contain writing composed of random lines.

Characters in The Sims don't just write in Simlish – they speak it, too:

When The Sims was originally designed, Will Wright wanted the language the Sims spoke to be unrecognizable but full of emotion. That way, every player could construct their own story without being confined to a Maxis-written script (to say nothing of the mind-numbing repetition). We experimented with fractured Ukrainian, and the Tagalog language of The Philippines. Will even suggested that perhaps we base the sound on Navajo, inspired by the code talkers of WWII. None of those languages allowed us the sound we were looking for – so we opted for complete improvisation.

Simlish is, by definition, meaningless. And yet it's surprisingly easy to figure out what a Sim is talking about, even without any visual point of reference or a facial expression to read. The intonation and context of the sounds are enough to extract meaning. Try these two Simlish MP3 samples (one, two) and hear for yourself.

Simlish even extends to music. Last year, Maxis paid many original artists to re-record their songs with Simlish lyrics:

Each artist rerecorded one of their songs with new vocal tracks, replacing English lyrics with nonsensical Sim-speak. Simlish words don't have any real meaning, so the artists were free to come up with whatever sounded good, as long as English didn't seep in. The result isn't that different from what bands like the Cocteau Twins and Vas already do. The idea is to transcend words and use the human voice to express pure emotion.

Charlotte Martin, whose song "Beautiful Life" finds its way onto the University soundtrack, took things a step further than some of the other artists. She didn't just sing gobbledygook, she made sure all the Simlish words were consistent with their counterparts in the English version. "It still had the same meaning, I just had to write it in an alien language," Martin said. In rewriting the song, Martin said it changed the way she thinks about lyrics, letting her come at her creation from a more technical standpoint, paying closer attention to syllables and rhythm.

Probably the funniest example of this is the Pussycat Dolls' re-recording of "Don't Cha" in Simlish.

Pussycat Dolls 'Don't Cha' in Simlish

Listen to "Don't Cha" in Simlish (mp3). Singing in gibberish almost makes a Pussycat Dolls song more intelligible. It's brilliant. Doba, baby, doba!

Another example is Lily Allen's "Smile". Compare the original version of "Smile" with the Simlish re-recording of "Smile". It works well for that cheeky little song, but it's a little weirder when a morose band like Depeche Mode re-records a song in Simlish.

When you hear Simlish, you expect to hear meaningless gibberish. But instead, you hear something else, something unexpected. The absence of language isn't limiting; it's liberating. You move beyond language, from expressing with words to expressing visually, aurally, emotionally:

For songstress Abra Moore, whose song "Big Sky" was used in the game, singing in Simlish gave her a new perspective on her music. "It's like jazz for me; I just take to it like a duck to water," Moore said. "It was very liberating creatively." The experience made such an impression on Moore that she said she'd consider recording a song in Sim-like scat on a future album. She perceives the emotional lyrics, divorced of a specific meaning, in almost a spiritual light. She's fascinated that fans try to interpret the nonsensical lyrics. It represents the essence of human nature, Moore said, to take meaning from something that has no meaning.

Spoken words and music are dense with multiple levels of audible meaning. We probably can't take such Simlish liberties with applications and web sites, which are anchored on the flat, one-dimensional medium of text. The challenges of i18n and l10n are unavoidable for us. But as The Sims shows us, there's a lot to be said for following human conventions which work across all languages and cultures.

Posted by Jeff Atwood
40 Comments

You should listen to Sigur Ros's album from 2002, if you haven't heard it. It doesn't really have a title (it's known by "()" because of the art on its cover), and the album artwork, CD, booklet, etc, feature no words. All of the "lyrics" in all of the songs are complete gibberish. The whole thing lacks language.

I haven't heard any of these Simlish recordings, so I don't know how they work, but when listening to (), your brain seems to want to recognize words very badly, and so you kind of map the sounds into words roughly and they come out weird. It makes you feel kind of like your internal speech-to-text unit crashed.

c on March 10, 2007 9:05 AM

Reminds me of the loituma girl. ( http://dojo.fi/~rancid/loituma__.swf )

Mike Akers on March 10, 2007 9:54 AM

Wow! What a great post. Just so long as I don't have to know Simlish to get past the capcha rapcha.

orcmid on March 10, 2007 11:02 AM

Where/How did you get permission to host these songs?

James on March 10, 2007 11:59 AM

Hi! Very interesting subject.

I think you're mostly wrong about Simlish and human conventions across languages. Simlish works well for westerners because it's based on English structure and expression.

There are many cultures which sound and behave completely different than English. For their members the tone in which Simlish is spoken and even the physical gestures by the Sims are unintelligible. They just don't make sense.

urig on March 11, 2007 1:20 AM

Of course, that red octagon with a white hand is still going to be a problem for people in nations that whine about the colour red or open palms. And anyone who take their cow worship seriously (to the degree of letting the poor things starve to death because, well, feeding them can be expensive) are liable to whine about the hamburgers.

So, anyone remember what happened to Esperanto?

How did that "new language that isn't any existing language but has similarities to many other languages" work out?

Oh, yeah. It turns out that hardly anyone was actually interested.

And tooltips were invented because it turns out that icons are really amazingly hard to use if you want to convey anything that isn't already known by every 2 yr old on the planet. (i.e. "stop" signs are easy because everyone knows about them. "antidisestabilsmentariainism" is a really hard thing to convey in an icon, because very few people know what it is.)

Bob on March 11, 2007 1:24 AM

Simlish is just another foreign language (one that can't ever be learned), and I got the exact same impression from the songs as I used to get from French, Japanese, Russian, Indian, Hebrew (etc) songs. Especially French, and I suspect this is because French and English are so similar in construction and intonation - Indians or Chinese might get the basics of the emotion, but without the same linguistic pattern nuances would be lost. Try to figure out what emotions Hebrew is conveying, for instance - it isn't quite what your gut would first tell you.

Honestly, the emotional content of music - which is primarily used for transfer of visceral emotion, not storytelling, except in folk music - doesn't seem to have anything to do with translation and localization. There's just nothing to localize if you're not conveying any deeper meaning, and for that reason people can play some import games no problem. The speech is a total red herring, the side channel information is what's being used to process it all.

Likewise, a red octagon is a stop sign in most countries whether it has English or not (most of the world does use English stop signs...), so there's nothing to localize. Swedish signs are even more iconic, but some are incomprehensible without foreknowledge. In general, the history of traffic signs has evolved into reasonably descriptive icons that can be learned and then applied anywhere - if your point was how well that applies to software, well, it was quite hard to pick out of the post. I totally believe that facilitating fast understanding with pictures and sound is very important, which is why I love programs with icons in menus, and websites that try to balance gobs of text with mystery meat - standardization is the key.

If you're just writing about cool stuff in a stream-of-consciousness format, well, cool. =p Sorry, this was a little long for a comment.

Side note, awesome sign vandalization. http://en.wikipedia.org/wiki/Image:IMG_1176w.JPG

Foxyshadis on March 11, 2007 1:33 AM

(Typically, someone said it far more succinctly than I could. .)

Foxyshadis on March 11, 2007 1:44 AM

It is an interesting read. I would consider the idea of Simlish as a very novel solution to a complicated problem though.

I think that what's often forgotten beyond language, cultural symbols, social structures, or other surface elements like colours, is that culture not only affects our perceptions, but also thinking processes and manners of interacting. It's ridiculous for us to assume that even if we were to 'solve' the problem of localization, all cultures would value and use the same 'tools' (like websites such as Wikipedia) as those of us do in western societies.

I think the problem of localization will be addressed only when all societies are equally empowered to develop their own tools, or at least be informed enough to be able to determine what they locally need.

cbentl2 on March 11, 2007 1:47 AM

i think you mean "Antidisestablishmentarianism" sorry to be a pedant

albear on March 11, 2007 1:56 AM

Having watched those music videos in Simlish, I cannot help but remember a webcomic from ten years ago:

http://plif.andkon.com/archive/wc072.gif

It's a bit too early for the idea of completely giving up on verbal and written communication, even though both fail me occasionally. :)

P.S. Funny, but the excerpt from "Ievan Polkka" that is used in loituma.swf, while sounding vaguely Finnish, actually is the only gibberish verse in the song, meant to imitate an instrumental solo part, which isn't otherwise possible a cappella. :)

Mihara on March 11, 2007 3:11 AM

I enjoyed reading this article very much!

Nice, man!

fish on March 11, 2007 6:38 AM

Jeff:

What an interesting post nice stuff to think about, you are my new idol.

Javier on March 11, 2007 6:47 AM

It took me several years to learn English "properly"... and to tell the truth, I have been deeply enjoying the gibberish on English sung tracks... And I continue to; most of the time the human voice is just another wonderful instrument. Being able to grasp what they are singing about is the cream on top, or the key point for liking rubbish music with fantastic lyrics...

argatxa on March 11, 2007 7:10 AM

Right. We should avoid I18n altogether by using Ido. English is a mess.

wageslave on March 11, 2007 7:30 AM

Simlish works well for westerners because it's based on English structure and expression.

Urig, I have no doubt The Sims is biased towards western culture (or at least romance languages), but can you provide specific examples of where you think the Sims breaks down for non-westerners?

And yes, the hamburger sign is probably not a great choice for India.

Jeff Atwood on March 11, 2007 9:27 AM

I don't think the restaurants with the hamburger signs are a good thing in India... They should have chosen a more neutral sign for a really international sign... A hard thing to do... An apple, perhaps?

LC on March 11, 2007 10:19 AM

I kept hearing that Lily Allen song as though it were in French! I wonder if that's just me or whether it actually sounds like French?

The whole thing reminds me a little of what I felt like when I first heard about Racter (http://www.ubu.com/concept/racter.html). Of course it became apparent that it was a hoax, however the idea is still fascinating. Just like meaning without language (Simlish), Racter was (supposedly) the manifestation of the idea of language without meaning. With nothing but syntactically correct junk, we are left to make up the meaning based on our own preconceptions. Would different people derive the same meaning from text output from a real version of Racter? What if you translated it?

Language is a strange tool. Many argue that language, if used skillfully, can convey any human concept or emotion. However, I have often wondered whether or not language simply boxes concepts in, limiting them to preconceptions and association, rather than allowing the free transmission of pure thoughts and feelings. Perhaps language is a form of lossy compression of these things, which, no matter how complex and powerful our brains are, can never be fully recreated? Or maybe our current language is simply a flawed way of expressing ourselves and we may someday invent a lossless compression of our ideas.

Greg Poole on March 11, 2007 11:24 AM

Urig points out that Simlish works for western listeners. That's not exactly correct. It works for non-tonal listeners.

For example, Japanese and Korean are non-tonal, so these sounds are somewhat familiar (except for the "R"s and "L"s).

It'd be interesting to create a tonal Simlish language. Maybe hire some Vietnamese and Chinese people to create it.

Even then, those who speak vowel harmonic languages are left out, like Magyar (Hungarian). Can't please everybody. ;)

Haacked on March 11, 2007 1:52 PM

Anyone remember Pingu? All the dialogues were improvised by one person. Admittedly, elements of German, Italian, French and English seep in - but it's another example of a very expressive 'nonsense' language.

benjol on March 12, 2007 4:43 AM

Jeff: "And yes, the hamburger sign is probably not a great choice for India."

Obviously, that's a Gardenburger.

This reminds me of the phenomenon of people enjoying opera more when it's sung in a foreign language -- even if not the original one -- than if it's translated to their own. When you can understand what the singers are saying, that interpretation of language kind of preoccupies your brain and prevents you from just hearing the musical emotion.

And this talk of localization vs. translation reminds me of the translation fascination of Douglas R. Hofstadter (of Goedel, Escher, Bach: An Eternal Golden Braid fame). He wrote an entire book about translating one (not-very-long) French poem, and just how much to translate.

Atario on March 12, 2007 5:12 AM

"Many argue that language, if used skillfully, can convey any human concept or emotion. However, I have often wondered whether or not language simply boxes concepts in, limiting them to preconceptions and association, rather than allowing the free transmission of pure thoughts and feelings."

I disagree with Greg. The problem is not the limitations of language but the limitations of vocabulary. Most people just don't have the vocabulary to adequately describe their ideas and emotions. Even if they do their audience might not. So what gets expressed is profanity and slang.

Dave on March 12, 2007 8:34 AM

The first one sound like a woman attempting to keep a baby entertained and the second one sounds like a guy who's trying to romance his dinner.

Robert Claypool on March 12, 2007 9:18 AM

The image for the loituma girl is Orihime from Bleach, by the way,
http://en.wikipedia.org/wiki/Orihime_Inoue
and the song is the Ievan Polkka
http://en.wikipedia.org/wiki/Ievan_Polkka

Robert Claypool on March 12, 2007 9:27 AM

Interesting. Nice to know Sims considered Tagalog.

For Simlish to achieve a level of consistency in audio it must follow some substitution algorithm which is definitely reversible - to be at a level where it can be understood and for some idea be extracted.

So, should it reach good adoption level or gets adopted in other areas, what stops those who are not really in the know to create an auto-translator/translation/localization of Simlish to their native language?

I think we'd still be back at square one. i18n is challenging. No matter how technology advances, we humans are what's finite. No one can amass all the knowledge, no one can pay anyone enough to do it, and no one lives long enough to see things through.

I feel the limitation is necessary. An extreme example would be: World domination can be toppled down, we can plan against dictators in our own little known dialects and transmit cryptic messages in local slangs. At a personal level, we can have our own sense of identity no matter how irrelevant that is to the majority of the world's population. And it gives pride and a good living for others.

audienceone on March 12, 2007 10:16 AM

Sorry for OT and rather long.

Dave: "I disagree with Greg. The problem is not the limitations of language but the limitations of vocabulary. Most people just don't have the vocabulary to adequately describe their ideas and emotions. Even if they do their audience might not. So what gets expressed is profanity and slang."

I disagree with you Dave, up to a point, and I agree with Greg, up to a point. I remember reading that indeed language shapes, "boxes" in his words, concepts. That is, concepts that one is able to articulate in his mind. Not only have I read about this, it actually makes sense and I notice it myself. I speak/write two foreign languages besides my native one. Among the foreign ones, I'm better with English, and I can assure you that there are more than a few situations when I try to communicate in Spanish, the other foreign one, but I was thinking in English at the moment, and some constructs and even concepts are simply nonexistent. The same happens when I try to communicate in my native tongue. The funny thing is that knowing all three fairly well allows me to think, at different times, in any of them, and hence many times leaves me with difficulties trying to articulate what I think in the language in which the conversation was being had. When I can afford to, for example talking to my girlfriend, I sometimes switch language mid-sentence, something that usually annoys her. Bottom line is, knowing all three of them, I often notice parts that are not overlapping. I consider the experience enriching. Please note the very strong resemblance with programming languages and conversions between them versus expressiveness of each and the blunt impossibility to translate certain concepts sometimes. It works exactly the same. The part about seeing the non-overlapping parts being enriching also applies.

gd on March 12, 2007 10:18 AM

And now ON topic.
Simlish sounds disturbingly similar to glossolalia (speaking in tongues). 8|

gd on March 12, 2007 10:21 AM

Unfortunately, not all localization issues can be solved by inventing a new set of semiotics. Alas, those of us who have to provide actual information are obliged to, you know, localize. Among other problems, people complain that it's too bland:

http://mikepope.com/blog/AddComment.aspx?blogid=1574

mike on March 12, 2007 11:36 AM

What we need, "the holy grail", is instant, automatic (good) natural language translation that is incorporated into web browsers.

This is probably still at least ten or maybe even twenty years away, since they haven't made much progress in the area of artificial intelligence yet and true language translation needs that, as proven by Babelfish and all the others that do horrible translations.

PL on March 14, 2007 6:27 AM

30 comments and no mention of the Becktionary? For shame.

foobar on March 14, 2007 12:52 PM

As others have noted, achieving "language independence" is an illusion.

The Sims only eliminate some problems with a limited set of languages, and does not really scale. It is a cool idea, saves some money, but it is not a solution:

1. Graphics are culturally-loaded in general
- red octagon with a flat, white hand = open palm is offensive in some cultures
- grocery store depicts a cornucopia = ok in western cultures, familiar with the Greek idea of cornucopia
- shows a hamburger = bad in India

2. Text is left-to-right, books open the same way as the western ones (Arabic/Hebrew are right-to-left, Japanese or Chinese can be vertical, and books start "from the end")

3. Speaking
- "unrecognizable but full of emotion" = expressing emotion is a cultural thing. Think Chinese movies where actors "overdo it" (this is how it seems to a westerner)
- "And yet it's surprisingly easy to figure out what a Sim is talking about" and "The intonation and context of the sounds is enough to extract meaning" FOR A WESTERNER
- some "words" in "Simlish" can have obscene meaning in certain languages

Mihai on March 19, 2007 3:40 AM

litreally i have just searched EVERYWHERE on google to find out what the translation is for sim language is into english.
because my best friend has currently said i can look it up.
but everywhere has just said
"sims speak in gibberish language" bla bla...
i want an actuall translation..
so i can print it of what things they say mean
whilst playing my game..and looking at what they are actually saying by having the words they say printed out properly in english
get back to me,

shannen brooks on May 22, 2007 12:49 PM

I second the Sigur Ros recommendation. From Wikipedia: "All of the lyrics on ( ) are sung in Vonlenska, also known as Hopelandic, a constructed language of nonsense syllables which resembles the phonology of the Icelandic language."

Also, the faux-language spoken by Cirque du Soleil (called Cirquish) performers is remarkably similar in tone and mood to Simlish.

So the lesson here is to use as many icons and images in our applications as possible, keeping copy to a minimum?

Zip on December 13, 2007 1:46 AM

I remember something funnier from a game.

If the character is facing one way, he holds his sword in his right hand. If he turns and faces the other way, he holds his sword in his left hand.

I suspect the same thing could be done for books. Given that I didn't notice the first for a year, and books only matter when you turn pages, nobody should notice that one unless they are looking for it.

Joshua on March 14, 2008 2:37 AM

The clangers discovered a universal language years ago :) It was a children's tv show in which the script wasn't read by actors - they played it on whistles. Worried mothers wrote in saying that their kids claimed to understand the words. It stunned everyone when they found that the kids were largely spot on.

(link removed by your spambot eater.. clangers.co.uk will get you there, but not to the article)

Quote:
I did try it once, I took an episode of The Clangers to the 1984 E.B.U. conference in Germany and showed it to the participants without my voice-over. Afterwards I asked them whether they had been able to understand what the Clangers were saying.
"But of course," they replied. "They are speaking perfect German."
"But no." said Gerd, "That is not so. They spoke only Swedish."

James Ingram on April 15, 2008 11:17 AM

The absence of language isn't limiting; it's liberating.

Your life has probably been spent entirely in an English-speaking universe. As a member of a small language group, you would have heard songs in some unintelligible language since early childhood.

Petr on January 20, 2009 3:03 AM

Seems EA has already pulled the Simlish version of Lily Allen's "Smile" from YouTube

Alistair on March 5, 2009 3:11 AM

I never knew the Sims dev team tried Tagalog (my native tongue) out.

Thanks for that tidbit.

Jon Limjap on February 6, 2010 10:03 PM


The comments to this entry are closed.