Internationalization Best Practices

Internationalizing applications for different locales and cultures essentially involves a trade-off. Ideally, products would be designed specifically for every culture, but this is practically impossible because of constraints such as time, money and effort. Although many applications provide a consistent look and feel and the same process for performing a function irrespective of the locale, applications should be internationalized in order to create code that is easy to maintain. This article discusses some of the key problems that designers and architects of applications face.

Localizability Issues

Localizability, as mentioned earlier, is the readiness of an application for localization. It is therefore an internationalization issue and not a localization issue. When designing international applications, architects should be aware of certain issues, especially what kind of data they will be working with and how they will be working with it.

Not all things are of the same size, shape and color, and this is particularly true of data. There are numerous encodings and formats in which data may be available to an application. Recognizing every flavor of data may be a nightmare, but one should at least consider supporting a least common denominator of encodings and charsets in the application.

Consider the situation in India itself: ISCII is still the most popular encoding format. Unicode is definitely making inroads, but it seems this will take time. The requirement to support ISCII becomes a serious issue when you are developing applications that will interface with other applications, especially legacy applications.
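As a small illustration of why the encoding of incoming data matters, the following Python sketch shows that the same three-character Hindi word occupies a different number of bytes in UTF-8 and UTF-16, and that decoding with the wrong encoding can silently produce garbage instead of raising an error. Python's standard codecs do not include ISCII, so the sketch uses only Unicode encodings; the helper name decode_declared is hypothetical.

```python
# Sketch: decode bytes with the encoding that the data source declares.
# decode_declared is a hypothetical helper, not part of any library.

def decode_declared(raw: bytes, declared_encoding: str) -> str:
    """Decode raw bytes strictly, using the encoding declared by the source."""
    return raw.decode(declared_encoding, errors="strict")

word = "नाम"  # "name" in Hindi: three Unicode code points

utf8_bytes = word.encode("utf-8")
utf16_bytes = word.encode("utf-16-le")  # little endian, no byte order mark

print(len(word))         # 3 code points
print(len(utf8_bytes))   # 9 bytes in UTF-8
print(len(utf16_bytes))  # 6 bytes in UTF-16

# Decoding succeeds when the declared encoding matches the data.
assert decode_declared(utf8_bytes, "utf-8") == word
assert decode_declared(utf16_bytes, "utf-16-le") == word

# A wrong declaration does not always fail: these UTF-16 bytes happen to be
# valid UTF-8, so the result is silently wrong text rather than an exception.
garbled = decode_declared(utf16_bytes, "utf-8")
print(repr(garbled))
```

The byte counts also show why field lengths should be computed from the data and its encoding rather than fixed in code.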

Designers and architects should also be particularly careful when selecting Commercial Off-the-Shelf (COTS) components. If the component is not internationalized then it presents a serious problem during the localization phase of the development process.

Other key points that application architects should take note of during internationalization are as follows:

  • Do not hard code strings used in the applications.
  • Do not restrict font sizes.
  • All resources, including the ones that end-users may interact with directly (for example, by modifying or creating them), should be externalized from the application.
  • Data field lengths should always be determined using functions.
  • Fixing string lengths in code should be avoided at all costs.
  • Always use complete sentences and avoid concatenation of phrases at all costs.
  • Create multiple messages based on the situation in which each will be used. Normally, in English one message may suffice, but during internationalization it is better to create several messages, one for each context. This makes translation easier.
  • Do not use slang or jargon in text messages.
  • All string handling functions used in the application must be charset aware.
  • Always try to use locale aware functions.
  • Measurements used in programs must be considered for internationalization.
  • Paper sizes vary from country to country and must also be internationalized.
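Several of the points above (no hard-coded strings, complete sentences, no concatenation, one message per context) can be sketched together in Python. The MESSAGES dictionary stands in for an external resource file, and the keys, the translate helper and the deleted_message helper are hypothetical names used only for illustration. The one/many split is a simplification; many languages have more plural forms than English.

```python
# Sketch: externalized, complete-sentence messages with named placeholders.
# MESSAGES stands in for an external resource file; all names are hypothetical.

MESSAGES = {
    "en": {
        "file.deleted.one": "{count} file was deleted from {folder}.",
        "file.deleted.many": "{count} files were deleted from {folder}.",
    },
    "de": {
        "file.deleted.one": "{count} Datei wurde aus {folder} gelöscht.",
        "file.deleted.many": "{count} Dateien wurden aus {folder} gelöscht.",
    },
}

DEFAULT_LOCALE = "en"


def translate(locale: str, key: str, **params) -> str:
    """Look up a whole-sentence template and fill its named placeholders."""
    catalog = MESSAGES.get(locale, MESSAGES[DEFAULT_LOCALE])
    template = catalog.get(key, MESSAGES[DEFAULT_LOCALE][key])
    return template.format(**params)


def deleted_message(locale: str, count: int, folder: str) -> str:
    """Pick a separate message per context instead of concatenating phrases."""
    key = "file.deleted.one" if count == 1 else "file.deleted.many"
    return translate(locale, key, count=count, folder=folder)


print(deleted_message("en", 1, "Reports"))
print(deleted_message("de", 3, "Berichte"))
```

Because each template is a complete sentence with named placeholders, a translator can reorder the placeholders freely, which is not possible when the sentence is assembled from fragments in code.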

Any code involving access to data should be considered for internationalization. Some of the areas where internationalization is required are:

  • Displaying Data.
  • Reading Data.
  • Sorting Data.
  • Searching Data.
  • Parsing Data.
  • Compressing Data.
  • String formatting.
  • Word wrapping.
  • Hyphenation.
  • Numeric formatting.
  • Date formatting.

The rationale behind the restrictions enumerated above is that all of these are bound to change across national boundaries and are susceptible to cultural variation.
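Numeric formatting is a simple case of such variation. The Python sketch below groups digits by parameters rather than by a fixed rule, so that the same number can be shown with thousands grouping or with the lakh/crore grouping common in India. The group_digits function is a hypothetical helper written for this illustration; a production application would more likely rely on a locale data library than on hand-written rules.

```python
# Sketch: digit grouping driven by parameters instead of a hard-coded rule.
# group_digits is a hypothetical helper, not a library function.

def group_digits(value: int, first: int = 3, rest: int = 3, sep: str = ",") -> str:
    """Group the digits of an integer.

    first: size of the rightmost group
    rest:  size of every group to the left of it
    sep:   group separator
    """
    sign = "-" if value < 0 else ""
    digits = str(abs(value))
    if len(digits) <= first:
        return sign + digits
    head, tail = digits[:-first], digits[-first:]
    groups = [tail]
    while head:
        groups.insert(0, head[-rest:])
        head = head[:-rest]
    return sign + sep.join(groups)


print(group_digits(1234567))           # 1,234,567  (thousands grouping)
print(group_digits(1234567, rest=2))   # 12,34,567  (lakh/crore grouping)
print(group_digits(1234567, sep="."))  # 1.234.567  (period as separator)
```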

Localizing text strings among Indian languages is comparatively easy, because many Indian languages are closely related and share a common vocabulary. But this cannot be taken for granted. Many words in this common vocabulary may be current in one language but archaic in another. One example worth citing is the word "name", which is translated as नाम in Hindi. This word is quite common in many of the north Indian languages and can therefore be used almost as is.

But the same is not true for the south Indian languages. The word நாமம் in Tamil also means "name" but is an archaic term that is not in regular use; the word பெயர் is used instead. The other south Indian languages, which are closely related to Tamil, use cognate terms: పేరు in Telugu and പേര് in Malayalam.

While localizing content for Indian languages one should take note of such variations and differences along with the regional variations that exist among the various Indian languages.

User Interface Issues Relating to Internationalization

The UI of an application is what an end-user encounters first. It is therefore one of the main components to consider when internationalizing an application. Many UI elements have pictures embedded in them in the form of icons, so care should be taken while internationalizing pictures. Pictures are very culture specific: what is considered a good image in one country or culture may be considered derogatory in another. Avoid using body parts and human figures in pictures; use a stick figure instead. Avoid using pictures of animals to denote something else. Avoid direct pictorial representations of English words, as these do not translate well into other languages. Use a different icon for each context. Do not overuse icons. Do not use text inside images, as this creates the additional problem of translating the image and its embedded text together.

Be very careful when selecting the colors to be used in an application. Colors are culture specific. Sounds used in applications are culture specific. Do not use flags of countries to represent the language used in that country.

Apart from the issues discussed above, the major issue is the layout of Graphical User Interface (GUI) elements such as text boxes and combo boxes. Translation from one language to another can affect the size of your application in a variety of ways:

  • Localization to most other languages increases the length of text in the interface.
  • It can affect the layout of controls.
  • It can result in larger file sizes, potentially requiring changes to the layout of your installation disks and setup software.

The order of GUI elements placed on a screen may change as a result of localizing the UI. If the elements were placed in the sort order of a particular language, then the layout will change in another language. This is true only to a certain extent in the case of Indic languages: for languages that use scripts derived from the ancient Brahmi script, the sort order will likely be the same.

Sentence order differs drastically between languages; therefore do not place GUI elements in the sentence order of any particular language. The sentence order varies even among Indian languages, because not all Indian languages use the same grammar. The grammars of the north Indian languages have certain similarities, as do those of the south Indian languages, but between the two groups there are many variations. Care should be taken when using a sentence-order-based layout of GUI elements, as this calls for dynamically changing layouts.

The elements used for data input should be larger than those used with the English language. Font size is also a serious factor, both when displaying data and when entering it. The Arial Unicode MS font renders slightly differently from fonts that are specific to individual languages, and fonts such as Vrinda and Kartika require bigger font sizes for legible display.

Care should be taken while laying out components for the input of dates and numbers. Indian locales use the dd/mm/yy format, while in the USA the mm/dd/yy format is popular. Other countries use other formats. Also, when displaying time, the terms denoting AM and PM vary among Indian languages.
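The ambiguity between the two date formats can be seen in a short Python sketch. The DATE_PATTERNS table, the locale identifiers used as keys, and the two helper functions are assumptions made for this illustration; the patterns themselves are the dd/mm/yy and mm/dd/yy formats mentioned above.

```python
# Sketch: the same text means different dates under different conventions.
# DATE_PATTERNS and both helpers are hypothetical, for illustration only.
from datetime import date, datetime

DATE_PATTERNS = {
    "en_IN": "%d/%m/%y",  # dd/mm/yy
    "en_US": "%m/%d/%y",  # mm/dd/yy
}


def format_date(value: date, locale: str) -> str:
    """Format a date using the pattern registered for the locale."""
    return value.strftime(DATE_PATTERNS[locale])


def parse_date(text: str, locale: str) -> date:
    """Parse user input using the pattern registered for the locale."""
    return datetime.strptime(text, DATE_PATTERNS[locale]).date()


fifth_of_march = date(2024, 3, 5)
print(format_date(fifth_of_march, "en_IN"))  # 05/03/24
print(format_date(fifth_of_march, "en_US"))  # 03/05/24

# The string "05/03/24" is 5 March in one convention and 3 May in the other.
print(parse_date("05/03/24", "en_IN"))  # 2024-03-05
print(parse_date("05/03/24", "en_US"))  # 2024-05-03
```

Storing dates in a locale-neutral form and applying the pattern only at the input and display boundaries avoids this kind of misreading.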

Always provide sufficient space between text labels and the text boxes associated with them, because translated text tends to grow or shrink. Among the Indian languages the growth of text is usually nominal, but depending on the context of the translation entirely different words may be used. It is therefore good practice to anticipate growth even in the case of Indic languages while performing localization. It is also suggested that text labels be placed above an input control instead of adjacent to it.

Do not bind the labels of buttons and other components to dynamically linked text from string tables, as this form is not localization friendly. Instead, load the strings from the string table into memory afresh and set the captions at runtime. Also avoid placing one control on top of another.
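One possible reading of this advice is sketched below in Python: each control stores only a string identifier, and its caption is assigned at runtime from a freshly loaded copy of the string table. The Button class, the STRING_TABLE contents and the apply_captions function are hypothetical and do not correspond to any particular GUI toolkit.

```python
# Sketch: captions are set at runtime from a string table, not baked into
# the control. Button, STRING_TABLE and apply_captions are hypothetical.
from dataclasses import dataclass

STRING_TABLE = {
    "en": {"btn.ok": "OK", "btn.cancel": "Cancel"},
    "de": {"btn.ok": "OK", "btn.cancel": "Abbrechen"},
}


@dataclass
class Button:
    caption_id: str    # identifier into the string table
    caption: str = ""  # filled in at runtime


def apply_captions(controls: list, locale: str) -> None:
    """Load the locale's strings afresh and assign each control's caption."""
    strings = dict(STRING_TABLE[locale])  # fresh in-memory copy
    for control in controls:
        control.caption = strings[control.caption_id]


buttons = [Button("btn.ok"), Button("btn.cancel")]
apply_captions(buttons, "de")
print([b.caption for b in buttons])  # ['OK', 'Abbrechen']
```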

Always take into account the differences in international names and addresses. For example, middle names are not used in certain countries. In India, people in the north generally have last names (surnames), whereas in the south some people have them and some do not. In the state of Tamil Nadu most people do not have a last name, as it is not the custom there.
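In data-model terms, this means that only the given name can safely be treated as required. A minimal Python sketch follows; the PersonName class and the sample names are hypothetical, and the display order shown is itself an assumption, since name order also varies between cultures.

```python
# Sketch: a name record in which middle and family names are optional.
# PersonName and the sample values are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    given_name: str
    middle_name: Optional[str] = None
    family_name: Optional[str] = None

    def display(self) -> str:
        """Join whichever parts are present; the order is an assumption."""
        parts = [self.given_name, self.middle_name, self.family_name]
        return " ".join(part for part in parts if part)


with_surname = PersonName(given_name="Priya", family_name="Sharma")
without_surname = PersonName(given_name="Murugan")

print(with_surname.display())     # Priya Sharma
print(without_surname.display())  # Murugan
```

A form built on such a record should not mark the surname field as mandatory or reject input that leaves it empty.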

Checklist of Activities

This section presents a checklist of activities, which also summarizes the points made above. The important activities from the perspective of an application architect or developer are:

  • Identify culturally dependent data.
  • Externalize translatable text in resource containers such as .rc files and XML files.
  • Do not embed text inside a code segment.
  • Do not embed graphics inside a code segment.
  • Do not hard code the position or size of GUI components.
  • Allow for string growth.
  • Do not perform string concatenation dynamically.
  • Format numbers and currencies.
  • Format dates and times.
  • Take care to use appropriate animations and bitmaps.
  • Take care to use appropriate colors and sounds.
  • Use Unicode character properties.
  • Use charset aware functions to compare and sort strings properly.
  • Use locale aware case conversion functions.
  • Use locale aware word and character boundaries functions.
  • Use locale aware hyphenation and syllabification functions.
  • Use locale specific measurements.
  • Avoid using culturally and politically sensitive data.
  • Use a proper layout mechanism for the GUI and for printing.
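As an illustration of the checklist item on Unicode character properties, the Python sketch below classifies characters by their Unicode general category instead of testing for ASCII ranges. It uses the standard unicodedata module; the count_letters_and_marks and is_ascii_letter helpers are hypothetical and exist only to contrast the two approaches.

```python
# Sketch: classify characters by Unicode properties, not by ASCII ranges.
# Both helper functions are hypothetical, written for this illustration.
import unicodedata


def is_ascii_letter(ch: str) -> bool:
    """The naive test that works only for English text."""
    return "a" <= ch.lower() <= "z"


def count_letters_and_marks(text: str) -> tuple:
    """Count letters (categories L*) and combining marks (categories M*)."""
    letters = sum(1 for ch in text if unicodedata.category(ch).startswith("L"))
    marks = sum(1 for ch in text if unicodedata.category(ch).startswith("M"))
    return letters, marks


word = "नाम"  # Hindi for "name": two letters and one vowel sign

print([unicodedata.category(ch) for ch in word])  # ['Lo', 'Mc', 'Lo']
print(count_letters_and_marks(word))              # (2, 1)
print(any(is_ascii_letter(ch) for ch in word))    # False

# Digits from other scripts also carry a numeric value as a property.
print(unicodedata.decimal("५"))  # 5 (Devanagari digit five)
```

The vowel sign in the middle of the word is a combining mark rather than a letter, which is one reason why character counts, cursor movement and word boundaries should come from Unicode-aware functions.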