Internationalization Best Practices
Internationalizing an application for different locales and cultures essentially
involves a trade-off. Avoiding the trade-off entirely would require designing a
product specifically for every culture, which is practically impossible given
constraints such as time, money and effort. Although many applications provide
a consistent look and feel, and the same process to perform a function,
irrespective of locale, they should still be internationalized so that the code
is easy to maintain. This article discusses some of the key problems that
designers and architects of applications face.
Localizability Issues
Localizability, as mentioned earlier, is the readiness of an application for
localization. Therefore it is an
internationalization issue and not a localization issue. When designing
international applications, architects should be aware of certain issues,
especially what kind of data they will be working with and how they will be
working with it.
Not all things are of the same size, shape and color, and this is particularly
true of data. Data may reach an application in numerous encodings and formats.
Recognizing every flavor of data may be a nightmare, but an application should
support at least a lowest common denominator of encodings and character sets.
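One way to act on this advice is to convert all incoming data to Unicode at the point where it enters the application, and to accept only an explicit, agreed list of encodings. The Python sketch below assumes such a list; `ACCEPTED_ENCODINGS` and `to_internal_text` are hypothetical names chosen for illustration. Python's standard library does not ship an ISCII codec, so ISCII input would need a separate converter in front of this function.

```python
import codecs
import unicodedata

# Hypothetical set of encodings this application agrees to accept from
# external systems (canonical Python codec names).
ACCEPTED_ENCODINGS = {"utf-8", "utf-16", "cp1252"}


def to_internal_text(raw: bytes, declared_encoding: str) -> str:
    """Decode bytes from an external source into NFC-normalized Unicode text."""
    try:
        # codecs.lookup() maps aliases such as "UTF8" to a canonical name.
        name = codecs.lookup(declared_encoding).name
    except LookupError:
        raise ValueError(f"unknown encoding: {declared_encoding}") from None
    if name not in ACCEPTED_ENCODINGS:
        raise ValueError(f"encoding not supported here: {declared_encoding}")
    # Strict decoding: malformed input raises UnicodeDecodeError instead of
    # silently producing replacement characters.
    text = raw.decode(name)
    # Normalize so that canonically equivalent sequences compare equal later.
    return unicodedata.normalize("NFC", text)
```

For example, `to_internal_text("नाम".encode("utf-8"), "UTF8")` returns `नाम`, while a declared encoding of `"latin-1"` is rejected because it is not on the agreed list.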
Consider the situation in India itself: ISCII is still the most popular
encoding format. Unicode is definitely gaining ground, but its adoption will
take time. The need to support ISCII becomes a serious issue when you develop
applications that must interface with other applications, especially legacy
applications.
Designers and architects should also be particularly careful when selecting
Commercial Off-the-Shelf (COTS) components. If the component is not
internationalized then it presents a serious problem during the localization
phase of the development process.
Other key areas that application architects should take note of during
internationalization are as follows:
-
Do not hard-code strings used in the application.
-
Do not restrict font sizes.
-
All resources, including those that end users interact with directly (for
example by modifying or creating them), should be externalized from the
application.
-
Data field lengths should always be determined at runtime using functions,
not assumed.
-
Fixing string lengths in code should be avoided at all costs.
-
Always use complete sentences and avoid concatenation of phrases at all costs.
-
Create a separate message for each situation in which a message is used.
In English one message may suffice for several situations, but for
internationalization it is better to create one message per context, because
the translation often differs from one context to another.
-
Do not use slang or jargon in text messages.
-
All string handling functions used in the application must be charset aware.
-
Always try to use locale aware functions.
-
Units of measurement used in programs (for example, metric versus imperial)
must be considered for internationalization.
-
Paper sizes vary from country to country (for example, A4 versus US Letter),
so they must also be internationalized.
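Several of the points above (no hard-coded strings, complete sentences instead of concatenated phrases, one message per context) can be combined in a small message catalog. The Python sketch below is illustrative only: the catalog contents, the keys and the `message` and `files_deleted` helpers are hypothetical, and a real application would load the catalog from external resource files rather than define it in code. Note how the placeholder moves to the start of the Hindi greeting, which concatenating "Welcome, " with a name could not express.

```python
from string import Template

# Hypothetical catalog. Each entry is a complete sentence with named
# placeholders, so translators can move a placeholder where their
# grammar requires it.
CATALOG = {
    "en": {
        "greeting": "Welcome, $name!",
        # Separate messages per situation instead of one reused phrase.
        "files_deleted.one": "One file was deleted.",
        "files_deleted.other": "$count files were deleted.",
    },
    "hi": {
        "greeting": "$name, आपका स्वागत है!",
    },
}
DEFAULT_LOCALE = "en"


def message(locale: str, key: str, **args: object) -> str:
    """Look up a full-sentence template and fill in its named placeholders."""
    table = CATALOG.get(locale, {})
    # Fall back to the default locale's text when a translation is missing.
    template = table.get(key, CATALOG[DEFAULT_LOCALE][key])
    return Template(template).substitute(**args)


def files_deleted(locale: str, count: int) -> str:
    # Assumption: this one/other split matches English only; other
    # languages have different plural rules.
    key = "files_deleted.one" if count == 1 else "files_deleted.other"
    return message(locale, key, count=count)
```

Here `message("hi", "greeting", name="Ravi")` yields `Ravi, आपका स्वागत है!`, and a missing Hindi entry for `files_deleted` falls back to the English sentence.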
Any code involving access to data should be considered for internationalization.
Some of the areas where internationalization is required are:
-
Displaying Data.
-
Reading Data.
-
Sorting Data.
-
Searching Data.
-
Parsing Data.
-
Compressing Data.
-
String formatting.
-
Word wrapping.
-
Hyphenation.
-
Numeric formatting.
-
Date formatting.
These areas need attention because the conventions behind all of them change
across national boundaries and vary with culture.
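Numeric formatting illustrates how such conventions differ. The same integer is grouped in thousands in the Western convention (1,234,567) but in lakhs and crores in the Indian convention (12,34,567). The Python sketch below hand-codes both rules for integers only; the function name and the `style` values are hypothetical, and a production system would normally take these conventions from locale data (for example a CLDR-based library) rather than implement them by hand.

```python
def format_integer(n: int, style: str = "western") -> str:
    """Group the digits of an integer using a Western or Indian convention."""
    if style == "western":
        return f"{n:,}"  # groups of three: 1,234,567
    if style != "indian":
        raise ValueError(f"unknown grouping style: {style}")
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    # Indian grouping: the last three digits form one group and every
    # group before that has two digits: 12,34,567.
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])
```

For one lakh, `format_integer(100000, "indian")` gives `1,00,000`, whereas the Western style gives `100,000`.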
Localizing text strings among Indian languages is comparatively easy, because
many Indian languages share a large common vocabulary, much of it derived from
Sanskrit. But this cannot be taken for granted. Many of the words in this common
vocabulary may be modern in one language but archaic in others. One example
worth citing here is the word name, which is translated as नाम in Hindi. This
word is quite common in many of the north Indian languages and can therefore be
used almost as is.
But the same is not true for the south Indian languages. The Tamil word நாமம்
also means name, but it is an archaic term that is not in regular use; the word
பெயர் is used instead. Related words are used in the other south Indian
languages, which share a common Dravidian ancestry with Tamil: పేరు in Telugu
and പേര് in Malayalam.
While localizing content for Indian languages, take note of such variations,
along with the regional differences in usage that exist across the various
Indian languages.
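In code, this means each target language should get its own entry for a term rather than inheriting a neighboring language's word. The Python sketch below uses the words for "name" discussed above; the table, the key `label.name` and the helper functions are hypothetical. It walks a simple fallback chain from the most specific language tag to the base language and finally to a default, and deliberately never falls back from one Indian language to another.

```python
# Hypothetical terminology table keyed by language, using the words for
# "name" discussed above.
TERMS = {
    "en": {"label.name": "Name"},
    "hi": {"label.name": "नाम"},
    "ta": {"label.name": "பெயர்"},
    "te": {"label.name": "పేరు"},
    "ml": {"label.name": "പേര്"},
}


def fallback_chain(locale_tag: str, default: str = "en") -> list[str]:
    """Return e.g. ["ta-IN", "ta", "en"] for "ta-IN"."""
    parts = locale_tag.replace("_", "-").split("-")
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def lookup(key: str, locale_tag: str) -> str:
    # Try the most specific tag first; only fall back to the base language
    # or the default, never to another Indian language's word.
    for tag in fallback_chain(locale_tag):
        entry = TERMS.get(tag, {})
        if key in entry:
            return entry[key]
    raise KeyError(key)
```

With this table, a Kannada user (`kn-IN`) would see the English default until a Kannada entry is added, rather than a borrowed Tamil or Hindi word.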
User Interface Issues Relating to Internationalization
The UI of an application is what an end user encounters first, so it is one of
the main components to consider when internationalizing an application. Many UI
elements embed pictures in the form of icons, so care should be taken when
internationalizing pictures. Pictures are very culture specific: an image
considered good in one country or culture may be considered derogatory in
another. Avoid using body parts and human figures in pictures; use the figure of
a stick man instead. Avoid using pictures of animals to denote something else.
Avoid direct pictorial representations of English words, because these do not
translate well into other languages. Use a different icon for each context, and
do not overuse icons. Do not use text inside images, because both the image and
the embedded text then have to be localized together.
Be very careful when selecting the colors to be used in an application, since
colors carry different meanings in different cultures; the same is true of
sounds. Do not use country flags to represent languages, because a language may
be spoken in several countries and a country may have several languages.
Apart from the issues discussed above, the major issue is the layout of
Graphical User Interface (GUI) elements such as text boxes and combo boxes.
Translation from one language to another can affect the size of your
application in a variety of ways:
-
Localizing from English to most other languages increases the length of text
in the interface.
-
It can affect the layout of controls.
-
It can result in larger file sizes, potentially requiring changes to the layout
of your installation disks and setup software.
The order of GUI elements on a screen may change as a result of localizing the
UI. If the elements are arranged in the sort order of a particular language,
the layout will change in another language. This applies only to a certain
extent to Indic languages: languages whose scripts are derived from the ancient
Brahmi script are likely to share the same sort order.
Sentence order can differ drastically between languages, so do not place GUI
elements in the sentence order of any particular language. Sentence order also
varies among the Indian languages, because they do not all use the same
grammar. The north Indian languages share certain grammatical similarities, as
do the south Indian languages, but there are many variations between them.
Take care when laying out GUI elements in sentence order, as this calls for
layouts that change dynamically with the language.
Input elements should be larger than those designed for English. Font size is
also a serious factor, both for display and for data entry. The Arial Unicode MS
font renders slightly differently from fonts designed for specific languages,
and fonts such as Vrinda and Kartika require larger font sizes for legible
display.
Care should be taken when laying out components for the input of dates and
numbers. In India the dd/mm/yy format is used, while in the USA the mm/dd/yy
format is common; other countries use yet other formats. When displaying time,
note also that the terms for AM and PM differ across the Indian languages.
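Because 07/03/2024 means 7 March in India but July 3 in the USA, the pattern used to display and parse a date has to come from the user's locale. The Python sketch below keeps per-locale patterns in a table; the tags and patterns are assumptions for illustration, and a real application would take them from locale data instead. It uses four-digit years, since two-digit years add a further ambiguity.

```python
from datetime import date, datetime

# Hypothetical per-locale date patterns (strftime/strptime syntax).
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # month first
    "en-IN": "%d/%m/%Y",  # day first
}


def format_date(d: date, locale_tag: str) -> str:
    return d.strftime(DATE_PATTERNS[locale_tag])


def parse_date(text: str, locale_tag: str) -> date:
    # Parse user input with the same pattern that is used to display dates.
    return datetime.strptime(text, DATE_PATTERNS[locale_tag]).date()
```

The same input `"07/03/2024"` parses to 7 March 2024 for `en-IN` and to 3 July 2024 for `en-US`, which is why input fields must be tied to the locale's pattern.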
Always provide sufficient space between text labels and their associated text
boxes, because translated text tends to grow or shrink. Among the Indian
languages the change in length is usually small, but depending on the context
of the translation, entirely different words may be used. It is therefore good
practice to anticipate growth even when localizing into Indic languages. It is
also suggested that text labels be placed above an input control instead of
beside it, which leaves the label room to grow.
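How long a translated label is also depends on what is counted. Bytes, Unicode code points and user-visible characters give different answers for Indic text, so field lengths and space should be computed with Unicode-aware functions rather than assumed. The Python sketch below (the name `length_report` is hypothetical) approximates visible characters by skipping combining marks, Unicode general category M; this is a simplification, since proper segmentation follows the grapheme cluster rules of Unicode Standard Annex #29 as implemented by libraries such as ICU, and even that does not give rendered width.

```python
import unicodedata


def length_report(text: str) -> dict[str, int]:
    """Compare three ways of measuring the 'length' of a string."""
    text = unicodedata.normalize("NFC", text)
    # Approximation: count code points that are not combining marks
    # (general categories Mn, Mc, Me), so a vowel sign such as U+093E
    # is not counted separately from the consonant it follows.
    visible = sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "code_points": len(text),
        "approx_visible": visible,
    }
```

For the Hindi word `नाम` this reports 9 UTF-8 bytes, 3 code points and about 2 visible characters, while the English `Name` gives 4 for all three.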
Do not bind the labels of buttons and other components directly to string
tables through dynamic links, as this approach is not localization friendly.
Instead, load the strings from the string table into memory and set the
captions at runtime. Also avoid placing one control over another.
Always take into account the differences in international names and
addresses. For example, middle names are not used in certain countries. In
India, people in the north generally have surnames, whereas in the south some
people have surnames and some do not. In the state of Tamil Nadu most people do
not use a surname at all; it is not customary there.
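A data model that follows this advice treats the full name as the only required field and makes the surname optional, instead of forcing every person into a first/middle/last structure. The `PersonName` class below is a hypothetical Python sketch; its field names are assumptions, not a standard schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonName:
    # The name exactly as the person writes it; the only required field.
    full_name: str
    # Optional parts: many people have no surname or middle name.
    given_name: Optional[str] = None
    surname: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.full_name.strip():
            raise ValueError("full_name must not be empty")

    def display_name(self) -> str:
        # Show the name as written instead of rebuilding it from parts
        # in an assumed order.
        return self.full_name
```

A form built on this model accepts a single-word name without demanding a surname.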
Checklist of Activities
This section presents a checklist of activities that also summarizes the
discussion above. From the perspective of an application architect or
developer, the important activities are:
-
Identify culturally dependent data.
-
Externalize translatable text in resource containers such as .rc files and XML
files.
-
Do not embed text inside a code segment.
-
Do not embed graphics inside a code segment.
-
Do not hard code the position or size of GUI components.
-
Allow for string growth.
-
Do not build messages by concatenating strings at runtime.
-
Format numbers and currencies.
-
Format dates and times.
-
Take care to use appropriate animations and bitmaps.
-
Take care to use appropriate colors and sounds.
-
Use Unicode character properties.
-
Use charset aware functions to compare and sort strings properly.
-
Use locale aware case conversion functions.
-
Use locale aware functions for word and character boundaries.
-
Use locale aware hyphenation and syllabification functions.
-
Use locale specific measurements.
-
Avoid using culturally and politically sensitive data.
-
Use proper layout mechanisms for the GUI and for printing.
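Several checklist items concern comparing strings correctly. As one small example, the Python sketch below compares two strings without regard to case or to how accented characters are encoded, using Unicode case folding and canonical normalization (what Unicode calls canonical caseless matching). The function name `caseless_equal` is hypothetical. Case folding is deliberately language-neutral; locale-specific rules such as Turkish dotted and dotless i, and locale-aware sorting, need locale data, for example from ICU.

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Compare strings ignoring case and canonical encoding differences."""
    def key(s: str) -> str:
        # NFD, then full case folding, then NFD again, following
        # Unicode's definition of canonical caseless matching.
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)
```

With this function, `"Straße"` matches `"STRASSE"`, and a precomposed `é` matches `E` followed by a combining acute accent, neither of which a plain `==` on `lower()` results would guarantee.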