Introducing Indic Language Standards

India officially recognizes 18 languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Sindhi, Tamil, Telugu and Urdu. Two families of scripts are used to write these 18 languages. Urdu, Sindhi and Kashmiri are primarily written in Perso-Arabic scripts; the rest are written in scripts derived from the ancient Brahmi script.

In 1983, the Department of Electronics (DOE) introduced the Indian Standard Code for Information Interchange (ISCII). ISCII represents all the Indian scripts derived from Brahmi. The standard was revised in 1988 to bring ISCII in line with PC-ISCII, an encoding used on the IBM PC. In its usual 8-bit form, the lower 128 positions are the 7-bit ASCII set and the upper 128 positions hold the Indic characters, so English and Indian languages can be used together in the same text.

ISCII is designed so that the same code values represent corresponding characters in every script, which makes it easy to transliterate a document from one script to another: changing only the font information changes the script. The figures below show the ISCII codes for the Devanagari, Bengali and Malayalam scripts.

ISCII code for Devanagari, Bengali and Malayalam Scripts
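
The transliteration idea can be sketched in Python. Python's standard library has no ISCII codec, so this sketch uses the Unicode Indic blocks instead, which were modelled on ISCII and keep corresponding letters at the same offset within each 128-position block. The function name and the block table are assumptions made for illustration, and not every offset is assigned in every script, so treat it as a rough sketch rather than a complete transliterator.

```python
# Start of each Unicode block; each Indic block spans 0x80 code points.
BLOCK_START = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Move every character of the source block to the same offset
    in the target block; leave all other characters untouched."""
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    word = "\u0915\u092E\u0932"  # KA, MA, LA in Devanagari
    print(transliterate(word, "Devanagari", "Bengali"))
    print(transliterate(word, "Devanagari", "Malayalam"))
```

Characters outside the source block, such as Latin letters and spaces, pass through unchanged, which mirrors the way ISCII text can mix English and an Indian script.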

In 1991, Unicode published specifications for Indic scripts based on ISCII, so Indic scripts have been supported by Unicode since its very first version. The table below shows the Indic scripts supported by Unicode, the languages written in them and their Unicode ranges.

Script        Languages                                       Unicode Range
Devanagari    Sanskrit, Hindi, Marathi, Konkani and Nepali    U+0900 to U+097F
Bengali       Bengali, Assamese, Manipuri                     U+0980 to U+09FF
Gurmukhi      Punjabi                                         U+0A00 to U+0A7F
Gujarati      Gujarati                                        U+0A80 to U+0AFF
Oriya         Oriya                                           U+0B00 to U+0B7F
Tamil         Tamil                                           U+0B80 to U+0BFF
Telugu        Telugu                                          U+0C00 to U+0C7F
Kannada       Kannada                                         U+0C80 to U+0CFF
Malayalam     Malayalam                                       U+0D00 to U+0D7F
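
As an illustration of how these ranges can be used, the following Python sketch looks up the script of a character from its code point. The ranges are copied from the table above; the function name `script_of` is an assumption made for this example.

```python
# (first code point, last code point, script) taken from the table above.
INDIC_RANGES = [
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0A00, 0x0A7F, "Gurmukhi"),
    (0x0A80, 0x0AFF, "Gujarati"),
    (0x0B00, 0x0B7F, "Oriya"),
    (0x0B80, 0x0BFF, "Tamil"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x0D00, 0x0D7F, "Malayalam"),
]


def script_of(ch):
    """Return the Indic script whose range contains ch, or None."""
    cp = ord(ch)
    for first, last, script in INDIC_RANGES:
        if first <= cp <= last:
            return script
    return None


if __name__ == "__main__":
    for ch in ["\u0915", "\u0995", "\u0B95", "A"]:
        print(f"U+{ord(ch):04X}", script_of(ch))
```

Note that the lookup identifies the script, not the language: a Devanagari character may belong to Hindi, Marathi, Sanskrit, Konkani or Nepali text.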

Support for Various Indic Scripts

Microsoft supports Indic scripts on the Windows platform through Unicode support, Indic fonts and Indic language keyboards. Windows XP initially supported Gujarati, Hindi, Kannada, Konkani, Marathi, Punjabi, Sanskrit, Tamil, and Telugu. Windows XP Service Pack Beta 2 adds support for the Bengali and Malayalam scripts.

Microsoft also provides fonts for these scripts and a rendering engine capable of handling the layout and display of complex Indic script characters. The fonts include Mangal for Devanagari, Latha for Tamil, Raavi for Gurmukhi and Devanagari, Shruti for Gujarati and Devanagari, and Tunga for Kannada. Bengali is supported through the Vrinda font and Malayalam through the Kartika font.

The OS also supports changing locales, so that information such as the calendar, currency, and date and time appears in the appropriate local language. Windows is presently internationalized, but not yet localized, for Indic languages.

Understanding Indic Keyboard

The most popular system for data entry in Indic languages is the Indian Script (INSCRIPT) keyboard. The INSCRIPT keyboard uses the standard 101-key keyboard for input. Characters are mapped to keys in such a way that corresponding syllables of the various languages occupy the same positions.

The INSCRIPT overlay was standardized by the DOE in 1986 and revised in 1988, when a Nukta character was introduced in place of the transform key. The sounds in Indic languages are divided into four groups: vowels, consonants, nasals and conjuncts. Vowels are pure sounds, consonants combine one sound with a vowel, nasals are nasal sounds accompanied by vowels, and conjuncts combine the sounds of two or more characters. The scripts used by Indic languages are syllabic alphabets, which is why Indian languages are read as they are written. The syllabic alphabet is divided into Swars (vowels) and Vyanjans (consonants). Swars can be short or long, and Vyanjans are classified into vargs.

The INSCRIPT layout follows a simple arrangement: all the Swars are grouped on the left-hand side of the keyboard and the Vyanjans on the right-hand side. On the left-hand side, the unshifted position of a key types a matra and the shifted position types the corresponding Swar. The left-hand side of the home row contains the short Swars and the row just above it contains the long Swars. The Halant, which is used to create conjuncts, is provided in an unshifted position. The right-hand side of the home row holds the primary characters of the 5 vargs. Each unaspirated Vyanjan key has the corresponding aspirated Vyanjan in its shift position, while the other non-nasal Vyanjans of each varg are placed on a pair of vertically adjacent keys.
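
The distinction between a Swar, its matra and the Halant can be seen in the underlying characters. The Python sketch below is only an illustration using Unicode Devanagari code points (Unicode names the Halant "VIRAMA"); the helper name `describe` and the chosen sample letters are assumptions for this example.

```python
import unicodedata

KA = "\u0915"        # Vyanjan (consonant) KA
SSA = "\u0937"       # Vyanjan (consonant) SSA
HALANT = "\u094D"    # Halant, called VIRAMA in Unicode
AA_MATRA = "\u093E"  # matra: dependent vowel sign AA
AA_SWAR = "\u0906"   # Swar: independent vowel AA


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# A conjunct is typed as consonant + Halant + consonant.
conjunct = KA + HALANT + SSA
# A consonant followed by a matra forms a single syllable.
syllable = KA + AA_MATRA

if __name__ == "__main__":
    for label, text in [("conjunct", conjunct),
                        ("syllable", syllable),
                        ("swar", AA_SWAR)]:
        print(label, text, describe(text))
```

The conjunct is stored as three code points, and it is the rendering engine, not the keyboard, that draws them as a single combined shape.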

The INSCRIPT layout is a well-thought-out scheme for typing syllables at high speed. Moreover, a person who has learnt to type in one script can type equally fast in another, since the keys retain their positions irrespective of the language.
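
A minimal Python sketch of this idea follows. It maps a key to an offset within a script's Unicode block, so that the same keystrokes produce corresponding characters in each script. The handful of key assignments shown is an assumption based on commonly published Devanagari INSCRIPT layouts and should be checked against the standard; the names `KEY_TO_OFFSET` and `type_keys` are hypothetical, and real keyboard drivers are not implemented this way.

```python
# Start of each script's Unicode block.
SCRIPT_BASE = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Malayalam": 0x0D00,
}

# Assumed excerpt of an INSCRIPT-style layout: key -> offset within the block.
KEY_TO_OFFSET = {
    "k": 0x15,  # KA (unaspirated)
    "K": 0x16,  # KHA (aspirated, shift position of the same key)
    "l": 0x24,  # TA
    "L": 0x25,  # THA
    "e": 0x3E,  # matra AA (unshifted)
    "E": 0x06,  # Swar AA (shifted)
    "d": 0x4D,  # Halant (unshifted)
    "D": 0x05,  # Swar A (shifted)
}


def type_keys(keys, script):
    """Translate a sequence of key presses into text in the given script."""
    base = SCRIPT_BASE[script]
    return "".join(chr(base + KEY_TO_OFFSET[key]) for key in keys)


if __name__ == "__main__":
    for script in SCRIPT_BASE:
        # "ke" types KA + matra AA; "kdl" types the conjunct KA + Halant + TA.
        print(script, type_keys("ke", script), type_keys("kdl", script))
```

Because only the block base changes between scripts, the typist's finger movements stay the same whichever script is selected.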

This figure shows the keyboard overlay for the Bengali keyboard

Bengali Keyboard Overlay

This figure shows the keyboard overlay for the Malayalam keyboard

Malayalam Keyboard Overlay