Indian Language Computing is ready for the next leap! TNC Venkata Rangan


Published on March 07, 2010


It’s a moment of pride for the Bhashaindia team to publish an interview with TNC Venkata Rangan, a language enthusiast and Chairperson of INFITT, the prestigious Tamil language computing body. He became involved in language-related activities when Internet forums emerged. Tamil.Net was one such forum; in fact, it was on Tamil.Net that the idea of Tamil language computing was seeded. It sprouted and has since grown into a big banyan tree. He shared with us the story of around 25 years of language computing. Bhashaindia thanks him for his wonderful insights into language computing and its scope. TNC Venkata Rangan is regarded as synonymous with Tamil language computing, and LIFCO, the most comprehensive dictionary in Tamil, is edited and published by him. Below are excerpts from the interview.
Ten years ago, language sites started blooming. At first they used a myriad of 8-bit fonts. Then Unicode came, and input methods for Indian languages improved. Now giants like Microsoft are localizing their applications and websites into Indian languages. Could you give us a brief history of language computing efforts in India, especially in Tamilnadu?

I would say that language computing in India has a history of 25 years. First there were individual applications on MS-DOS and UNIX that supported Indian languages through proprietary software or hardware extensions. Then Windows and Linux came along with support for Indian languages at the operating-system level. Then websites started appearing in Indian languages. Now even basic mobile phones support Indian languages. So it is clear that language computing has matured and is ready for the next big leap.

In India, language computing can enable and empower the masses who use Indian languages to enjoy the benefits of information technology. Right now, most people on the Net use English. Interestingly, only 8 percent of people in India know English, and almost 90 percent depend on Indian languages for communication.
Right now, the Indian masses are forced to use English to communicate on the Net. They are not aware that their languages exist on computers. Awareness of language computing will enable such people to use information technology. The next phase of language computing is to enable them.
There is still a percentage of people who cannot read and write. Even so, language computing can help them take advantage of information technology through text-to-speech and speech-to-text applications.
Even though language computing has a history of 25 years, most of the initiatives and efforts have happened in the past ten years. Many initiatives, such as conferences and standardization work, took place in this period. The current scenario is the result.
Tamilnadu has taken good initiatives and achieved a great deal in language computing. How did this happen? How did Tamilians progress so far in language computing?
I am proud to see Tamilnadu making great progress in this area. The Tamilnadu Government was one of the first in India to come out with an IT policy, held a Tamil Internet Conference in 1999 to set standards, and made many other pioneering moves. Apart from Tamilnadu, many other states such as Gujarat are also making wonderful progress in language computing, in e-governance and other areas.
Fortunately, Tamil has a large diaspora. Tamils live across the globe, and we are very passionate about our mother tongue. That is the reason behind the head start and this great success. We should thank Sri Lankan Tamilians, and Tamilians in Singapore and Malaysia in particular, for their great contributions to Tamil language computing initiatives. Because of the internal conflict in Sri Lanka, many Tamilians left Sri Lanka and spread across the world. As they became familiar with information technology in their new homes in developed countries, Tamil became one of the languages in which language computing was happening. So in the early 1980s and 1990s, when many Indians could not afford personal computers, the Tamil diaspora, who had access to PCs, started working on getting Tamil into computers.
Can you please explain the environment in which Tamil language computing flourished?
It was through an Internet mailing list called Tamil.Net that Tamil IT enthusiasts around the world came together for the first time, and Tamil language computing came into focus. People like Bala Pillai and Muthu Nedumaran led Tamil.Net. In that forum, we all shared our ideas on how to bring Tamil into computers and on the roadmap.
Interestingly, the first Tamil Internet Conference was held in Singapore in 1997, thanks to the efforts of the late Dr. Na. Govindaswamy. Two years later, the Tamilnadu Government conducted a Tamil Internet Conference in 1999 called TamilNet 99. We should thank Professor M. Anandakrishnan, then IT advisor to the Tamilnadu Chief Minister, for his contributions in bringing people together for the Tamil IT cause. Immediately after the conference, an IT Task Force was formed. When the third Tamil Internet Conference was held in 2000, INFITT was formed.
INFITT has taken many initiatives in language computing, and you have now been selected as INFITT’s Chairperson. Could you explain a bit about INFITT and its activities?

INFITT is a global non-profit, non-governmental body devoted to promoting Tamil computing. It was formed in 1999 and is registered in California, USA. Its aim is to help the fast development of Tamil computing and Tamil on the Internet. We have conducted a good number of conferences toward this end. INFITT’s Tamil Internet Conferences have been held in Chennai (1999, 2003), Singapore (2000, 2004), Kuala Lumpur, Malaysia (2001), San Francisco, California (2002) and Germany (2009). Now we are ready for our 9th conference, in Coimbatore from June 23 to 27, 2010. It is being organized in close collaboration with the Government of Tamilnadu, which will serve as the local host.

Was it individual or voluntary funding that kept language computing alive, or was there any government help?

There was funding from a few nations, including India. The first Tamil Internet Conference, held in Singapore, was funded by the Singapore Government; the second was funded by the Tamilnadu Government. Today it is easy to conduct a conference on language computing, but earlier it was a difficult task. Back then, we had immense help from the Tamilnadu Government and other nations where Tamilians lived, and that support continues to this day.

I have heard that Tamil has two kinds of keyboards. How did you coordinate efforts to standardize them? Is it good to have a standardized keyboard?
For Tamil, we have two popular layouts: the TamilNet99 keyboard and the Typewriter keyboard. Apart from these, we also have a Romanized (transliteration) keyboard. I do not see any problem in using several keyboards, provided that they produce Unicode characters.
The Romanized keyboard is QWERTY-based, and such keyboards slow down typing. By contrast, the Dvorak keyboard for English was designed to speed up typing. The TamilNet99 keyboard is a similar effort, designed by the Tamilnadu Government and introduced at the Tamil conference held in 1999. A lot of research has gone into this keyboard. Still, many government offices continue to work on the Typewriter keyboard because they are familiar with that layout.
It was your grandfather who introduced the LIFCO dictionary, which is widely used among the Tamil diaspora. What do you think about standardizing glossaries? As language computing heads towards the next step, is it high time to standardize technical, medical and legal glossaries?
Yes, it was my grandfather who introduced the LIFCO Dictionary about 60 years ago. As you mention, there are efforts from government and non-government bodies to produce dictionaries and standardized technical glossaries for Tamil. Kanithamizh Samgham, Tamil Virtual University and universities in Tamilnadu are working on this.
I am not in favor of standardizing all the glossaries into one. I believe in creativity. Sometimes even English does not standardize terminology: Internet Explorer uses “Favorites” where some other browsers use “Bookmarks”. Still, it is a good thing to have many dictionaries and glossaries, and as I said, governmental and non-governmental efforts are working in this direction.
As an expert in the language computing domain, can you tell us about the problems language computing faces in India?
Localizing the user interfaces of applications and products into Indian languages is needed for the next phase of growth in language computing. But localization is only one piece of language computing; we have many other issues to solve. The first problem is poor computer penetration in India. It is estimated that India has around 4 million PCs for home use. If economic growth happens as our Prime Minister has projected (over 9% for the next 25 years), the scenario will change dramatically.
Another problem I have observed is that language computing communities are separate. Tamilians concentrate only on Tamil, and Hindi speakers are interested only in Hindi language computing. A collaborative approach is required for Indian languages.
We should understand that a problem solved in one language can help solve the same problem in other Indian languages, and even in other Asian languages. I agree that grammar rules and other aspects differ. Still, a solved problem can give direction for resolving unsolved problems in other languages.
Another issue is that we do not have enough original content in digital form. Take China and Japan: they have achieved this. They have original content on the Net, or in digital format, in their mother tongues.
So do you think the content generated in digital form is not sufficient for Indian languages? Why can’t we use the content generated in blogs and websites?
The content available in digital format for any Indian language is not sufficient. We need a huge corpus for research purposes. It should contain all the words and usages of the language, both current and no longer in use. It should include many of the literary and technical works in the language. INFITT Past Chair Dr. Kalyanasundaram's Project Madurai is an ongoing pioneering effort in this regard, and recently Tamil Wikipedia has been gaining good traction and patronage. Such a complete corpus is essential for bringing applications and products up to global standards. Technologies ranging from Optical Character Recognition (OCR) and Natural Language Processing to a good spell checker or security features require a comprehensive corpus.
Even though Unicode is used for entering and viewing data on computers, publishing houses and print media still seem to stick with 8-bit fonts. How do you see this situation? Do we have enough Unicode fonts to compete with the wide array of 8-bit/ASCII fonts?
Print media and publishing houses depend on a few popular publishing software packages. Many of these have not enabled Unicode support for Indian languages. As far as I can see, everything in their core platforms is ready; they need to enable it, test it and start supporting Indian languages in Unicode. Meanwhile, publishing software such as Microsoft Publisher has fully supported our language in Unicode for a long time now. I expect that eventually all publishing software will support it.
As for Unicode fonts, many people have developed a large number of Unicode fonts for Tamil and other Indian languages. Government agencies like C-DAC and TDIL have developed, and are distributing, a good number of Unicode fonts free of charge. There are proprietary fonts too. Over time, there will be more Unicode fonts than ASCII fonts.
What are the government initiatives for Tamil language computing? Are you satisfied with the government’s activities?
The Tamilnadu Government is doing many things to improve Tamil language computing. It has already held many conferences, and it has run the Tamil Virtual University for the last 10 years. The Tamilnadu Government and INFITT are cooperating on a Tamil Internet Conference in 2010, to be held at Coimbatore. As part of this conference, Tamil typing has been introduced to school students through a competition, and I am happy to say that 10,000 students learned to type in Tamil through it. The motto was to ‘catch them young’.

As part of this campaign, we will reach out to college students as well. They will be asked to create Tamil content on various topics, and these articles will be used on free knowledge-sharing platforms like Tamil Wikipedia.

I think the government should set standards and ensure that those standards are followed when a task is carried out. Likewise, governments should promote the standards and fund such activities. I am happy to say that the Tamilnadu Government is looking into these on an ongoing basis.

What will be the next step for Indian language computing? Will we be able to produce or develop indigenous products and applications?
I have seen Chinese people insist on "Made in China" applications. I do not see much advantage in this argument. We use global products in India, and similarly our products are used worldwide. What we need from such products and applications is compatibility with Indian languages. If we get that, we can "think locally and act globally".
We definitely need to make indigenous products and applications, but we shouldn't exclude access to global products. According to the latest technology forecasts, India and China will be the lands of information technology innovation and revolution in the next 10 years.
Earlier, the approach was to make a product or application for English users and then take it to users of other languages. Since computer penetration in India was low, product owners never cared about our languages. Now the tables are turning, thanks to our economic growth in the 20 years since liberalization. First, everyone has understood India’s potential as a thriving market. Second, products will start originating in Asian countries, including India, and then spread to the world.
Once products and applications are produced in Asia, they will be “world ready” by default. The difficult part is making products and applications compatible with complex Asian languages. Once that is done, taking them to the English market, for example, is very easy.
I personally believe that the coming ten years are crucial for language computing. Wonders will surely happen. By 2020, Indian languages will be on par with English and other global languages in the information technology domain.

What do you think about Bhashaindia? How do you rate Microsoft’s Bhashaindia initiative? Are you happy with Bhashaindia’s efforts and initiatives in Indian language computing?

It is rare to see resources like Bhashaindia in India. Government organizations like C-DAC run some initiatives similar to Bhashaindia. Still, Bhashaindia is unique in that it caters to language enthusiasts across India in all the major languages. Bhashaindia explains and supports Indic language computing. Because this happens in one place, language experts can understand each other’s problems. Bhashaindia is doing wonderful work to enrich Indian language computing. Thank you.