The Need for Text Analysis Tools for Indic Languages

A home is built for shelter. But is shelter the only purpose of a home? Certainly not over the course of a lifetime: sooner or later we want to make it a comfortable, even luxurious, place to live. Indic computing is reaching a similar stage. Our basic needs are beginning to be met, and it is time to go further and build specialized, industry-specific applications in local languages.

In this age of advanced computing, technology can act as a catalyst in many fields, including language research and development. There are many language-related areas where technology can play a role, and one of them is text analysis.
What is Text Analysis?
Text analysis can be defined as processing data in the form of text and extracting from it the information that a system needs. It is best understood through the fields in which it is used.

Where is text analysis used?
Let us look at a few areas where text analysis is applied.
Lexicography: An easy way to understand text analysis is to look at the tradition of the concordance, from which it evolved. A concordance is a standard study tool in which one can look up a word and find references to every passage in the target work where that word occurs. Concordances are alphabetically sorted lists of the vocabulary of a text (its distinct words or phrases). The occurrences of each word (the keyword) are listed under a headword, each surrounded by enough context to make out its meaning and each identified by a citation giving its location in the original text.
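The concordance idea described above can be sketched in a few lines of Python. This is a minimal keyword-in-context sketch, not a production tool. It assumes words are separated by whitespace, strips surrounding punctuation (including the Devanagari danda "।"), and uses the 1-based line number as the citation. The names `concordance`, `tokenize` and `strip_punct` are hypothetical. Whitespace splitting is used instead of a `\w+` regular expression because, in CPython, `\w` does not treat combining marks such as Devanagari vowel signs and the virama as word characters, so it would break Indic words apart.

```python
import unicodedata
from collections import defaultdict


def strip_punct(token):
    """Remove leading/trailing punctuation (Unicode categories P*), e.g. '.', ',' or '।'."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def tokenize(line):
    """Split on whitespace so that combining vowel signs stay attached to their letters."""
    return [t for t in (strip_punct(raw) for raw in line.split()) if t]


def concordance(lines, width=2):
    """Build a keyword-in-context concordance.

    Returns {headword: [(citation, context), ...]} with headwords in sorted order.
    The citation is the 1-based line number; the context shows up to `width`
    words on each side, with the keyword itself in brackets.
    """
    entries = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        words = tokenize(line)
        for i, word in enumerate(words):
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            context = " ".join(left + [f"[{word}]"] + right)
            entries[word.lower()].append((line_no, context))
    # A printed concordance lists its headwords alphabetically.
    return dict(sorted(entries.items()))


text = ["The cat sat on the mat.", "A cat is not a dog."]
for cite, ctx in concordance(text)["cat"]:
    print(cite, ctx)
# 1 The [cat] sat on
# 2 A [cat] is not
```

The same function works unchanged on Indic text: `concordance(["भारत एक देश है।"])["देश"]` yields `[(1, "भारत एक [देश] है")]`.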

Bibliography: A bibliography is an alphabetical list of the sources (books, magazines, newspapers, CD-ROMs, etc.) that we have used to prepare a piece of work and want to acknowledge when publishing it.
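Producing a bibliography is largely a sorting-and-formatting task. The following is a minimal sketch. It assumes each source is recorded with hypothetical `author`, `title`, `year` and `medium` fields and is printed in a simple author-year style; real citation styles have many more rules. `casefold()` makes the ordering case-insensitive. For author names in Indic scripts, plain string comparison gives Unicode code point order, which only approximates dictionary order.

```python
from dataclasses import dataclass


@dataclass
class Source:
    author: str  # written "Surname, Initials" so that sorting groups entries by surname
    title: str
    year: int
    medium: str  # e.g. "Book", "Magazine", "Newspaper", "CD-ROM"


def bibliography(sources):
    """Return formatted entries, alphabetical by author and then by title."""
    ordered = sorted(sources, key=lambda s: (s.author.casefold(), s.title.casefold()))
    return [f"{s.author} ({s.year}). {s.title}. {s.medium}." for s in ordered]


# Placeholder entries for illustration only.
sources = [
    Source("Sharma, R.", "Title B", 2005, "Book"),
    Source("Iyer, K.", "Title C", 2003, "Magazine"),
    Source("Sharma, R.", "Title A", 2006, "CD-ROM"),
]
print("\n".join(bibliography(sources)))
```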

Book Index Generator: The index generator takes in a set of text documents and generates, for every word, a sorted list of all the occurrences of that word.
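An index generator of this kind is essentially an inverted index. The sketch below shows one minimal way to build it. It assumes each input document is a chapter or page identified by a name, that words are whitespace-separated, and that an occurrence is recorded as a (document, word position) pair. `build_index` and `strip_punct` are hypothetical names.

```python
import unicodedata
from collections import defaultdict


def strip_punct(token):
    """Remove leading/trailing punctuation (Unicode categories P*), including '।' and '॥'."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def build_index(documents):
    """Build an inverted index from {doc_name: text}.

    Returns {word: [(doc_name, position), ...]} with words in sorted order and
    each occurrence list sorted; position is the 0-based word offset.
    """
    index = defaultdict(list)
    for name, text in documents.items():
        words = [w for w in (strip_punct(raw) for raw in text.split()) if w]
        for pos, word in enumerate(words):
            index[word.lower()].append((name, pos))
    return {word: sorted(occ) for word, occ in sorted(index.items())}


docs = {"ch1": "Unicode makes sorting easy.", "ch2": "Sorting Indic text needs Unicode."}
print(build_index(docs)["unicode"])  # [('ch1', 0), ('ch2', 4)]
```

A printed book index would normally map words to page numbers. Under this sketch, that only requires naming each document after the page it represents.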
Impact of Unicode on developing Text Analysis tools
Thanks to growing Unicode support in operating systems and programming languages, it has become much easier to implement language-specific requirements such as alphabetical sorting.
Now that Unicode supports Indic scripts, there is no fundamental hurdle to developing text analysis tools for Indic languages, as has already been done for Roman scripts.
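A short example shows why Unicode helps. Devanagari consonants are encoded largely in traditional (varnamala) order, so even a plain code point sort puts these Hindi spellings of language names in alphabetical order. This is only a sketch. `unicode_sort` is a hypothetical helper, and code point order is not a complete collation: conjuncts, anusvara and nukta letters, for example, may not fall where a dictionary would place them. The Unicode Collation Algorithm (UTS #10) and libraries that implement it, such as ICU, are designed for full locale-aware ordering.

```python
import unicodedata


def unicode_sort(words):
    """Sort words by Unicode code point after NFC normalization.

    Normalizing first makes canonically equivalent spellings, such as the
    precomposed letter U+0958 (क़) and the sequence U+0915 U+093C (क + nukta),
    produce the same sort key. Code point order only approximates
    dictionary order; it is not full linguistic collation.
    """
    return sorted(words, key=lambda w: unicodedata.normalize("NFC", w))


print(unicode_sort(["मराठी", "कन्नड", "गुजराती", "तमिल"]))
# ['कन्नड', 'गुजराती', 'तमिल', 'मराठी']
```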
©2003-2007 Microsoft Corporation. All rights reserved.