Interview with Dr. Niladri Sekhar Dash
Published on March 18, 2010

Meet Dr. Niladri Sekhar Dash, currently affiliated with the Linguistic Research Unit of the Indian Statistical Institute, Kolkata. Dr. Dash has done commendable work on projects such as Technology Development for Indian Languages and Indian Language Technology Solutions. He is currently involved in projects including Generation of Differentiated Electronic Lexicon for Bangla, A Knowledge-Based Documentation and Archiving of Bangla Electronic Speech and Language Resources, and the Indian Languages Corpora Initiative (ILCI), among others. His areas of expertise include corpus linguistics, language technology, natural language processing, and machine translation, and he has carried out notable research in these areas and in general linguistics. He holds a Ph.D. in Linguistics from Calcutta University and has had technical training in Advanced Natural Language Processing at IIT Kanpur and RCC, Jadavpur. Dr. Dash has published a number of books, research papers, and book reviews. His papers have appeared in a number of books and journals and have been presented at many conferences and national and international seminars. His books and papers are regularly recommended by professors to researchers and postgraduate students of Linguistics. His future plans include, to list a few: generation of learner corpora in Indian languages; development of language and text processing tools; generation of a bilingual lexical database in English and Indian languages; design of a corpus-based machine translation system from English to Bengali; development of a usage-based online dictionary in Indian languages; and development of electronic language resources in Indian languages.

Q. Dr. Niladri Sekhar Dash, how are you fascinated by linguistic features?
To tell you honestly, I simply live by Linguistics. It is one of those fields that has fascinated me from a very early stage, when I was studying in class VIII. I am fortunate that my passion and profession have converged in Linguistics, and this has motivated me to commit myself to the service of language and linguistics. For the last 18 years, I have been spending half of each day, if not more, on linguistics. I consider Linguistics my Second Mother, and therefore try to serve her religiously. Linguistics is my passion, profession, and pastime. Some of my seniors often say, “If linguistics is abolished from the world, Niladri will die immediately”.

Q. And by reading your profile, one can feel that passion and profession, Dr. Dash! May we ask what ‘language’ means to you?
For me, language is not just a medium of communication and expression. It is much more than that.

Q. Can you please explain this?
Yes. I often say, “Your language is your identity”. That is, it is through language that we can ‘know’ a person: we know about the language s/he uses, about the kind of person s/he is, about the society s/he lives in, about the group s/he interacts with, about the frame of mind s/he deals with, about the time and space s/he lives in, about the attitude s/he carries, about the goals s/he wants to accomplish, and about the context in which the entire linguistic act takes place. Also, the properties and features of a language become fascinating when they are used as sources of history and culture, of past and present, and of growth and civilization.

Q. Like a historian?
As a linguist, I consider myself an explorer or an excavator who tries to dig out the wealth of life, living, and society engraved in the sands of time.

Q. Sure. And that is a good service to society. Dr. Dash, we would like to know how you started working at the Indian Statistical Institute.
I started my career as a Linguist at the Indian Institute of Applied Language Sciences, Bhubaneswar, in 1992. Since 1995, I have been working at the Indian Statistical Institute, Kolkata, in the areas of corpus linguistics, language technology, Natural Language Processing, and Applied Linguistics, with the goal of serving my language and my people.

Q. What are the features of the Indian Statistical Institute? What facilities are available?
Although the Indian Statistical Institute, Kolkata, in principle encourages research and development in all fields of human knowledge for the betterment of the nation and its people, in reality Linguistics is treated here as a step-child who often becomes an easy target of malnourishment and ill-treatment. As a linguist, I have received very poor administrative and infrastructural support from the Institute, owing to apathy on the part of its administration.

Q. Oh. Maybe it will change.
Well, the situation still continues, and I feel there is no possibility of this scenario changing in the near future.

Q. Dr. Dash, we would like to know more about linguistics. Can we know how mathematics is related to computational linguistics?
Mathematics is an integral part of computational linguistics. Several research and development activities in computational linguistics depend directly on mathematical models, rules, and calculations. For instance, in areas such as digital language corpora collection, corpus processing, authorship attribution, information retrieval, data mining, machine translation, machine learning, parsing, speech synthesis, and text-to-speech conversion, we need direct application of mathematical models, methods, techniques, and interpretations to achieve our goals.

Q. And how is science related to computational linguistics?
Almost all the sub-fields of computational linguistics directly follow the paths of science in the acts of data collection, categorization, analysis, interpretation, inference, and deduction, as well as in the application of data and information to the development of theories, tools, techniques, and systems.

Q. Does it mean that one should have knowledge of language, math, and science?
In the true sense, computational linguistics is a different kind of discipline, in which both language and science have equal roles to play. Information from the fields of mathematics, statistics, physics, biology, acoustics, psychology, neurology, cognitive science, ethnology, geography, anthropology, sociology, etc. becomes indispensable at certain stages of developing systems, devices, tools, and techniques for computational linguistics.

Q. Statistics too? How is statistics related to computational linguistics?
Statistics is also directly related to computational linguistics. Various statistical theories and methods are often used, particularly in language data collection, corpus compilation, language processing, and language data retrieval and analysis.

Q. Meaning?
For instance, while I was collecting and analyzing text corpora for the Indian languages (particularly Bengali), I had to use several statistical processes, such as the Chi-square test, t-test, ANOVA test, Pearson Correlation, Multidimensional Scaling, and Factor Analysis, for tasks such as the tier division of characters, identifying spelling errors in words, finding real-word errors in texts, and calculating the average length of words.

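Editorial note: two of the calculations mentioned in this answer, the average length of words and the Chi-square statistic, can be illustrated with a minimal Python sketch. This is not Dr. Dash's actual procedure, which the interview does not describe. The function names and the toy texts are hypothetical, words are assumed to be separated by whitespace, and characters are counted as Unicode code points, which for Bengali means that vowel signs count as separate characters.

```python
from collections import Counter


def average_word_length(text: str) -> float:
    """Mean number of code points per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def chi_square_statistic(observed, expected) -> float:
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def character_chi_square(sample: str, reference: str) -> float:
    """Compare character counts in `sample` with the counts expected
    from the character proportions of `reference`.

    Assumption: only characters that occur in `reference` are compared;
    characters seen only in `sample` are ignored.
    """
    reference_counts = Counter(c for c in reference if not c.isspace())
    if not reference_counts:
        raise ValueError("reference text contains no characters")
    sample_counts = Counter(c for c in sample if not c.isspace())

    characters = sorted(reference_counts)
    observed = [sample_counts.get(c, 0) for c in characters]
    reference_total = sum(reference_counts.values())
    sample_total = sum(observed)
    expected = [
        reference_counts[c] / reference_total * sample_total
        for c in characters
    ]
    return chi_square_statistic(observed, expected)


if __name__ == "__main__":
    toy_text = "আমি ভাত খাই"  # hypothetical three-word Bengali sample
    print(average_word_length(toy_text))
    print(character_chi_square("aab", "ab"))
```

A larger statistic means the sample's character distribution departs further from the reference. Turning the statistic into a significance decision additionally requires the degrees of freedom and a Chi-square table or a statistics library, which this sketch leaves out.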
Q. That is interesting. Dr. Dash, what linguistic features, such as history and dialects, are found in Bengali and/or other Indian languages?
Any Indian language will have more or less these features. Here is an explanation with reference to Bengali.
History: Bengali is one of the most widely spoken languages in the world today (ranking 5th or 6th). Like other Eastern Indo-Aryan languages, Bengali arose from the eastern Middle Indic languages of the Indian subcontinent. Historically, Bengali has carried a strong influence of Sanskrit in its vocabulary and grammar since the Middle Bengali period. In this regard Bengali and Marathi are closely similar, as both contain a large Sanskrit vocabulary, while Hindi and others such as Punjabi, Sindhi, and Gujarati are more influenced by Arabic and Persian.
Official Status: Bengali is the official language of the states of West Bengal and Tripura, and it is also used as a major language in the Indian union territory of Andaman and Nicobar Islands, in Jharkhand and Bihar, and in other parts of India.
Dialects: Regional variation in spoken Bengali constitutes a dialect continuum. During the standardization of Bengali in the late 19th and early 20th centuries, the variety used in and around Kolkata (the cultural and administrative capital of Bengal) was accepted as the standard variety.
Spoken and Literary Varieties: Bengali does not exhibit a diglossic situation between its written and spoken forms, although some scholars have wrongly identified it as diglossia. Two styles of the language have emerged, involving somewhat different vocabularies and syntax: Shadhu bhasa and Chalit bhasa. For example, songs such as India’s National Anthem (janaganamana adhinayaka jaya he) and the national song Bande Mataram are composed in Shadhu bhasa.
Writing System: The Bengali writing system is not purely alphabetic in the true sense of the term. The Bengali script, a variant of the Eastern Nagari script, is used throughout eastern India (Assam, West Bengal, Manipur, and the Mithila region of Bihar) and Bangladesh.
The uniqueness of this script lies in its consonants and clusters, which carry ‘inherent’ vowel sounds reflected in their free pronunciation.
Spelling-Pronunciation Inconsistencies: In spite of some modifications in the 19th century, the Bengali spelling system continues to be based on the one used for Sanskrit, and thus does not take into account some sound mergers that have occurred in the spoken form.
Phonology: The phonemic inventory of Bengali consists of 29 consonants and 14 vowels (7 oral + 7 nasal vowels). The language has a wide variety of diphthongs (combinations of vowels) occurring within the same syllable. For Bengali words, intonation or pitch of voice has minor significance, apart from a few isolated cases, although intonation plays a significant role in sentences. Vowel length is not contrastive in Bengali: all else being equal, there is no meaningful distinction between a ‘short vowel’ and a ‘long vowel’.
Morphology and Syntax: Bengali nouns are not assigned gender, which leads to minimal inflection of adjectives. As a consequence, unlike in Hindi, Bengali verbs do not change form depending on the gender of the nouns. However, nouns and pronouns are highly declined (altered depending on their function in a sentence) into four cases, while verbs are heavily conjugated. Bengali differs from most Indo-Aryan languages in its zero copula: the copula or connective ‘be’ is often missing in the present tense. Thus ‘he is a teacher’ is she shiksakk (literally ‘he teacher’). In this respect, Bengali is similar to Russian and Hungarian.
Vocabulary: The sources of modern Bengali words are tadbhaba, tatsama, deshi (native), and bideshi (foreign). Bengali has as many as 100,000 separate words, of which 50,000 are considered tatsama (direct borrowings from Sanskrit) and 21,100 are tadbhaba; the rest are bideshi (foreign borrowings) and deshi (Austroasiatic borrowings) words.
Due to centuries of contact with Europeans, Mughals, Arabs, Turks, Persians, Afghans, and East Asians, Bengali has incorporated many words from foreign languages.
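Editorial note: the vocabulary figures quoted in this answer (about 100,000 words in total, 50,000 tatsama, 21,100 tadbhaba) imply a size for the combined deshi and bideshi remainder, which the interview does not state directly. The short Python sketch below simply does that arithmetic; the function name is hypothetical, and the interview gives no figures for splitting the remainder between deshi and bideshi words.

```python
def vocabulary_breakdown(total: int, tatsama: int, tadbhaba: int) -> dict:
    """Return the count and percentage share of each vocabulary class.

    The remainder (deshi and bideshi combined) is whatever is left
    after subtracting tatsama and tadbhaba words from the total.
    """
    remainder = total - tatsama - tadbhaba
    counts = {
        "tatsama": tatsama,
        "tadbhaba": tadbhaba,
        "deshi + bideshi": remainder,
    }
    return {
        name: {"count": count, "percent": round(count / total * 100, 1)}
        for name, count in counts.items()
    }


if __name__ == "__main__":
    # Figures as quoted in the interview.
    for name, info in vocabulary_breakdown(100_000, 50_000, 21_100).items():
        print(f"{name}: {info['count']} words ({info['percent']}%)")
```

On the quoted figures, the remainder comes to 28,900 words, a little under 29% of the total.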
Dr. Niladri Sekhar Dash, it was very interesting to talk to you. Our readers will be grateful to you. Thank you!