| Interview of Prof. Shalini Urs on 18th October 2003 |
| Vidyanidhi uses Unicode for their Digital Library of Local Language E-Theses |
A movement is on: a movement to produce, access, archive and procure doctoral theses, to enhance the quality of Indian doctoral theses and to make information accessible to all. It is also a movement to usher in more transparency in research work. Behind this movement are Dr. Shalini R. Urs and Vidyanidhi. Vidyanidhi is a portal and an initiative that supports doctoral research, a pedagogic journey, with rich facilities, resources and tools. Vidyanidhi, as the name itself implies, is a treasure trove of knowledge, helping students and scholarly minds to access and retrieve doctoral theses submitted to various universities in India. This initiative, with its sustained and systematic efforts, collates research theses and dissertations in electronic form, and catalogues and archives them for future access, so that they endure rather than staying frozen on the hard disks of DTP operators across the country.
Dr. Urs has been teaching Library and Information Science at the University of Mysore since 1976, where she serves as professor. Her interests include Digital Libraries and Electronic Publishing, Electronic Theses and Dissertations, and Metadata. She is also the Director of the Vidyanidhi Digital Library and E-scholarship Portal Project (www.vidyanidhi.org.in), sponsored by the Ford Foundation and the Microsoft Corporation. Through Vidyanidhi, she has set the ball rolling for the ETD movement in India and is spearheading the effort to evolve appropriate policy frameworks (copyright, submission, archiving) and implementation mechanisms for doctoral theses. Vidyanidhi is now holding discussions with various Indian universities to encourage them to have research scholars create their doctoral theses in electronic form. Dr. Urs was in Bangalore recently and, in an exclusive interview, talked about the problems, challenges and prospects of her mission. Excerpts:
Could you brief us on Vidyanidhi and its activities?
S: Vidyanidhi was started in 2002. It is an initiative to archive doctoral theses in electronic form. Such an archive would improve the research capacities of Indian universities. People have been talking about the digital divide as something confined to infrastructure, a divide between the haves and have-nots. But our perspective on the divide is totally different. We view it as a divide between those with the necessary IT skills and those without them. Further, electronic theses and dissertations, ETD in short, is a new genre of medium. It is not confined to text. It extends to other forms also, like dance, music, etc. Dance, in fact, is a dissertation. We at Vidyanidhi are looking at author-created ETDs. Students should have the necessary skills to create their doctoral theses in electronic form, which would be archived for better and easier access. Currently, only a few people, say four or five, review and refer to a doctoral thesis. After a period of time, even the research scholar loses interest in his/her work. This is a major problem. Doctoral theses are not easily shared, and that affects the quality of the work. Further, the quality of present research work is not open to scrutiny. India is spending Rs. 3000 crore annually on research, and this is only a conservative estimate; the real amount spent on doctoral theses may be higher. Yet the theses are not easily available to others who might be interested in reviewing, referencing or taking forward the work. So, we are focusing on two aspects: one, the creation of electronic copies and two, archiving. Right now, many universities do not have a mechanism to archive the doctoral theses of their students in an electronic library. But it is high time that we had a policy and system to enhance the e-copy. It is not easy. It should be a top-down approach. We should get the process of standardizing doctoral theses started from the University Grants Commission downwards. |
| How did Vidyanidhi overcome the problems with Indian Language fonts? Many students submit their doctoral theses in their local languages? |
| S: When we set Vidyanidhi rolling, we did not realise that about 25% – 30% of the doctoral theses in India are in local languages. We didn't anticipate that we would have to create a multilingual database. But when confronted with this problem, we started looking at Indian language fonts. Vidyanidhi envisages two kinds of access to the database. One is the metadata and the other, the full text. Metadata gives information about the full texts available, and it grows faster. As of now, we have 50,000 records as metadata. Creating metadata in itself is not a problem. But creating the metadata in a multilingual database was the challenge. |
| So, how did you face the challenge of creating a multilingual database and metadata? |
| S: The first step was to ask around and do a little bit of research myself on how a multilingual database could be created. The problem was not just one of storing the data; it was also one of retrieving it. Though open source code supports multilingual programs, I was unable to find any database systems that supported local languages. Microsoft had begun introducing Unicode in its tools and operating systems. I was theoretically convinced about the utility of Unicode. When I enquired about its practical viability, most of the responses were discouraging. We believe in relying on global standards to address such issues. So I started checking what Unicode cannot do and found that most of the responses I got were based on 'myths.' With a library background and an archiving point of view, I was convinced that Unicode would be suitable for Vidyanidhi. My focus was on searching and accessing materials, not just the look and feel of the fonts and pages. There is a fair amount of work happening in local language font development, and it is targeted more at font styles. However, these fonts do not do a good job of encoding since they are not based on Unicode. We started creating documents using Unicode. We found that Unicode is 95 per cent perfect. I studied the subtle nuances of the language. On a lighter note, I can safely say that after studying the collation I know my Kannada better. Indian languages have a pure consonant and a vowel in certain letters. The only drawback in using this system was with the usage of pure consonants, which in any case were not used much in the written language. And we didn't have a problem in choosing the right platform. A majority of people use MS Office and we knew that it would be easier for us to use the same platform. Also, by the time we had zeroed in on Unicode, Microsoft had come up with Windows XP. An MS Word document could easily be converted into XML (Extensible Markup Language), which makes archiving easier.
Incidentally, people who had earlier discouraged me from using Unicode have now started using it. After all, Unicode gets to the bottom of things. |
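Editorial illustration: the point about encoding versus font style can be made concrete with a short Python sketch. It is not Vidyanidhi's code; the function names (`describe`, `search`) and the catalogue titles are hypothetical, chosen only for this example. It shows how a Kannada syllable and a "pure" consonant are stored as sequences of Unicode code points, and why text stored this way can be searched directly, independent of any font.

```python
import unicodedata

# Kannada syllable "ki": the consonant KA followed by the dependent vowel sign I.
KI = "\u0c95\u0cbf"
# "Pure" consonant "k": KA followed by the virama, which removes the inherent vowel.
PURE_K = "\u0c95\u0ccd"
# The word "Kannada": KA, NA, VIRAMA, NA, DDA.
KANNADA = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"

def describe(text):
    """List each code point in text with its official Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def search(titles, query):
    """Return titles containing query, comparing NFC-normalized code points."""
    q = unicodedata.normalize("NFC", query)
    return [t for t in titles if q in unicodedata.normalize("NFC", t)]

# Made-up catalogue entries, for illustration only (not real Vidyanidhi records).
TITLES = [KANNADA + " (hypothetical thesis title)", "A hypothetical English title"]

if __name__ == "__main__":
    for line in describe(KI) + describe(PURE_K):
        print(line)
    print(search(TITLES, KANNADA))
```

Because the search compares characters rather than glyph positions in a particular font, the same query works no matter which Unicode font renders the text. A font-specific encoding cannot offer that guarantee.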
| Have you tried any other platforms? |
| S: My experience with Linux is not extensive. Collation was different, but it was not bad. With Linux, there is a problem while using other application tools. Some other open source tools, like MySQL, are not Unicode friendly. |
| Are you totally satisfied with Unicode? |
| S: Unicode, as I said earlier, is 95 per cent fine. It still has some problems though. But the problems with the fonts can be solved. We won't do things ourselves. People have to come forward to develop them. What we need is a font-based solution. Unicode does not belong to anybody. It is everybody's. We need to join, develop the solutions and direct them. But the problem we have before us is that each one of us is concerned about our own language. We should set this mind-set aside and join hands to find a font-based solution. Indian languages should come together. It also calls for a certain amount of initiative from the government and other institutions. |
| But mobile devices do not support Unicode? |
| S: I am not good at the technical nuances involved here. But I think that in future they will evolve in a manner to support language fonts. I believe it will also be a domain that Microsoft will dominate. |
| Would the English-speaking world be interested in joining the Unicode effort? |
| S: They were not keen on joining Unicode. But post 9/11, they are interested in joining the Unicode efforts. They want to share data. They have recognized the world outside their shores. Well, it could be called the positive side of 9/11. |
| If Microsoft were to ask you to make a wish list of product upgrades that could significantly contribute to the creation of an e-library, what would you ask for? |
| S: I would like to request Microsoft to give more support to this initiative. MS Reader (used for e-books) does not support Unicode. This needs to be looked into. A Digital Library Suite would be nice. Currently we are customizing various available packages to suit our needs, but a DL Suite developed specifically for digital libraries would be nice. And, of course, it has to be Unicode enabled and should support local languages. Further, we prefer students themselves to create their ETDs. Currently, they get DTP operators to do their work. I think it is part of Indian culture that we expect others to do it for us. Hence, most of the students are not familiar with the keyboard and, of course, the basics of computing. Using a local language keyboard is not easy for them. Also, while keying in, switching over from an Indian language to English and vice versa is a problem. This needs to be looked into. A more user-friendly language keyboard would be appreciated. The On Screen Keyboards available right now are not very "easy to use" since they do not come with "operating manuals". Doing one's own work oneself has to be encouraged. Today, the students walk into DTP centres to get their work done and submit bound copies of their doctoral theses. Not many bother to even get an electronic copy of their work. Wherever an electronic copy of a local language thesis is archived, the copy is scanned and stored as images. It was surprising to know that many African languages have OCR (Optical Character Recognition) support, while not even a single Indian language does. |
| How can we make more people and universities create ETDs? |
| S: There is a divide between academia and industry. It is, of course, an ego problem to a certain extent. They look at each other with contempt. But both of these groups should come together for the good of society. Somehow we have to find the means to break the ice. Academia has good doctoral works while industry has the techniques to use and store them. There should be an initiative to make research scholars create ETDs, and this should be integrated into the exams. In the US, all universities except the Massachusetts Institute of Technology have followed this policy for the last 18 years. We should also sensitize people, whether they join or not, to the need for ETDs and the issues involved in creating them. |
| Where do you see Vidyanidhi 10 years from now? |
| S: It is not a question of where Vidyanidhi will be after 10 years. But ETD will be there after 10 years, whether in a centralized or scattered manner. I envisage Vidyanidhi as a consortium kind of model. We have discussed the need for ETDs with various universities and are also initiating talks with the IITs. We should understand that many universities are very far behind as far as the required infrastructure is concerned. We expect 30 – 35 per cent of Indian universities to have the basic infrastructure in the next three years. We hope to have 15 – 25 universities with us by 2006 and, hopefully, in 10 years we will have 100 universities. |