Saturday, February 25, 2006

Indian Language Translation

During my undergraduate days, I worked in a state-of-the-art research lab on machine translation from English to Indian languages. One of the main motivations for our work was that we would be bringing the WWW to the 90% of the Indian population which is illiterate in English. Owing to this experience, I gained a fair knowledge of the key problems in this area and the main technical hurdles to be overcome.

But I believe many of the R & D teams that are working on this are missing the holistic view.

Most of the people in India are illiterate, but they can speak native languages such as Telugu and Hindi. Further, many of the people who are literate are not computer literate. And amongst the computer literate people, most of them are already comfortable with English.

Keeping this in view, of what effectiveness would be a new Desktop Manager with arcane Telugu words for terms such as "Start", "My Computer" and "Control Panel" ?

In places such as China and Japan, communication in English is purely impossible. In other places such as France and Germany, there is a strong linguistic pride and awareness. But in India, both of these are absent. Still I believe there is a target audience in India, albeit of a different variety.

In order to clarify this, let me narrate an experience of mine.

Once during my UG days, I had another strange and unique opportunity. My father, a teacher, invited me to give a few lectures on Modern Physics and Atomic Theory to the outgoing students of the high school. The medium of instruction was Telugu, not English as mine had been during my school days. I had great fun teaching, and the students liked me very much as an instructor. I also had to put some effort into studying the equivalent Telugu words for the scientific terms. But this experience of teaching in the native language was very beneficial for me: it helped me understand the subject much more clearly.

The fact remains that several of the students in Indian high schools are not literate/comfortable in the English language. This is a major hurdle for them to pick up and read the vast majority of the available literature. Even if they are fairly conversant in the rudimentary English language, they are not accustomed to reading large books in English. This section of people includes, surprisingly, both of my own parents - who are much more comfortable in reading books in Telugu.

But of course, the most informative literature is currently available only in English. A sizeable portion of this is present on the web, and the rest is available in print. The key factors are the relevance and the importance of the information that is available.

A good example of such informative texts is the collection of essays from the Edge Foundation.

The state of the art in machine translation research is still far from producing reasonable output for real texts such as these. But this output can very well be a good starting point for a human translator.

Ideally, such information should be available in different formats (audio, images, video, summarized audio, detailed translated text, ...) to cater to a varied audience. The important thing to note is that the needs of this audience differ widely.


  • Some people might need to know the latest weather forecasts / stock prices / news headlines in summarized Telugu text.
  • Some people might need to know about the latest agricultural techniques in video format.
  • Some people might need to report/publish websites or blogs using simplistic Telugu interface and video.
  • Students in several schools might need to communicate on their class projects using audio and images.
  • Some students (or elders) might need to read books such as "The Selfish Gene" and "The End of Poverty" (two personal favorites of mine) in clear Telugu language. Or they might need video and images in an accompanying website.

In my point of view, these are the issues that are really pertinent for the emancipation of education in the Indian scenario.

Issues such as these should be tackled individually and with special care. There are no magic bullets, such as Indian language GUIs, for solving all these problems in one shot.

These problems will be solved, both by building good computer systems and by massive human effort. These twin efforts should go hand in hand.

In particular, I want to highlight one issue - the translation of informative books (such as The Selfish Gene, The End of Poverty etc.) which are written by highly skilled scientists for a general audience. This is the kind of information that needs to be available to high school students. But unfortunately, to my knowledge, neither of these books has been translated into Telugu so far.

This kind of information is currently being published at a rapid pace, both on and off the world wide web. In order to make this available to the general Indian audience, I think universities should encourage UG/PG students to translate this literature into regional languages, and assign course credit for such efforts. (For example, a group of economics students might translate The End of Poverty.)

And by translation, the output need not be a book in Telugu. It can be a website or a series of lectures in images and audio. From my experience (of teaching modern physics in my dad's school), I can say that this will be truly rewarding for the translator too.

Interested people might even maintain weblogs where they post translated text from specific domains (or sometimes post the translation as a spoken audio file). Should we call them transblogs ?

These will provide a true breakthrough in bridging the information divide in India.

11 comments:

AK-84 said...

Good point raised.. I really appreciate Sunil Mohan who's been doing something similar for the Prajasakti Telugu daily newspaper.. I believe that you're talking something on the same track..

Kiran said...

In fact, I have been criticizing Sunil Mohan's team :) sorry bunny !!

I think these efforts are useless if they do not get to the holistic point of view.

Mischord said...

Your analysis of the problems is almost completely right. But u've missed seeing how the current work can lead to possible solutions.

1. Yeah, most people only "speak" native language. Our solution is to provide a desktop in native lang and make the computer read out the content. This includes menus, documents, web pages. So a person with a little training can practically use the entire desktop. The training is to make him learn to use a computer. Not the lang. And the Telugu desktop is ready for this. Check out http://www.swecha.org/wiki/SpeechAssistance

2. I have my reservations on translating all kinds of content into Telugu. For instance, I believe Shelley can be enjoyed only in English. So good if ppl can learn English. But in case of the kind of imp, thought provoking books u r talking about, well they do need to be translated. And to do that, we have to provide ppl with proper/standard tools. And that is exactly what we r doing right now. The content will come up in the later years. And to make it faster, we have to educate ppl to contribute to projects like Wikipedia. Why is it that there is practically no te.wikibooks.org and te.wikinews.org????

My mother is educated enough to translate articles for Wikipedia. So is ur mother. And they r looking for that something that needs their sweat. Think about it.

Kiran said...

Hi Chaitu
You are terribly bogged down by the conventional applications and are refusing to think out of the box. A country like India needs totally different tailor-made applications to suit its population. What works in USA does not work here (read things such as Desktop Manager, Wikipedia etc)

Why are you always playing second fiddle to other developer teams in the US ? Why always localization ? Don't you think you can build something from scratch ?? There are several things that you could do.

There is a saying: "If the only tool you have is a hammer, every problem looks like a nail." This is what you are guilty of doing !!

You are not developing software from the software engineering perspective. You have not done any requirements analysis. You have not done any quality control. You are not testing whether your software would be effective for the consumers.

And when did I ask you to translate Shelley and Keats ?? I asked you to concentrate on the information that matters . There is plenty up there.

What I want you to notice is that Indian languages are more alive in their spoken form and not in their textual form

So you should concentrate on building tools for making audio-blogs and for doing voice-chat.

Please wake up !!

mOby said...

Me big fan.. of whatever u guys are doing.. at least u guys are doing something..

Autumn Sky said...

Well the issue about introducing computers to the illiterate and semi literate goes way beyond just the language barriers. The problem is equally hard in both the computer science and the user interface domain. An effective user interface can allow even people totally averse to using any technology to come and begin using it. We need good designers to actually sit down with the target audience and build the software. IMHO, a Telugu version of the desktop may help an extremely small section of the society... those that are educated but not competent in speaking English. I'm not a fan of translation simply bcoz there's no way everything can be translated... imagine if the poor guy just bought a new video card and wishes to install new drivers to play games, he'd have to go to console mode to install drivers, which is completely English. Some things such as kernel messages just can't be properly translated, which can lead to user frustration. Effectively a guy must know the English commands.

A solution to this is to build self contained apps with innovative UI designs. For example, one might have to do a completely textfree UI for empowering the illiterate. This needs more input from a designer than a computer scientist. A colleague of mine built a UI for a kiosk for domestic workers. It was startling to find that even the ones that knew how to read and write in local languages were more comfortable with completely textfree interfaces than with those that contained textual descriptions. Visual cues are always very powerful... eg to these domestics, the SBI logo is analogous to a bank. Also ethnic values contribute to how people perceive and understand UIs... eg a certain section of the population reads from right to left and hence looks at UIs from right to left as well. It'd be a disaster to just translate/transliterate the UI from English to a regional language without looking at these issues by using a UI designer and also thru feedback from field tests.

Sunil Mohan said...

And amongst the computer literate people, most of them are already comfortable with English.

Keeping this in view, of what effectiveness would be a new Desktop Manager with arcane Telugu words for terms such as "Start", "My Computer" and "Control Panel" ?


Only English speaking people are computer literate because Indian language computing is non-existent. Indian language computing efforts are set to change exactly that.

In places such as China and Japan, communication in English is purely impossible. In other places such as France and Germany, there is a strong linguistic pride and awareness. But in India, both of these are absent.

According to MIT's figures, 5% of the Indian population can effectively use English for communication. From the census records about 65% of Indian population is literate. It therefore follows that Indian people need Indian language computing.

Issues such as these should be tackled individually and with special care. There are no magic bullets, such as Indian language GUIs, for solving all these problems in one shot.

I agree. However, you have misunderstood the aim of our project. We are not trying to build the applications required to solve all computing needs for our rural people. That's a long way off. We don't have much basic infrastructure to do that. Our project tries to lend a hand with this. We are making it possible for other people to be able to build applications for rural needs. The difference is critical. We make it possible for people to display Indian language text on screen, input text, visit web sites etc. Once we have such a framework, one can think of building web content, building simple useful GUIs etc.

There are more urgent things than having audio blogs and translating "selfish gene" (these are not imagined, these are gathered)
- People need to look at their land records
- Self help groups need to keep their accounts
- People need to get information on what crops they should choose to grow.
- People need to know the market prices for their crops

If we have enough bandwidth to do audio/video to rural places, the first thing we should think of is Tele-Medicine.

Website building, audio blogs, translations of "selfish gene", stock prices are phase 10 requirements for Indian language computing. Let's worry about the other 9 phases first. We are doing Phase 0, which is the foundation for the rest of the phases.

Bhale Budugu said...

Kiran

I don't know how I landed up here, but FYI - my brother and I had a similar discussion last month, and I was surprised to see almost the same content.

Maybe like minds think alike

Cheers

Kiran said...

@Bhale Budugu :
Thank you :) We are already friends since u visited my blog !

@Bunny (sunil mohan):
first of all, you are one range !
Second of all, you guys are doing awesome work.

:)) thanks for visiting my blog bunny.

According to MIT's figures, 5% of the Indian population can effectively use English for communication. ... It therefore follows that Indian people need Indian language computing.

But how much English do you need to know to use a Desktop Manager ? How many people would like to give presentations (PowerPoint-type) or frame Word documents in Indian languages in the current scenario ? A handful. Mostly Indian bureaucracy in the government sector. These are the issues that you are concentrating upon.

If we have enough bandwidth to do audio/video to rural places, the first thing we should think of is Tele-Medicine.

Website building, audio blogs, translations of "selfish gene", stock prices are phase 10 requirements for Indian language computing.


First, several applications need not have a dedicated internet connection. Providing multimedia-rich educational tools for schools is one such application. This content can be distributed via CDs by a local support group. The same support group extracts this audio and video material from the web and writes it to CDs.

Interested people from the Telugu diaspora from all over the world can contribute to the creation of this content. But they need software to help them in doing this.

Education is the most important issue !! This is a phase-1 issue. Not a phase-10 issue

About the issues that you have mentioned (such as knowing land records, information about what crops to grow) maybe the world wide web is not the right channel for these ! These can be quickly and effectively delivered through other channels - such as call centers which operate on toll-free service.

Telemedicine might be best served through video conference from village dispensaries on a telephone hot-line (instead of the IP network)

This is why I am requesting you to think out of the box. The most serious issue in India right now is providing high-quality internet-age education at high school and undergraduate level. If we don't do this, the next generation of Indians will be severely handicapped in competing with the world.

Sunil Mohan said...

But how much English do you need to know to use a Desktop Manager ?

The gap (no. of people who require Indic computing) is too large.

How many people would like to give presentations (PowerPoint-type) or frame Word documents in Indian languages in the current scenario ? A handful. Mostly Indian bureaucracy in the government sector. These are the issues that you are concentrating upon.

No, these are not the things we are concentrating on. OpenOffice happens to be one of the numerous things we are looking at. Saying that we concentrate on Office is far from the truth. As I said, we concentrate on building the framework that is required for building all kinds of applications (including the ones you talk about).

Education is the most important issue !! This is a phase-1 issue. Not a phase-10 issue

You have incorrectly concluded from my argument that education is not important. I am in fact planning to contact NGOs working with schools in order to find out their requirements for computing and to see if Free Software can help them in any way. If you recollect, our last release of Swecha Telugu LiveCD was demonstrated in quite a few schools with exactly education in mind.

About the issues that you have mentioned (such as knowing land records, information about what crops to grow) maybe the world wide web is not the right channel for these ! These can be quickly and effectively delivered through other channels - such as call centers which operate on toll-free service.

This is very limited as the information flow is uni-directional. Even the set of all the applications I talked about can't be done properly through other means. Moreover, once we have a computer available for a village, quite a few things done ineffectively through other means can be done properly on a computer.

Telemedicine might be best served through video conference from village dispensaries on a telephone hot-line (instead of the IP network)

No, tele-medicine is much more demanding.

thotha said...

hi kiran,
i don't know whether you know this or not. there's a site quillpad.in/telugu in which you type in english and the script comes out in telugu