How Japan helps Google translate India

Source: The Wall Street Journal
Story flagged by: RominaZ

Google Inc. announced earlier this week that it had extended its translation service to five more languages spoken on the Indian subcontinent.

The service, already available for Hindi and Urdu, can now be used on an experimental basis for Bengali, Gujarati, Kannada, Tamil and Telugu. Bengali is spoken in both India and Bangladesh, while Tamil is also spoken in Sri Lanka. But Google Translate had to look to content in Japanese and Russian to handle some of the structural complexity of these languages.

“The real challenge was the relative lack of data,” said Ashish Venugopal, the Google research scientist who oversaw the translation service’s expansion into the five new languages, in a phone interview on Friday. “There’s not a lot of content being produced in these languages on the Web relative to other languages of Asia.”

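Google Translate relies on something called “statistical machine translation,” in which software learns to translate by analyzing large numbers of documents that already exist in two languages.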

Google Translate had an easier time with Hindi, in part because the Indian government requires official documents to appear in Hindi as well. Although Indian official Web sites carry more English content than Hindi content, a fair number of reports and documents appear in both languages. News organizations can help, too.

For the five new languages, the machines looked outside the subcontinent.

“We really looked at Japanese for the [sentence] order because it follows such a similar subject-object-verb order,” said Mr. Venugopal. “There’s a lot of parallel data from Japanese to English so we have strong models there.”

The main thing the system needed to learn from Japanese was how to handle the different placement of verbs in a sentence, so that it could translate a Tamil sentence as “I kicked the ball” rather than “I ball kicked.”

However, other languages can only help so much, and their usefulness is mostly limited to helping Google Translate figure out sentence structure. When it comes to improving its vocabulary, machine translation requires what humans require: more reading material. Until more content translated by humans is available on the Web in these languages for the system to learn from, Google Translate warns that the quality of the service in the five new languages will vary.


Also see: Google Translate announces the release of new alpha languages — the Indic web
