Books Blog: books.elliottback.com


Automated Machine Translation

Posted in Classics, Language by Elliott Back on August 22nd, 2005.

The results of the 2005 NIST evaluation are now online:

www.nist.gov

The winner, hands down, appears to be Google, whose system translated texts from Arabic and Chinese into English with the best scores on the BLEU-4 metric. If you don’t know what the BLEU metric measures, NIST has an explanation:

Machine translation quality was measured automatically using an N-gram co-occurrence statistic metric developed by IBM and referred to as BLEU. BLEU measures translation accuracy according to the N-grams or sequence of N-words that it shares with one or more high quality reference translations. Thus, the more co-occurrences the better the score. BLEU is an accuracy metric, ranging from “0” to “1” with “1” being the best possible score.
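To make the N-gram idea concrete, here is a minimal Python sketch of a sentence-level BLEU-style score. It is an illustration under assumptions, not NIST's scoring script: the function names (`ngrams`, `modified_precision`, `bleu`) are made up for this post, tokenization is plain whitespace splitting, there is no smoothing, and real evaluations compute the score over a whole test corpus rather than one sentence.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Share of candidate n-grams that also occur in a reference.

    Each n-gram's count is clipped to the most times it appears in any
    single reference, so repeating one matching word does not inflate the score.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of the 1..max_n modified
    precisions, multiplied by a brevity penalty. No smoothing is applied,
    so any zero precision makes the whole score zero."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)
```

A candidate identical to its reference (and at least four tokens long) scores 1, the top of the range NIST describes, while a candidate that shares no four-word sequence with any reference scores 0 in this unsmoothed version.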

You can also read the original paper from IBM Research. The Google entry used statistical analysis of parallel text. You can read about their pride and joy on their own blog.

How does a parallel statistical translation model work? You feed an engine two text streams that say the same thing in the two languages of choice, and it learns to associate words and phrases in one language with words and phrases in the other. For example, you might feed it:

Me falta el tiempo.

and the English version:

I don’t have time!

The engine may associate el tiempo with time more strongly, since it’s seen that pairing before, but create a special rule for the phrase me falta, which in this context means I don’t have rather than anything derived regularly from to lack.
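For the curious, the word-association half of that idea can be sketched in a few lines of Python. This is not Google's system. It is a toy version of a classic word-alignment method (IBM Model 1, trained with expectation-maximization), the three-sentence corpus is invented for illustration, and the names `train_model1`, `best_source_word`, and `TOY_CORPUS` are hypothetical. It handles single words only, so it would not learn a phrase rule like the one for me falta.

```python
from collections import defaultdict

# Invented toy parallel corpus: (Spanish tokens, English tokens).
TOY_CORPUS = [
    ("el tiempo".split(), "the time".split()),
    ("el libro".split(), "the book".split()),
    ("un libro".split(), "a book".split()),
]


def train_model1(pairs, iterations=20):
    """Estimate t[(s, e)] = P(source word s | target word e) with EM.

    Every probability starts out uniform. Each pass spreads one unit of
    credit for every source word over the target words in the same
    sentence pair, in proportion to the current estimates, then
    renormalizes the collected credit per target word.
    """
    src_vocab = {s for src, _ in pairs for s in src}
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence credit
        total = defaultdict(float)  # credit collected per target word
        for src, tgt in pairs:
            for s in src:
                norm = sum(t[(s, e)] for e in tgt)
                for e in tgt:
                    frac = t[(s, e)] / norm
                    count[(s, e)] += frac
                    total[e] += frac
        for (s, e), c in count.items():
            t[(s, e)] = c / total[e]
    return t


def best_source_word(t, target_word):
    """Return the source word with the highest P(s | target_word)."""
    candidates = {s: p for (s, e), p in t.items() if e == target_word}
    return max(candidates, key=candidates.get)
```

Calling `best_source_word(train_model1(TOY_CORPUS), "time")` picks tiempo: el also shows up next to time, but el is better explained by the, which it accompanies in two sentences, so the credit for time shifts to tiempo over the iterations.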

This entry was posted on Monday, August 22nd, 2005 at 4:26 pm.

One Response to 'Automated Machine Translation'

  1. Idetrorce said:

    on December 15th, 2007 at 7:45 am

    very interesting, but I don’t agree with you
    Idetrorce
