Automatic extraction of bilingual chunk lexicon for spoken language translation

Du Limin; Chen Boxing

Abstract

In language communication, an utterance can be segmented into a sequence of chunks, each of which is syntactically well-formed, semantically meaningful, and composed of several words. Usually, the order of words within a chunk is fixed, while the order of chunks within an utterance is fairly flexible. Spoken language translation could benefit from using bilingual chunks. This paper presents a statistical algorithm that builds a bilingual chunk lexicon automatically from spoken language corpora. Several association measures serve as the extraction criteria, and a local-best algorithm, length-ratio filtering, and stop-word filtering are incorporated to improve performance. A bilingual chunk lexicon was extracted from a corpus with a precision of 86.0% and a recall of 86.7%. The usability of the chunk lexicon was then tested in an innovative framework for English-to-Chinese spoken language translation, yielding translation accuracies of 81.83% and 78.69% on the training and test sets respectively, as measured by a Levenshtein-distance-based similarity score.
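
The abstract does not say how the Levenshtein-distance-based similarity score is normalized. As a minimal sketch only, the Python below assumes one common convention: compute the edit distance over tokens, divide by the length of the longer sequence, and subtract from 1. The function names `levenshtein` and `similarity` and this normalization are illustrative assumptions, not the authors' stated definition.

```python
def levenshtein(a, b):
    """Edit distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings or on lists of tokens.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(hypothesis, reference):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(hypothesis), len(reference))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - levenshtein(hypothesis, reference) / longest


# One substituted token out of four gives a score of 0.75.
score = similarity("i want a room".split(), "i want one room".split())
```

Under this assumption, an exact match scores 1.0 and a translation that shares no tokens with a reference of the same length scores 0.0. A different normalization, for example dividing by the reference length, would give different values.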

Publication date: 2003


Last updated: 2017-05-17 18:14
