Abstract

Compared to corpus-based machine translation methods, rule-based methods have deficiencies that make them unattractive to researchers in this field. First, these methods are language dependent: they require syntactic information about both the source and target languages. Second, in many cases, especially for proverbs and specific expressions, syntactic rules are no longer useful, and the use of example-based approaches becomes inevitable. In this work, we propose and integrate a set of novel schemes to introduce a new translation system called BORNA. First, a grammar induction method based on the Expectation Maximization (EM) algorithm is proposed. After the extracted knowledge is represented as a set of nested finite automata, a recursive model is proposed that combines rule-based and example-based techniques. In the translation phase, a hierarchical chunking process divides the input sentence into a set of phrases. Each phrase is searched for in the corpus of examples. If the phrase is found, it is not chunked any further; otherwise, it is divided into smaller sub-phrases. The simulation results show that BORNA significantly outperforms its counterparts. Compared to the PARS, Frengly and Google translators, BORNA receives the highest BLEU scores for its translations and yields the lowest values for the error measures PER, TER and WER.
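
The recursive lookup described for the translation phase can be illustrated with a minimal Python sketch. This is not BORNA's implementation: the names `translate`, `split_in_half` and `examples` are hypothetical, the half-split stands in for the grammar-driven chunker (the abstract does not specify how sub-phrases are chosen), the word-level fallback stands in for the rule-based component, and the example corpus is toy data.

```python
from typing import Callable, Dict, List, Tuple

Phrase = Tuple[str, ...]


def split_in_half(phrase: Phrase) -> List[Phrase]:
    """Placeholder chunker (assumption): cut the phrase into two halves."""
    mid = len(phrase) // 2
    return [phrase[:mid], phrase[mid:]]


def translate(
    phrase: Phrase,
    examples: Dict[Phrase, Phrase],
    split: Callable[[Phrase], List[Phrase]],
    fallback: Callable[[str], str],
) -> List[str]:
    """Look the phrase up as a whole; if absent, chunk it and recurse."""
    if not phrase:
        return []
    # Found in the example corpus: reuse the stored translation, stop chunking.
    if phrase in examples:
        return list(examples[phrase])
    # A single unknown word cannot be chunked further: use the fallback.
    if len(phrase) == 1:
        return [fallback(phrase[0])]
    # Otherwise divide into smaller sub-phrases and translate each in order.
    output: List[str] = []
    for sub_phrase in split(phrase):
        output.extend(translate(sub_phrase, examples, split, fallback))
    return output


# Toy example corpus (invented placeholder tokens, not real translations).
examples: Dict[Phrase, Phrase] = {("good", "morning"): ("G_M",)}

result = translate(
    ("good", "morning", "dear", "friend"),
    examples,
    split_in_half,
    fallback=str.upper,  # stand-in for a rule-based word translation
)
print(result)
```

The sketch keeps sub-phrase translations in source order, so it ignores any reordering between source and target languages that a real system would have to handle.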

  • Publication date: 2014-6