Multilayer structure based lexicon optimization for language modeling

Authors: Ablimit Mijit; Pattar Akbar; Hamdulla Askar
Source: Journal of Tsinghua University (Science and Technology), 2017, 57(3): 257-263.
DOI: 10.16511/j.cnki.qhdxxb.2017.26.006

Abstract

Selecting an appropriate lexicon is an important first step in developing large vocabulary continuous speech recognition (LVCSR) systems. The word is usually chosen as the lexicon unit because it avoids word boundary detection problems. However, choosing the lexicon unit is less straightforward for languages with derivational morphology (e.g., agglutinative languages), and many languages, such as Chinese and Japanese, have no word boundaries. This paper uses a Uyghur LVCSR system to analyze automatic speech recognition (ASR) systems built on various particle units, compares the ASR results across linguistic layers, and develops a method that balances the advantages of two lexicon layers. The ASR results for the two layers are aligned and compared to analyze error patterns and to extract samples as training data for the alternative selection method. Tests show that this method effectively improves ASR accuracy with a small lexicon.

  • Publication date: 2017
