Discriminative Approach to Build Hybrid Vocabulary for Conversational Telephone Speech Recognition of Agglutinative Languages

Li Xin<sup>*</sup>; Pan Jielin; Zhao Qingwei; Yan Yonghong

doi:10.1587/transinf.E96.D.2478

Abstract

Morphemes, obtained by morphological parsing, and statistical sub-words, derived by data-driven splitting, are commonly used as recognition units in speech recognition of agglutinative languages. In this letter, we propose a discriminative approach that selects, for each distinct word type, the splitting result more likely to improve the recognizer's performance. We define and minimize an objective function that involves the unigram language model (LM) probability and the count of misrecognized phones on the acoustic training data. After determining the splitting result for each word in the text corpus, we select the frequent units to build a hybrid vocabulary that includes both morphemes and statistical sub-words. Compared to a statistical sub-word based system, the hybrid system achieves a 0.8% reduction in letter error rate (LER) on the test set.
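The selection procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the objective function, so the cost used here (negative unigram log-probability of the units plus a weighted count of misrecognized phones) is an assumption made only for illustration, as are the names `choose_split`, `build_vocabulary`, and the `weight` parameter. The unigram log-probabilities and per-split phone error counts are assumed to be precomputed elsewhere.

```python
from collections import Counter


def choose_split(candidates, unigram_logprob, phone_errors, weight=1.0):
    """Pick one splitting result for a word type by minimizing a cost.

    candidates: list of tuples of units (e.g. a morpheme split and a
        statistical sub-word split of the same word).
    unigram_logprob: dict mapping unit -> log probability (assumed given).
    phone_errors: dict mapping split -> count of misrecognized phones on
        the acoustic training data (assumed given).
    weight: hypothetical trade-off between the two terms.
    """
    def cost(split):
        # Assumed cost: LM term plus weighted phone-error term.
        lm_cost = -sum(unigram_logprob[unit] for unit in split)
        return lm_cost + weight * phone_errors[split]

    return min(candidates, key=cost)


def build_vocabulary(word_counts, chosen_splits, size):
    """Collect the most frequent units after every word has been split."""
    unit_counts = Counter()
    for word, count in word_counts.items():
        for unit in chosen_splits[word]:
            unit_counts[unit] += count
    return [unit for unit, _ in unit_counts.most_common(size)]
```

Because each word type keeps whichever split scores better, the resulting unit inventory can mix morphemes and statistical sub-words, which is what makes the vocabulary hybrid.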

Publication date: 2013-11
Affiliation: Institute of Acoustics, Chinese Academy of Sciences
