A study of large vocabulary speech recognition decoding using finite-state graphs

Ou Zhijian*; Xiao Ji

doi:10.1109/ISCSLP.2010.5684837

Abstract

Weighted finite-state transducers (WFSTs) have become an attractive technique for building large vocabulary continuous speech recognition decoders. Conventionally, the compiled search network is represented as a standard WFST, which is then fed directly into a Viterbi decoder. In this work, we use the standard WFST representations and operations when compiling the search network. The compiled WFST is then converted to an equivalent new graphical representation, which we call a finite-state graph (FSG). The resulting FSG is better tailored to Viterbi decoding for speech recognition and more compact in memory. This paper presents our effort to build a state-of-the-art WFST-based speech recognition system, which we call GrpDecoder. GrpDecoder is benchmarked separately on two languages, English and Mandarin. The test results show that GrpDecoder, which uses the new FSG representation in search, outperforms HTK's HDecode and IDIAP's Juicer for both languages, achieving lower error rates at a given recognition speed.

Publication date: 2010
Affiliation: Tsinghua University
