An Experimental Analysis on Integrating Multi-Stream Spectro-Temporal, Cepstral and Pitch Information for Mandarin Speech Recognition

Wang Yow Bang<sup>*</sup>; Li Shang Wen; Lee Lin shan

doi:10.1109/TASL.2013.2263803

Abstract

Gabor features have been proposed for extracting spectro-temporal modulation information from speech signals and have been shown to yield large improvements in recognition accuracy. We use a flexible Tandem system framework that integrates multi-stream information, including Gabor, MFCC, and pitch features, in various ways by modeling the tone variations, the phoneme variations, or both in Mandarin speech recognition. Either phonemes or tonal phonemes (tonemes) serve as the target classes of MLP posterior estimation, the acoustic units of HMM recognition, or both. The experiments yield a comprehensive analysis of the contribution each feature set makes to recognition accuracy. We discuss their complementarity in tone, phoneme, and toneme classification. We show that Gabor features are better for recognizing vowels and unvoiced consonants, while MFCCs are better for voiced consonants. In addition, Gabor features can capture the changes in signals across time and frequency bands caused by Mandarin tone patterns, while pitch features offer extra tonal information. This explains why integrating Gabor, MFCC, and pitch features offers such significant improvements.

Publication date: 2013-10
Institution: Institute of Electrical Engineering, Chinese Academy of Sciences

