Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

Ahmadi Sara; Ahadi Seyed Mohammad<sup>*</sup>; Cranen Bert; Boves Lou

doi:10.1186/s13636-014-0036-3

Abstract

The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the posterior probability that a frame of an unknown speech signal is related to a specific state. In this paper we use the raw output of a modulation spectrum analyser in combination with sparse coding as a means for obtaining state posterior probabilities. The modulation spectrum analyser uses 15 gammatone filters. The Hilbert envelope of the output of these filters is then processed by nine modulation frequency filters, with bandwidths up to 16 Hz. Experiments using the AURORA-2 task show that the novel approach is promising. We found that the representation of medium-term dynamics in the modulation spectrum analyser must be improved. We also found that we should move towards sparse classification, by modifying the cost function in sparse coding such that the class(es) represented by the exemplars weigh in, in addition to the accuracy with which unknown observations are reconstructed. This creates two challenges: (1) developing a method for dictionary learning that takes the class occupancy of exemplars into account and (2) developing a method for learning a mapping from exemplar activations to state posterior probabilities that keeps the generalization to unseen conditions that is one of the strongest advantages of sparse coding.
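As a rough illustration of how exemplar activations from sparse coding can be turned into state posterior probabilities, the following sketch uses NumPy only. It is not the authors' implementation: the abstract does not name a solver, so the non-negative multiplicative update with an L1 penalty is an assumption, as are the function names `sparse_code` and `state_posteriors`, the random dictionary, the toy sizes, and the one-hot labelling of exemplars. The feature dimension of 135 assumes one frame of 15 gammatone channels times nine modulation filters, with no stacking of frames.

```python
import numpy as np


def sparse_code(y, A, lam=0.1, n_iter=200, eps=1e-12):
    """Approximately minimise ||y - A x||^2 + lam * sum(x) subject to x >= 0.

    Uses multiplicative updates, which keep x non-negative as long as
    y and A are non-negative (an assumption of this sketch).
    """
    x = np.full(A.shape[1], 1.0 / A.shape[1])  # strictly positive start
    Aty = A.T @ y
    AtA = A.T @ A
    for _ in range(n_iter):
        x *= Aty / (AtA @ x + lam + eps)
    return x


def state_posteriors(x, labels, eps=1e-12):
    """Map exemplar activations to normalised per-state scores.

    labels has shape (n_exemplars, n_states); row i says which state(s)
    exemplar i represents. Summing activations per state and normalising
    is one simple mapping, chosen here for illustration only.
    """
    scores = labels.T @ x
    return scores / (scores.sum() + eps)


# Toy data (hypothetical): random non-negative exemplars, one state each.
rng = np.random.default_rng(0)
n_dims, n_exemplars, n_states = 135, 40, 4  # 135 = 15 channels * 9 filters
A = rng.random((n_dims, n_exemplars))
A /= np.linalg.norm(A, axis=0)  # unit-norm dictionary columns
labels = np.eye(n_states)[np.arange(n_exemplars) % n_states]

# An "unknown" observation built from two exemplars of the same state.
y = 0.7 * A[:, 5] + 0.3 * A[:, 9]

activations = sparse_code(y, A)
posteriors = state_posteriors(activations, labels)
print(posteriors)
```

In this sketch the cost function only rewards reconstruction accuracy and sparsity; the class labels enter afterwards, in `state_posteriors`. The abstract's proposal to move towards sparse classification would instead let the class occupancy of the exemplars influence the cost function itself.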

Publication date: 2014-10-21
