A novel voice activity detection based on phoneme recognition using statistical model

Bao, Xulei*; Zhu, Jie

doi:10.1186/1687-4722-2012-1

Abstract

In this article, a novel voice activity detection (VAD) approach is proposed. It is based on phoneme recognition using a Gaussian mixture model-based hidden Markov model (HMM/GMM). Several sophisticated speech features are employed to represent each speech/non-speech segment: high order statistics (HOS), harmonic structure information and Mel-frequency cepstral coefficients (MFCCs). The main idea of the method is to regard non-speech as a new phoneme alongside the conventional Mandarin phonemes. All of these units are then trained under the maximum likelihood principle with the Baum-Welch algorithm using the HMM/GMM model. Finally, the Viterbi decoding algorithm is used to search for the maximum-likelihood state sequence of the observed signals. Compared with several existing VAD methods, the proposed method shows higher speech/non-speech detection accuracy over a wide range of SNR regimes. We also propose a different method to demonstrate that conventional speech enhancement combined with accurate VAD alone is not effective enough for automatic speech recognition (ASR) at low SNR regimes.
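The decoding idea in the abstract is to treat non-speech as one more phoneme, decode with Viterbi, and label every frame not assigned to that unit as speech. It can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several simplifying assumptions:

- Each unit has a single emitting state with a GMM emission density.
- Each frame carries one hypothetical scalar feature instead of the HOS/harmonic/MFCC feature vectors.
- All weights, means, variances and transition probabilities are hand-set rather than trained with Baum-Welch.
- The unit names `sil`, `ph_a` and `ph_b` are hypothetical placeholders.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    return logsumexp([math.log(w) + log_gauss(x, mu, var) for w, mu, var in gmm])

def viterbi(frames, states, log_init, log_trans, gmms):
    """Return the maximum-likelihood state sequence (as unit names) for frames."""
    n = len(states)
    # delta[s]: best log score of any path ending in state s at the current frame
    delta = [log_init[s] + gmm_loglik(frames[0], gmms[s]) for s in range(n)]
    backptr = []
    for x in frames[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + gmm_loglik(x, gmms[j]))
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state
    path = [max(range(n), key=lambda s: delta[s])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return [states[s] for s in path]

# Hypothetical model: "sil" is the non-speech pseudo-phoneme, the others stand in for phonemes.
states = ["sil", "ph_a", "ph_b"]
gmms = [
    [(0.5, -5.0, 1.0), (0.5, -4.0, 1.0)],  # sil
    [(1.0, 2.0, 1.0)],                     # ph_a
    [(0.6, 5.0, 1.0), (0.4, 6.0, 1.0)],    # ph_b
]
log_init = [math.log(1.0 / 3)] * 3
log_trans = [[math.log(0.8 if i == j else 0.1) for j in range(3)] for i in range(3)]

frames = [-5.0, -4.5, 2.1, 1.8, 5.5, 5.2, -4.8, -5.1]  # hypothetical per-frame features
labels = viterbi(frames, states, log_init, log_trans, gmms)
vad = [0 if s == "sil" else 1 for s in labels]  # 1 = speech, 0 = non-speech
print(vad)  # [0, 0, 1, 1, 1, 1, 0, 0]
```

Because non-speech is decoded jointly with the phoneme units, the speech/non-speech decision falls out of the recognition path rather than from a separate threshold. In a realistic system, the emission densities would be multivariate GMMs over the feature vectors described above, and the phoneme HMMs would have several states each.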

Publication date: 2012
Affiliation: Shanghai Jiao Tong University

