Abstract

This article proposes a novel voice activity detection (VAD) approach based on phoneme recognition with Gaussian mixture model based hidden Markov models (HMM/GMM). Each speech/non-speech segment is represented by a set of speech features: higher-order statistics (HOS), harmonic structure information, and Mel-frequency cepstral coefficients (MFCCs). The main idea is to treat non-speech as an additional phoneme alongside the conventional phonemes of Mandarin; all of these units are then trained under the maximum likelihood principle with the Baum-Welch algorithm using the HMM/GMM model. Finally, the Viterbi decoding algorithm is used to find the most likely sequence of units for the observed signal. The proposed method achieves higher speech/non-speech detection accuracy than several existing VAD methods over a wide range of SNR regimes. We also propose a different method to demonstrate that conventional speech enhancement that relies only on accurate VAD is not effective enough for automatic speech recognition (ASR) at low SNR.
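To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding in which non-speech is simply one more state next to the phoneme states. It is an illustration under stated assumptions, not the system described here: the three-state layout, the name `NON_SPEECH`, and every probability value are hypothetical, and the per-frame likelihoods that would come from trained GMMs are replaced by hand-picked numbers.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path for a sequence of frames.

    log_init[s]     : log prior of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_emit[t][s]  : log likelihood of frame t under state s
                      (in an HMM/GMM system this would come from the state's GMM)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best_r = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best_r)
            score.append(prev[best_r] + log_trans[best_r][s] + log_emit[t][s])
        backptr.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: state 0 is the non-speech unit, states 1 and 2 are phonemes.
NON_SPEECH = 0
init = [0.6, 0.2, 0.2]
trans = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
emit = [[0.90, 0.05, 0.05],   # frames 0-1 look like non-speech
        [0.90, 0.05, 0.05],
        [0.05, 0.90, 0.05],   # frames 2-3 look like phoneme 1
        [0.05, 0.90, 0.05],
        [0.05, 0.05, 0.90],   # frame 4 looks like phoneme 2
        [0.90, 0.05, 0.05]]   # frame 5 looks like non-speech again

log_init = [math.log(p) for p in init]
log_trans = [[math.log(p) for p in row] for row in trans]
log_emit = [[math.log(p) for p in row] for row in emit]

path = viterbi(log_init, log_trans, log_emit)
# The VAD decision falls out of the decoded path: any non-speech state means "no speech".
vad = [state != NON_SPEECH for state in path]
print(path)  # [0, 0, 1, 1, 2, 0]
print(vad)   # [False, False, True, True, True, False]
```

In this framing the speech/non-speech decision needs no separate classifier: it is read directly off the decoded unit sequence.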