Abstract

This paper presents robust speaker identification for speech recorded by distant microphones. Three compensation approaches are investigated to improve robustness in such environments. The first applies spectral subtraction before feature extraction to reduce the effect of late reverberation. The second uses feature warping as feature compensation for distant speaker identification under mismatched training and testing conditions. The third employs a novel method of initializing Gaussian mixture model parameters: combined division and k-means clustering. The experimental results show that, relative to the baseline system based on cepstral mean normalization (CMN), the channel-average recognition rates of the compensated system were 11.4%, 15.4%, 17.0%, and 17.8% higher for the TIMIT database and 6.8%, 6.4%, 9.3%, and 14.0% higher for the JNAS database in four different environments. In addition, the results show that the combination of the three approaches performs better than a single compensation method used alone.
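
To make the second approach concrete, the following is a minimal sketch of feature warping on a single feature dimension, assuming the common sliding-window formulation in which each value is replaced by the standard normal quantile of its rank within the window. The function name `warp_features`, the default window length, and the handling of ties and utterance edges are illustrative assumptions and are not taken from this paper.

```python
from statistics import NormalDist


def warp_features(values, window=301):
    """Warp a 1-D feature trajectory toward a standard normal distribution.

    Assumption: `window` is the number of frames in the sliding window;
    the default of 301 is only illustrative.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    half = window // 2
    warped = []
    for t, x in enumerate(values):
        # Window centred on frame t, truncated at the utterance edges.
        lo = max(0, t - half)
        hi = min(len(values), t + half + 1)
        segment = values[lo:hi]
        n = len(segment)
        # Ascending rank of x in the window (1 = smallest value).
        rank = sum(1 for v in segment if v < x) + 1
        # Map the rank to the matching quantile of the standard normal.
        warped.append(std_normal.inv_cdf((rank - 0.5) / n))
    return warped
```

In practice the function would be applied independently to each cepstral coefficient trajectory. Because only the rank within the window is used, the warped value is unaffected by any monotonic distortion of the feature scale, which is the property that makes the technique attractive under mismatched channel conditions.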