摘要

A text-independent speaker recognition system using a hybrid Probabilistic Principal Component Analysis (PPCA) and conventional i-vector modeling technique is proposed. In this framework, the total variability space (TVS) is estimated using PPCA while the i-vectors of target speakers and test utterances are extracted using the conventional method. This leads to appreciable decrease in development time, while the time required for training and testing remains unchanged. In this a paper, an algorithmic optimization to the PPCA%26apos;s EM algorithm is developed. This is observed to provide a speed up of 3.7x. To simplify the testing procedure, two different approximation procedures are proposed to be used in this framework. The first approximation assumes a covariance matrix computed based on the PPCA framework. The second approximation proposes an optimization to avoid inverting the precision matrix of the i-vector. The comparison of time taken by these approximations with the baseline i-vector extraction procedure shows speed gains with some deterioration in performance in terms of the Equal Error Rate (EER). Among the proposed techniques, a best case trade-off is obtained with a speed up of 81.2x with deterioration in performance by 0.7% in absolute terms. Speaker recognition performances are studied on the telephone conditions of the benchmark NIST SRE 2010 dataset with systems built on the Mel Frequency Cepstral Co-efficient (MFCC) feature. A trade-off in the performance is observed when the proposed approximations are used. The scalability of these trade-offs is tested on the Mel Filterbank Slope (MFS) feature. The trade-offs observed with the approximations are reduced when the two systems are fused.

  • 出版日期2014-9