Abstract

The recent development of the i-vector framework for speaker recognition has set a new performance standard in the research field. An i-vector is a compact representation of a speaker's utterance extracted from a total variability subspace. Prior to classification using a cosine kernel, i-vectors are projected into a linear discriminant analysis (LDA) space in order to reduce inter-session variability and enhance speaker discrimination. The accurate estimation of this LDA space from a training dataset is crucial to detection performance. A typical training dataset, however, does not consist of utterances acquired through all sources of interest for each speaker. This introduces systematic variation related to the speech source into the between-speaker covariance matrix and results in an incomplete representation of the within-speaker scatter matrix used for LDA. The recently proposed source-normalized (SN) LDA algorithm improves the robustness of i-vector-based speaker recognition both under mismatched evaluation conditions and under conditions for which inadequate speech resources are available for suitable system development. When evaluated on the recent NIST 2008 and 2010 Speaker Recognition Evaluations (SRE), SN-LDA demonstrated relative improvements of up to 38% in equal error rate (EER) and 44% in minimum detection cost function (DCF) over LDA under mismatched and sparsely resourced evaluation conditions, while also providing improvements in the common telephone-only conditions. Building on these initial developments, this study provides a thorough analysis of how SN-LDA transforms the i-vector space to reduce source variation and of its robustness to varying evaluation and LDA training conditions. The concept of source normalization is further extended to within-class covariance normalization (WCCN) and data-driven source detection.
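The scoring step mentioned in the abstract, projection into an LDA space followed by a cosine kernel, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names (`project`, `cosine_score`, `score_trial`), the toy 3-dimensional "i-vectors", and the hand-picked 2x3 projection matrix are hypothetical and are not taken from the study. The sketch also does not estimate an LDA or SN-LDA transform; it only shows how a given linear projection is applied before cosine scoring.

```python
import math


def project(matrix, vector):
    """Apply a linear projection; each row of `matrix` yields one output dimension."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def cosine_score(u, v):
    """Cosine kernel: dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_trial(projection, enrol_ivector, test_ivector):
    """Project both i-vectors with the same matrix, then compare them by cosine."""
    return cosine_score(project(projection, enrol_ivector),
                        project(projection, test_ivector))


# Hypothetical toy data: the third dimension plays the role of a nuisance
# (session or source) direction, and the hand-picked projection discards it.
LDA = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
enrol = [1.0, 2.0, 5.0]
test = [2.0, 4.0, -7.0]

raw_score = cosine_score(enrol, test)            # dominated by the nuisance dimension
projected_score = score_trial(LDA, enrol, test)  # nuisance dimension removed
```

In this toy case the raw cosine score is negative because the two vectors disagree strongly along the third dimension, whereas the projected score is close to 1 because the remaining dimensions point the same way. In a real system the projection would be estimated from training data rather than chosen by hand.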

  • Publication date: 2012-3