Abstract

There are many factors affecting the variability of an i-vector extracted from a speech segment, such as the acoustic content, segment duration, handset type and background noise. The language being spoken is one of the sources of variation that has received limited attention due to the lack of available multilingual resources. Consequently, discrimination performance is much lower in multilingual trial conditions. Standard session-compensation techniques such as Within-Class Covariance Normalization (WCCN), Linear Discriminant Analysis (LDA) and Probabilistic LDA (PLDA) cannot robustly compensate for language as a source of variation, as the amount of data available to represent such variability is limited. The source-normalization technique, which was developed to compensate for speech-source variation, offered superior performance in cross-language trials by providing a better estimate of the within-speaker scatter matrix in the WCCN and LDA techniques. However, neither language normalization nor the state-of-the-art PLDA algorithm is capable of modeling language variability on a dataset with insufficient multilingual utterances per speaker, resulting in poor performance in the cross-language trial condition. This study extends our initial development of a language-independent PLDA training algorithm aimed at reducing the effect of language as a source of variability on speaker recognition performance. We provide a thorough analysis of how the proposed approach can utilize multilingual training data from bilingual speakers to robustly compensate for the effect of language. Evaluated on the multilingual trial condition, the proposed solution demonstrated relative improvements of over 10% in EER and 13% in minimum DCF on the NIST 2008 speaker recognition evaluation, and of 12.4% in EER and 23% in minimum DCF on the PRISM evaluation set, over the baseline system, while also providing improvements in other trial conditions.
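To make the scatter-estimation point above concrete, the sketch below contrasts a conventional within-speaker scatter matrix with a source-normalized estimate in the spirit of source-normalized LDA/WCCN, where the between-speaker scatter is accumulated separately within each source (here, language) and subtracted from the total scatter. This is an illustrative assumption of how such an estimate is commonly computed, not the paper's exact formulation; the function names and the NumPy-based interface are hypothetical.

import numpy as np

def within_speaker_scatter(ivectors, speaker_ids):
    # Conventional estimate: sum of deviations of each i-vector from
    # its own speaker's mean. ivectors: (n, d) array; speaker_ids: (n,) array.
    d = ivectors.shape[1]
    S_w = np.zeros((d, d))
    for spk in np.unique(speaker_ids):
        X = ivectors[speaker_ids == spk]
        D = X - X.mean(axis=0)
        S_w += D.T @ D
    return S_w

def source_normalized_within_scatter(ivectors, speaker_ids, source_ids):
    # Source-normalized estimate (sketch): S_w = S_total - S_between,
    # where S_between is computed per source, so that shifts between
    # languages are not mistaken for within-speaker variability when
    # most speakers are only observed in a single language.
    mu = ivectors.mean(axis=0)
    D = ivectors - mu
    S_total = D.T @ D
    S_between = np.zeros_like(S_total)
    for src in np.unique(source_ids):
        idx = source_ids == src
        X_src, spk_src = ivectors[idx], speaker_ids[idx]
        mu_src = X_src.mean(axis=0)
        for spk in np.unique(spk_src):
            X = X_src[spk_src == spk]
            diff = X.mean(axis=0) - mu_src
            S_between += len(X) * np.outer(diff, diff)
    return S_total - S_between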

  • Publication date: 2017-9