A Single-Channel Non-Intrusive C50 Estimator Correlated With Speech Recognition Performance

Authors: Parada Pablo Peso*; Sharma Dushyant*; Lainez Jose*; Barreda Daniel*; van Waterschoot Toon*; Naylor Patrick A*
Source: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(4): 719-732.
DOI: 10.1109/TASLP.2016.2521486

Abstract

Several intrusive measures of reverberation can be computed from measured and simulated room impulse responses, either over the full frequency band or for each individual mel-frequency subband. It is first shown that the full-band clarity index C50 is, on average, the measure most correlated with reverberant speech recognition performance. This corroborates previous findings, now for the dataset used in this study. We extend those findings to show that C50 also exhibits the highest mutual information on average. Motivated by these extended findings, a nonintrusive room acoustic (NIRA) estimation method is proposed to estimate C50 from the reverberant speech signal alone. The NIRA method is a data-driven approach: it computes a number of features from the speech signal and uses them to train a model that performs the estimation. The choice of features and learning techniques is explored using an evaluation set of approximately 100 000 different reverberant signals (around 93 h of speech), including reverberation from measured and simulated room impulse responses. The importance of each feature with respect to the estimation of the target C50 is analysed following two different approaches. In both cases, the newly chosen set of features shows high importance for the target. The best C50 estimator provides a root-mean-square deviation of around 3 dB on average over all reverberant test environments.
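
For orientation, the intrusive full-band C50 referred to in the abstract is conventionally defined as the ratio, in dB, of the energy in the first 50 ms of a room impulse response to the energy in the remainder. The sketch below illustrates that textbook definition only; it is not the paper's implementation. The function name `clarity_index_c50`, the peak-based onset detection, and the synthetic two-impulse response are assumptions made for illustration.

```python
import numpy as np


def clarity_index_c50(rir, fs, early_ms=50.0):
    """Intrusive C50 in dB from a room impulse response (illustrative sketch).

    Assumption: the direct-path arrival is taken as the sample with the
    largest absolute amplitude; other onset detectors are equally plausible.
    """
    rir = np.asarray(rir, dtype=float)
    onset = int(np.argmax(np.abs(rir)))               # assumed direct-path index
    split = onset + int(round(early_ms * 1e-3 * fs))  # end of the early window
    energy = rir ** 2
    early = energy[onset:split].sum()                 # energy in first 50 ms
    late = energy[split:].sum()                       # energy after 50 ms
    return 10.0 * np.log10(early / late)


# Synthetic example (hypothetical data): a unit direct path plus one
# reflection of amplitude 0.1 arriving 60 ms later, sampled at 16 kHz.
fs = 16000
rir = np.zeros(fs // 2)
rir[0] = 1.0
rir[int(0.060 * fs)] = 0.1
c50 = clarity_index_c50(rir, fs)  # energy ratio 1 / 0.01, i.e. about 20 dB
```

An intrusive measure like this needs the impulse response itself, which is what the NIRA method avoids by estimating C50 from the reverberant speech signal. Note that the sketch returns infinity when no energy falls after the 50 ms boundary.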

  • Publication date: 2016-04