Automatic speech recognition in cocktail-party situations: A specific training for separated speech

Authors: Marti Amparo*; Cobos Maximo; Lopez Jose J
Source: Journal of the Acoustical Society of America, 2012, 131(2): 1529-1535.
DOI: 10.1121/1.3675001

Abstract

Automatic speech recognition (ASR) refers to the task of extracting a transcription of the linguistic content of an acoustical speech signal automatically. Despite several decades of research in this important area of acoustic signal processing, the accuracy of ASR systems is still far behind human performance, especially in adverse acoustic scenarios. In this context, one of the most challenging situations is the one concerning simultaneous speech in cocktail-party environments. Although source separation methods have already been investigated to deal with this problem, the separation process is not perfect and the resulting artifacts pose an additional problem to ASR performance. In this paper, a specific training to improve the percentage of recognized words in real simultaneous speech cases is proposed. The combination of source separation and this specific training is explored and evaluated under different acoustical conditions, leading to improvements of up to 35% in ASR performance.

  • Publication date: 2012-02
