Analysis of human scream and its impact on text-independent speaker verification

Hansen John H L*; Nandwana Mahesh Kumar; Shokouhi Navid

doi:10.1121/1.4979337

Abstract

A scream is defined as a sustained, high-energy vocalization that lacks phonological structure. This lack of phonological structure is what distinguishes a scream from other forms of loud vocalization, such as a "yell." This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and the Lombard effect contributes to degraded performance in automatic speech systems (e.g., speech recognition, speaker identification, and diarization). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study presents a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture model-universal background model (GMM-UBM) framework is unreliable when evaluated with screams.

Publication date: 2017-04


