Determining speaker attributes from stress-affected speech in emergency situations with hybrid SVM-DNN architecture

Ahmad Jamil; Sajjad Muhammad; Rho Seungmin; Kwon Soon il; Lee Mi Young; Baik Sung Wook*

doi:10.1007/s11042-016-4041-7

Abstract

Of the millions of emergency calls made each year, about a quarter report non-emergencies. To avoid responding to such calls, forensic examination of the reported situation, with speech as evidence, has become an indispensable requirement for emergency response centers. Caller profile information determined from emergency calls, such as gender, age, emotional state, transcript, and contextual sounds, may be highly beneficial for sophisticated forensic analysis. However, callers reporting emergencies often experience emotional stress, which causes variations in speech production. Furthermore, low voice quality and background noise make it very difficult to recognize caller attributes efficiently in such unconstrained environments. To overcome the limitations of traditional classification systems in these situations, this paper proposes a hybrid two-stage classification scheme. The framework consists of an ensemble of support vector machines (e-SVM) and a deep neural network (DNN) arranged in a cascade. The first-stage e-SVM consists of two models discriminatively trained on normal and stressed speech from emergency calls. The DNN, which forms the second stage of the classification pipeline, is used only when the first stage produces an ambiguous prediction. The adaptive nature of this two-stage scheme helps achieve both efficiency and high performance. Experiments conducted on a large dataset affirm the suitability of the proposed architecture for efficient real-time speaker attribute recognition. The framework is evaluated on gender recognition from emergency calls in the presence of emotions and background noise, where it yields significant performance improvements over comparable state-of-the-art gender recognition approaches.
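The cascade described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how an "ambiguous" first-stage prediction is detected, so the rule used here (the two SVMs disagree, or their averaged score lies within a margin of 0.5) is an assumption, as are the names `cascade_predict`, `svm_normal`, `svm_stress`, `dnn`, and `margin`. Each model is represented by a hypothetical callable that returns a score in [0, 1] for the "male" class; this is not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: maps a feature vector to P(male) in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def cascade_predict(
    features: Sequence[float],
    svm_normal: Scorer,   # stand-in for the SVM trained on normal speech
    svm_stress: Scorer,   # stand-in for the SVM trained on stressed speech
    dnn: Scorer,          # stand-in for the second-stage DNN
    margin: float = 0.2,  # assumed ambiguity threshold, not from the paper
) -> Tuple[str, str]:
    """Return (predicted label, name of the stage that decided)."""
    p_normal = svm_normal(features)
    p_stress = svm_stress(features)
    avg = (p_normal + p_stress) / 2

    # Assumed ambiguity rule: the ensemble is trusted only if both SVMs
    # pick the same class and the averaged score is far enough from 0.5.
    agree = (p_normal >= 0.5) == (p_stress >= 0.5)
    confident = abs(avg - 0.5) >= margin

    if agree and confident:
        return ("male" if avg >= 0.5 else "female", "e-SVM")

    # Ambiguous first-stage result: fall back to the costlier DNN.
    p_dnn = dnn(features)
    return ("male" if p_dnn >= 0.5 else "female", "DNN")
```

Because the DNN is invoked only on the fallback path, inputs that the first stage handles confidently never pay its cost, which is the efficiency argument the abstract makes for the two-stage design.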

Publication date: 2018-2

