Learning soft mask with DNN and DNN-SVM for multi-speaker DOA estimation using an acoustic vector sensor

Wang, Disong; Zou, Yuexian*; Wang, Wenwu

doi:10.1016/j.jfranklin.2017.05.002

Abstract

An efficient method was recently presented for direction of arrival (DOA) estimation of multiple speech sources with an acoustic vector sensor (AVS), based on clustering of the inter-sensor data ratio (AVS-ISDR). Through extensive experiments on simulated and recorded data, we observed that the performance of this AVS-DOA method depends largely on reliable extraction of the target-speech-dominated time-frequency points (TD-TFPs), and that this extraction may degrade as the level of additive noise and room reverberation in the background increases. In this paper, inspired by the great success of deep learning in speech recognition, we design two new soft mask learners for multi-source DOA estimation, namely a deep neural network (DNN) and a DNN cascaded with a support vector machine (DNN-SVM), where a novel feature, the tandem local spectrogram block (TLSB), is used as the input to the system. Using our proposed soft mask learners, the TD-TFPs can be accurately extracted under different noisy and reverberant conditions. Additionally, the generated soft masks can be used to calculate the weighted centers of the ISDR clusters, giving better DOA estimation than the original centers used in our previously proposed AVS-ISDR. Extensive experiments on simulated and recorded data show the improved performance of our proposed methods over two baseline AVS-DOA methods in the presence of noise and reverberation.
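
The abstract does not give the formula for the weighted cluster centers, so the following is only a minimal sketch of the general idea, assuming a plain weighted mean in which each point's soft mask value is its weight. The function name `weighted_center`, the 2-D feature vectors, and the mask values are hypothetical and are not taken from the paper.

```python
def weighted_center(points, weights):
    """Weighted mean of equal-length feature vectors (generic, not the paper's exact formula)."""
    if not points or len(points) != len(weights):
        raise ValueError("points and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(points[0])
    # For each coordinate, sum weight * value over all points, then normalise.
    return [sum(w * p[d] for p, w in zip(points, weights)) / total for d in range(dim)]


# Hypothetical 2-D features for the points of one cluster (illustrative values only).
cluster = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical soft mask values in [0, 1]; a higher value means a more reliable point.
soft_mask = [0.5, 0.25, 0.25]

center_weighted = weighted_center(cluster, soft_mask)
# Uniform weights reduce to the ordinary (unweighted) cluster center.
center_plain = weighted_center(cluster, [1.0] * len(cluster))

print(center_weighted)  # [2.5, 3.5]
print(center_plain)     # [3.0, 4.0]
```

In this toy case the weighted center moves toward the first point because it carries the largest mask value, whereas the unweighted center treats all three points equally.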

Publication date: 2018-3
Affiliation: Peking University


