摘要

Using an acoustic vector sensor (AVS), an efficient method has been presented recently for direction of arrival (DOA) estimation of multiple speech sources via the clustering of the inter-sensor data ratio (AVS-ISDR). Through extensive experiments on simulated and recorded data, we observed that the performance of the AVS-DOA method is largely dependent on the reliable extraction of the target speech dominated time-frequency points (TD-TFPs) which, however, may be degraded with the increase in the level of additive noise and room reverberation in the background. In this paper, inspired by the great success of deep learning in speech recognition, we design two new soft mask learners, namely deep neural network (DNN) and DNN cascaded with a support vector machine (DNN-SVM), for multi-source DOA estimation, where a novel feature, namely, the tandem local spectrogram block (TLSB) is used as the input to the system. Using our proposed soft mask learners, the TD-TFPs can be accurately extracted under different noisy and reverberant conditions. Additionally, the generated soft masks can be used to calculate the weighted centers of the ISDR-clusters for better DOA estimation as compared to the original center used in our previously proposed AVS-ISDR. Extensive experiments on simulated and recorded data have been presented to show the improved performance of our proposed methods over two baseline AVS-DOA methods in presence of noise and reverberation.