Abstract

This paper addresses the localization of simultaneous speakers using generalized cross-correlation (GCC)-based methods for estimating the directions of multiple speakers. To address the shortcomings of GCC-based direction of arrival (DOA) estimation, we apply three modifications to our previous subband processing-based system for localizing simultaneous speakers. First, the DOA estimator is equipped with a front-end block that determines the number of speakers using K-means clustering and the silhouette criterion, and passes this number to the DOA estimator. Second, to eliminate spatial aliasing, we propose a novel nested circular microphone array in which each microphone pair is used only in the subband appropriate to its inter-microphone distance. Third, to overcome the weakness of the GCC-phase transform (GCC-PHAT) in noisy and noisy-reverberant conditions, we propose an SNR estimation block. This block lets us distinguish noisy from reverberant conditions, so that the PHAT filter is used in reverberant conditions and the maximum likelihood filter in noisy ones. The proposed method is evaluated on both simulated and real multi-speaker speech data, under various environmental conditions and with different numbers of speakers. Evaluations in terms of DOA accuracy demonstrate the superiority of the proposed method over the fullband and baseline subband methods.
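For readers unfamiliar with GCC-PHAT, the following is a minimal single-pair sketch of the textbook technique, not the subband system described in the abstract. The function names `gcc_phat` and `tdoa_to_angle`, the far-field assumption, and the speed of sound of 343 m/s are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    Illustrative sketch only: fullband, one microphone pair, one source.
    """
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically possible delays (assumed bound).
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 is lag -max_shift and the centre is lag 0.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def tdoa_to_angle(tau, d, c=343.0):
    """Convert a delay to a far-field angle in degrees for pair spacing d (m)."""
    return float(np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    x = rng.standard_normal(4096)
    y = np.concatenate((np.zeros(5), x[:-5]))  # y lags x by 5 samples
    tau = gcc_phat(x, y, fs)
    print(tau * fs, tdoa_to_angle(tau, d=0.2))
```

The PHAT weighting normalizes the cross-spectrum to unit magnitude, which sharpens the correlation peak under reverberation but gives equal weight to noise-dominated frequency bins. This is consistent with the abstract's motivation for switching to a maximum likelihood filter in noisy conditions.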

  • Publication date: 2016-1