摘要

This paper considers the underdetermined blind source separation (BSS) of convolutively mixed super-Gaussian signals that include speech, audio, and various other sparse signals. Here, the separation is performed in three steps. In the first and second steps, the mixing matrix and the sources at each time-frequency location are estimated by minimizing the Bayes risk (or the posterior risk) with squared loss. In the final third step, the permutation alignment is conducted by considering the correlation between adjacent spectral bins as in many conventional algorithms. To overcome any computationally intractable integrations involving a complex-valued super-Gaussian source prior, the posterior distribution of the sources is approximated as a mixture of super-Gaussians. The posterior means of the mixing matrix and the sources are obtained with Metropolis-Hastings within Gibbs sampling and the weighted sum of individual super-Gaussians, respectively. Overall, this approximation leads to a separation that is computationally lighter than and as accurate as the algorithm without the approximation. The simulation results of the synthetically generated data in a virtual room with reverberation show that the estimates of the mixing matrix in the first step and the sources in the second step are more accurate than the estimates from the state-of-the-art algorithms in terms of the mixing error ratio (MER) and the signal-to-distortion ratio (SDR). The experiment was also conducted with recorded data in a real room environment using a public benchmark dataset. Results show that the proposed algorithm gives a better performance compared to the state-of-the-art algorithms in terms of the SDR.

  • 出版日期2015-5