Abstract

Blind source separation (BSS) methods are useful tools for recovering or enhancing individual speech sources from their mixtures in a multi-talker environment. One class of efficient BSS methods is based on the hypothesis that the Fourier spectra of the source signals are mutually exclusive in the time-frequency (TF) domain, followed by data clustering and classification. Although this methodology is simple, the discontinuous classification decisions in the TF domain often cause defects in the recovered time-domain signals. These defects are perceived as unpleasant ringing sounds, the so-called musical noise, so post-processing is desirable for further quality enhancement. In this paper, an efficient musical noise reduction method is presented based on a convex model of time-domain sparse filters. The sparse filters are intended to cancel out the interference due to the major sparse peaks in the mixing coefficients or, physically, the early-arriving, high-energy portion of the room impulse responses. This strategy is carried out efficiently by ℓ1 regularization and the split Bregman method. Evaluations on both synthetic and room-recorded speech and music data show that our method outperforms existing musical noise reduction methods in terms of objective and subjective measures. Our method can also be used as a post-processing tool for more general and more recent versions of TF-domain BSS methods.
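
To illustrate the kind of solver the abstract names, the following is a minimal sketch of the split Bregman iteration applied to a generic ℓ1-regularized least-squares problem, min_u 0.5*||A u - f||^2 + mu*||u||_1. It is not the paper's sparse-filter model, which the abstract does not specify. The function names (soft_threshold, solve, split_bregman_l1), the parameters mu, lam and iters, and the plain-list matrix representation are all assumptions made for this illustration.

```python
def soft_threshold(x, t):
    """Shrinkage: elementwise minimizer of t*|d| + 0.5*(d - x)^2."""
    return [max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0) for v in x]


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def split_bregman_l1(A, f, mu=0.1, lam=1.0, iters=200):
    """Hypothetical helper: minimize 0.5*||A u - f||^2 + mu*||u||_1.

    Uses the splitting d = u with Bregman variable b (assumed setup,
    not taken from the paper).
    """
    m, n = len(A), len(A[0])
    # Precompute A^T A + lam*I and A^T f for the quadratic u-subproblem.
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Atf = [sum(A[k][i] * f[k] for k in range(m)) for i in range(n)]
    d = [0.0] * n
    b = [0.0] * n
    for _ in range(iters):
        # u-step: solve the linear system of the quadratic subproblem.
        u = solve(M, [Atf[i] + lam * (d[i] - b[i]) for i in range(n)])
        # d-step: closed-form shrinkage handles the l1 term.
        d = soft_threshold([u[i] + b[i] for i in range(n)], mu / lam)
        # Bregman update: accumulate the constraint residual u - d.
        b = [b[i] + u[i] - d[i] for i in range(n)]
    return d  # d is exactly sparse; it agrees with u at convergence


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(split_bregman_l1(identity, [3.0, 0.05, -2.0], mu=0.1))
```

When A is the identity, the minimizer reduces to soft-thresholding f by mu, which makes a convenient sanity check. The splitting moves the non-smooth ℓ1 term into a step with a closed-form solution, so each iteration costs one linear solve plus elementwise operations.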

  • Publication date: 2012-3