Abstract

One successful approach to audio source separation applies nonnegative matrix factorization (NMF) to a magnitude spectrogram regarded as a nonnegative matrix. This can be interpreted as approximating the observed spectrum at each time frame as a linear sum of basis spectra scaled by time-varying amplitudes. This paper addresses unsupervised instrument-wise source separation of polyphonic signals based on an extension of the NMF approach. We focus on the fact that each piece of music is typically played on a handful of musical instruments, which allows us to assume that the spectra of the underlying audio events in a polyphonic signal can be grouped into a reasonably small number of clusters in the mel-frequency cepstral coefficient (MFCC) domain. Based on this assumption, we propose formulating the factorization of a magnitude spectrogram and the clustering of the basis spectra in the MFCC domain as a joint optimization problem, and we derive a novel optimization algorithm based on the majorization-minimization principle. Experimental results revealed that our method was superior to a two-stage algorithm that performs factorization followed by clustering of the basis spectra, thus showing the advantage of the joint optimization approach.
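
The basic factorization described above, in which each spectrogram frame is approximated by a linear sum of basis spectra scaled by time-varying amplitudes, can be sketched as follows. This is a minimal illustration of plain NMF only, not the joint factorization-and-clustering algorithm proposed in the paper. The function name `nmf`, the squared Euclidean cost, and the multiplicative update rules are assumptions chosen for illustration; the abstract does not state which divergence the authors use.

```python
import numpy as np


def nmf(V, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix V (freq x frames) as W @ H.

    W (freq x n_bases) holds the basis spectra; H (n_bases x frames)
    holds the time-varying amplitudes. Uses multiplicative updates for
    the squared Euclidean cost (an assumption, not the paper's choice).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random positive initialization keeps every factor nonnegative.
    W = rng.random((n_freq, n_bases)) + eps
    H = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        # Update the amplitudes with the basis spectra held fixed.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Update the basis spectra with the amplitudes held fixed.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Given a magnitude spectrogram `V`, calling `W, H = nmf(V, n_bases=20)` would return the basis spectra as the columns of `W` and their amplitudes over time as the rows of `H`. The number of bases here is a hypothetical value.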

  • Publication date: 2018-6