Abstract

Audio concept detection is essentially a multi-label classification problem, yet it is normally solved by treating each concept independently. Because this discards useful information about how concepts correlate, this paper proposes a new model, the Correlated-Aspect Gaussian Mixture Model (C-AGMM), which exploits that information to improve multi-label audio concept detection. C-AGMM originates from the Aspect Gaussian Mixture Model (AGMM), which improves on the GMM by incorporating it into probabilistic Latent Semantic Analysis (pLSA). Like AGMM, C-AGMM learns a probabilistic model of the whole audio clip by regarding concepts as its component elements. Unlike AGMM, which assumes that concepts are independent of each other, C-AGMM considers their distribution on a sub-manifold embedded in the ambient space. Under the assumption that two concepts that are close in the intrinsic geometry of this distribution are likely to have similar conditional probability distributions, a graph regularizer is used to model the correlation between concepts. Following the maximum likelihood estimation principle, the C-AGMM model parameters that encode the concept correlation are derived and used directly as the detection criterion. Experiments on two datasets show the effectiveness of the proposed model.
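
The graph-regularizer idea can be illustrated with a minimal sketch. It is not the C-AGMM formulation itself: the squared Euclidean distance between conditional distributions, the symmetric affinity matrix `W`, the matrix `P` of per-concept conditional distributions, and the function name `graph_regularizer` are all assumptions made here for illustration, and the paper may use a different divergence or graph construction.

```python
def graph_regularizer(W, P):
    """Graph smoothness penalty 0.5 * sum_ij W[i][j] * ||P[i] - P[j]||^2.

    W: symmetric affinity matrix between concepts (hypothetical input).
    P: one conditional probability distribution per concept (hypothetical input).
    The squared Euclidean distance is an illustrative assumption.
    """
    n = len(P)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(P[i], P[j]))
            total += W[i][j] * sq_dist
    return 0.5 * total


# Toy graph: concepts 0 and 1 are neighbours, concept 2 is isolated.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

# Neighbouring concepts share the same distribution: no penalty.
P_similar = [[0.7, 0.3], [0.7, 0.3], [0.1, 0.9]]
# Neighbouring concepts disagree: the penalty grows.
P_different = [[0.7, 0.3], [0.1, 0.9], [0.1, 0.9]]

penalty_similar = graph_regularizer(W, P_similar)
penalty_different = graph_regularizer(W, P_different)
```

In this sketch the penalty is zero when linked concepts have identical distributions and increases as they diverge, which is the smoothness behaviour a regularizer of this kind is meant to encourage when combined with a likelihood objective.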

Full text