DCAR: A Discriminative and Compact Audio Representation for Audio Processing

Jing, Liping*; Liu, Bo; Choi, Jaeyoung; Janin, Adam; Bernd, Julia; Mahoney, Michael W.; Friedland, Gerald

doi:10.1109/TMM.2017.2703939

Abstract

This paper presents a novel two-phase method for audio representation, discriminative and compact audio representation (DCAR), and evaluates its performance at detecting events and scenes in consumer-produced videos. In the first phase of DCAR, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes into account both global structure and local structure. In this phase, the components are rendered more discriminative and compact by formulating an optimization problem on a Grassmannian manifold. The learned components can effectively represent the structure of audio. Our experiments used the YLI-MED and DCASE Acoustic Scenes datasets. The results show that variants on the proposed DCAR representation consistently outperform four popular audio representations (mv-vector, i-vector, GMM, and HEM-GMM). The advantage is significant for both easier and harder discrimination tasks; we discuss how these performance differences across tasks follow from how each type of model leverages (or does not leverage) the intrinsic structure of the data.
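The first phase described in the abstract, modeling each audio track with a Gaussian mixture model, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fit_diag_gmm`, the diagonal-covariance restriction, the plain EM fitting loop, and the synthetic 13-dimensional frames (standing in for per-frame features such as MFCCs) are all assumptions made for illustration. The second phase (the optimization on a Grassmannian manifold) is not shown.

```python
import numpy as np

def fit_diag_gmm(frames, n_components=4, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM to (n_frames, n_dims) features with plain EM.

    Hypothetical helper for illustration only; not taken from the DCAR paper.
    """
    rng = np.random.default_rng(seed)
    n, _ = frames.shape
    # Initialise means from randomly chosen frames, variances from the global variance.
    means = frames[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(frames.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame (log domain).
        diff = frames[:, None, :] - means[None, :, :]            # (n, k, d)
        log_prob = -0.5 * (
            np.sum(diff ** 2 / variances[None], axis=2)
            + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
        )
        log_resp = np.log(weights)[None, :] + log_prob
        log_resp -= log_resp.max(axis=1, keepdims=True)           # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ frames) / nk[:, None]
        diff = frames[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return weights, means, variances

# Synthetic "track": two clusters of 13-dimensional frames (assumed stand-in for MFCCs).
rng = np.random.default_rng(1)
frames = np.vstack([rng.normal(-3, 1, (200, 13)), rng.normal(3, 1, (200, 13))])
weights, means, variances = fit_diag_gmm(frames, n_components=2)
print(weights.shape, means.shape, variances.shape)
```

Each track is then summarized by its component weights, means, and variances instead of its raw frames. According to the abstract, these components are what the second phase makes more discriminative and compact.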

Publication date: 2017-12
Affiliations: Hebei Agricultural University; Beijing Jiaotong University

