Abstract

This paper presents a new learning algorithm for audiovisual fusion and demonstrates its application to video classification for a film database. The proposed system uses perceptual features to characterize the content of movie clips. These features are extracted from different modalities and fused through a machine learning process. More specifically, to capture spatio-temporal information, adaptive video indexing is adopted to extract visual features, and a statistical model based on Laplacian mixtures is used to extract audio features. The features are fused at a late fusion stage and input to a support vector machine (SVM), which learns semantic concepts from a given video database. Experimental results show that the proposed system, implementing the SVM-based fusion technique, achieves high classification accuracy on a large database of Hollywood movies.
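The pipeline described above (per-modality feature extraction, fusion, then SVM classification) can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the two extractors are hypothetical stand-ins for adaptive video indexing and the Laplacian-mixture audio model, and the fusion step simply concatenates per-modality feature vectors before they would be passed to an SVM.

```python
# Hypothetical sketch of the fuse-then-classify pipeline; the feature
# extractors below are placeholders, NOT the paper's adaptive video
# indexing or Laplacian mixture implementations.

def extract_visual_features(frames):
    # Placeholder for adaptive video indexing over a clip's frames:
    # here, just the mean intensity and the dynamic range.
    return [sum(frames) / len(frames), max(frames) - min(frames)]

def extract_audio_features(samples):
    # Placeholder for the Laplacian-mixture audio model:
    # here, just the median sample value.
    return [sorted(samples)[len(samples) // 2]]

def fuse(visual, audio):
    # Fusion by concatenating per-modality feature vectors;
    # the fused vector would then be fed to an SVM classifier.
    return visual + audio

clip_frames = [0.2, 0.5, 0.9, 0.4]   # toy per-frame intensities
clip_audio = [0.1, 0.7, 0.3]         # toy audio samples

features = fuse(extract_visual_features(clip_frames),
                extract_audio_features(clip_audio))
print(len(features))  # fused vector carries both modalities
```

In practice the fused vector for each clip would be paired with a semantic label and used to train the SVM over the whole database.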

  • Publication date: 2010-05