Abstract

In video recommendation systems, emotion is used alongside several other proposed content-based video features. However, each of these features is derived from either the visual or the audio signal alone, and the dependencies between the two signals remain unexplored. To address this problem, this paper proposes a novel feature set, called HHTC features, that combines Hilbert-Huang Transform (HHT)-based visual features, HHT-based audio features, and cross-correlation features. Besides capturing the dependencies between the visual and audio signals, the HHTC features also represent the time-varying characteristics of these signals. The proposed features are applied to video emotion recognition using Support Vector Regression (SVR), with potential use in video affective recommendation systems. Experimental results demonstrate that the proposed approach improves the performance of video affective recognition.
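The abstract does not define how the cross-correlation features are computed. The sketch below illustrates one plausible form under stated assumptions: the visual and audio signals are each reduced to a per-frame scalar series of equal length, their normalized cross-correlation is computed over a small lag window, and the peak coefficient and its lag are kept as features. All function names (`normalized_cross_correlation`, `cross_correlation_features`, `hhtc_vector`) are hypothetical, and HHT feature extraction itself is not shown; the HHT-based features are taken as precomputed inputs.

```python
import math
from typing import List, Sequence, Tuple


def normalized_cross_correlation(x: Sequence[float], y: Sequence[float],
                                 max_lag: int) -> List[float]:
    """Pearson-style cross-correlation for lags -max_lag..max_lag.

    A positive lag k pairs x[t] with y[t + k]. Each coefficient uses only the
    overlapping samples and lies in [-1, 1]; 0.0 is returned for a constant overlap.
    """
    if len(x) != len(y):
        raise ValueError("signals must have equal length")
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the signal length")
    coeffs = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: n + lag]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
        va = sum((p - ma) ** 2 for p in a)
        vb = sum((q - mb) ** 2 for q in b)
        denom = math.sqrt(va * vb)
        coeffs.append(cov / denom if denom > 0 else 0.0)
    return coeffs


def cross_correlation_features(visual: Sequence[float], audio: Sequence[float],
                               max_lag: int) -> Tuple[float, int]:
    """Hypothetical feature pair: strongest (by magnitude) coefficient and its lag."""
    coeffs = normalized_cross_correlation(visual, audio, max_lag)
    best = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    return coeffs[best], best - max_lag


def hhtc_vector(visual_hht: Sequence[float], audio_hht: Sequence[float],
                xcorr: Sequence[float]) -> List[float]:
    """Hypothetical HHTC vector: plain concatenation of the three feature groups."""
    return list(visual_hht) + list(audio_hht) + list(xcorr)


if __name__ == "__main__":
    # Toy signals: the "audio" series is the "visual" series delayed by 2 frames.
    visual = [math.sin(0.5 * t) for t in range(40)]
    audio = [0.0, 0.0] + visual[:-2]
    peak, lag = cross_correlation_features(visual, audio, max_lag=5)
    print(round(peak, 6), lag)
```

Running the toy example recovers the 2-frame delay with a coefficient of about 1.0. In a full pipeline, the resulting vectors would be the inputs to an SVR model; plain concatenation is only the simplest way to combine the three groups and is an assumption here.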