A novel recurrent hybrid network for feature fusion in action recognition

Authors: Yu, Sheng; Cheng, Yun; Xie, Li; Luo, Zhiming; Huang, Min; Li, Shaozi*
Source: Journal of Visual Communication and Image Representation, 2017, 49: 192-203.
DOI: 10.1016/j.jvcir.2017.09.007

Abstract

Action recognition in video is one of the most important and challenging tasks in computer vision. How efficiently spatial and temporal information is combined to represent a video plays a crucial role in action recognition. In this paper, a recurrent hybrid network architecture is designed for action recognition by fusing multi-source features: a two-stream CNN for learning semantic features, a two-stream single-layer LSTM for learning long-term temporal features, and an Improved Dense Trajectories (IDT) stream for learning short-term temporal motion features. To mitigate overfitting on small-scale datasets, a video data augmentation method is used to increase the amount of training data, and a two-step training strategy is adopted to train the recurrent hybrid network. Experimental results on two challenging datasets, UCF-101 and HMDB-51, demonstrate that the proposed method reaches state-of-the-art performance.
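
As a rough illustration of what fusing multi-source features can look like, the sketch below concatenates per-stream feature vectors into a single video descriptor. The abstract does not state which fusion operator the paper uses, so the L2 normalization followed by concatenation is an assumption, and the function names, stream keys, and toy values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(streams):
    """Concatenate L2-normalized per-stream features into one descriptor.

    Assumption: simple concatenation stands in for the fusion step;
    the paper's actual fusion scheme is not given in the abstract.
    """
    fused = []
    for name in sorted(streams):  # fixed key order keeps the layout reproducible
        fused.extend(l2_normalize(streams[name]))
    return fused


# Hypothetical toy features for one video; real feature dimensions would be far larger.
video_features = {
    "cnn": [3.0, 4.0],        # semantic features (two-stream CNN)
    "lstm": [0.0, 2.0],       # long-term temporal features (LSTM)
    "idt": [1.0, 0.0, 0.0],   # short-term motion features (IDT)
}
descriptor = fuse_features(video_features)
```

Normalizing each stream before concatenation keeps any one source from dominating the descriptor purely because of its scale; a classifier would then be trained on `descriptor`.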