Large-Scale Gesture Recognition With a Fusion of RGB-D Data Based on Saliency Theory and C3D Model

Authors: Li, Yunan; Miao, Qiguang*; Tian, Kuan; Fan, Yingying; Xu, Xin; Li, Rui; Song, Jianfeng
Source: IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(10): 2956-2964.
DOI: 10.1109/TCSVT.2017.2749509

Abstract

Gesture recognition has attracted wide attention in computer vision owing to its many applications. However, video-based large-scale gesture recognition still faces many challenges, since gesture-irrelevant factors such as the background can degrade recognition accuracy. To better recognize gestures in large-scale video data, we propose a method based on RGB-D data in this paper, where "RGB-D" denotes RGB and depth data captured simultaneously by specific devices such as Kinect. To learn gesture details better, we first use an adaptive frame unification strategy to unify the frame number of the inputs; the RGB and depth data are then fed to the C3D model separately to extract spatiotemporal features. To alleviate the interference of gesture-irrelevant factors, saliency theory is also employed to generate auxiliary data. Next, the features of these data are combined to boost performance; because the dimension of the C3D features is uniform, this fusion also avoids producing unreasonable synthetic data. Finally, the performance of several classifiers is tested and the best-performing one, an SVM, is selected to output the ultimate accuracy. Our approach achieves 52.04% and 59.43% accuracy on the validation and testing subsets of the ChaLearn LAP IsoGD dataset, respectively, both of which outperform our results in the ChaLearn LAP Large-scale Gesture Recognition Challenge as reported at ICPR 2016.
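The pipeline described in the abstract (frame unification, per-stream C3D features, fusion, SVM) can be illustrated with a minimal, hypothetical sketch. This is not the authors' code: the 32-frame target, the 4096-dimensional feature size (the usual C3D fc-layer width), the uniform-sampling unification, and the concatenation-based fusion are all assumptions used here for illustration, with random arrays standing in for real C3D features.

```python
# Hypothetical sketch of the described pipeline, NOT the authors' implementation.
import numpy as np
from sklearn.svm import LinearSVC

def unify_frames(frames, target=32):
    """Resample a video to a fixed frame count by uniform index sampling
    (one plausible frame-unification strategy; target=32 is an assumption)."""
    idx = np.linspace(0, len(frames) - 1, target).round().astype(int)
    return [frames[i] for i in idx]

# Stand-ins for C3D fc-layer features extracted from each stream
# (RGB, depth, saliency); 4096 is the typical C3D feature width.
rng = np.random.default_rng(0)
n_videos, feat_dim = 20, 4096
rgb_feats = rng.standard_normal((n_videos, feat_dim))
depth_feats = rng.standard_normal((n_videos, feat_dim))
sal_feats = rng.standard_normal((n_videos, feat_dim))
labels = rng.integers(0, 5, size=n_videos)

# Because every stream yields features of the same dimension, they can be
# fused by simple concatenation before classification.
fused = np.concatenate([rgb_feats, depth_feats, sal_feats], axis=1)
clf = LinearSVC().fit(fused, labels)
print(fused.shape)
```

The uniform feature dimension is what makes late fusion by concatenation straightforward; in practice the SVM would be trained on real C3D features and evaluated on the held-out IsoGD subsets.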