Abstract

This paper presents a new framework for human action recognition from depth sequences. An effective depth feature representation is developed based on the fusion of 2D and 3D auto-correlation of gradients features. Specifically, depth motion maps (DMMs) are first employed to transform a depth sequence into three images that capture shape and motion cues. A feature extraction method based on spatial and orientational auto-correlations of local image gradients is then introduced to extract features from the DMMs. Space-time auto-correlation of gradients features are also extracted directly from the depth sequences as complementary features, to compensate for the temporal information lost when the DMMs are generated. The two feature sets are fed to two separate extreme learning machine classifiers, each of which generates probability outputs. A weighted fusion strategy is proposed that assigns different weights to the classifier probability outputs associated with the different features, thereby providing more flexibility in the final decision. The proposed method is evaluated on two depth action datasets (MSR Action 3D and MSR Gesture 3D) and achieves state-of-the-art recognition performance (94.87% on MSR Action 3D and 98.50% on MSR Gesture 3D).
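To make the weighted fusion step concrete, the following is a minimal sketch of how two classifiers' probability outputs could be combined. It assumes a simple linear rule with a single weight in [0, 1]; the abstract does not state the exact fusion rule, so the function name `fuse_probabilities`, the parameter `weight_a`, and the example probability values are all hypothetical and chosen only for illustration.

```python
def fuse_probabilities(p_a, p_b, weight_a=0.5):
    """Linearly combine two class-probability vectors.

    Assumption: fused = weight_a * p_a + (1 - weight_a) * p_b.
    This is an illustrative rule, not necessarily the one used in the paper.
    Returns (predicted_class_index, fused_scores).
    """
    if len(p_a) != len(p_b):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    fused = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]
    # The predicted class is the index of the largest fused score.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Made-up outputs of two classifiers over three action classes.
p_dmm = [0.6, 0.3, 0.1]        # classifier trained on DMM-based features
p_spacetime = [0.2, 0.7, 0.1]  # classifier trained on space-time features

label_equal, _ = fuse_probabilities(p_dmm, p_spacetime, weight_a=0.5)
label_dmm_heavy, _ = fuse_probabilities(p_dmm, p_spacetime, weight_a=0.9)
print(label_equal, label_dmm_heavy)
```

With equal weights the second class wins in this toy case, whereas weighting the DMM-based classifier heavily shifts the decision to the first class, which illustrates the flexibility that a tunable weight provides.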