Abstract

In this paper, we propose a novel local feature descriptor based on energy information for human activity recognition. Instead of detecting spatio-temporal interest points, we combine the kinetic energy and gesture potential energy of 3D skeleton joints, together with other features, into a feature matrix. Semantic features are then obtained with a Bag of Words (BOW) model based on k-means clustering. These features are consistent not only with the kinematics and biology of human action, but also with natural visual saliency for action recognition. For activity recognition, we first present a temporal segmentation method, based on kinetic features of the human skeleton, that cuts long videos into sub-action segments. The sub-action units are then iteratively merged into meaningful groups according to the similarity of their feature information. Finally, a kernel-based SVM performs the human activity recognition. Experimental results show that our approach, which relies on the proposed low-dimensional energy features, outperforms several state-of-the-art algorithms.