Abstract

In this paper, we propose a new approach based on distribution descriptors for action recognition in depth videos. Our local features are computed from binary patterns that incorporate shape and motion cues for effective action recognition. Given pixel-level features, our approach estimates video local statistics in a hierarchical manner, where the distribution of pixel-level features and that of frame-level descriptors are each modeled with a single Gaussian. In this way, our approach constructs video descriptors directly from low-level features without resorting to the codebook learning required by bag-of-features (BoF) based approaches. To capture spatial geometry and temporal order, we represent each video with a spatio-temporal pyramid. Our approach is validated on six benchmark datasets, i.e., MSRAction3D, MSRGesture3D, DHA, SKIG, UTD-MHAD and CAD-120. The experimental results show that our approach performs well on all of these datasets; in particular, it achieves state-of-the-art accuracies on the DHA, SKIG and UTD-MHAD datasets.
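To make the hierarchical aggregation concrete, the sketch below models a frame by a single Gaussian over its pixel-level features and then models the video by a single Gaussian over the resulting frame descriptors. It is a minimal illustration under simplifying assumptions: the pixel-level features are treated as generic D-dimensional vectors (the paper's binary-pattern features are not reproduced), the Gaussian is flattened naively as mean plus upper-triangular covariance, and the spatio-temporal pyramid is omitted; the function names are hypothetical.

```python
import numpy as np


def gaussian_descriptor(features):
    """Summarize a set of feature vectors (N x D) by a single Gaussian.

    Returns the mean concatenated with the upper triangle of the covariance.
    (Simplified flattening for illustration; the paper may embed Gaussians
    differently.)
    """
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    iu = np.triu_indices(cov.shape[0])
    return np.concatenate([mu, cov[iu]])


def video_descriptor(per_frame_features):
    """Hierarchical aggregation without codebook learning:
    pixel-level features -> per-frame Gaussian descriptors
    -> a single video-level Gaussian descriptor.

    `per_frame_features` is a list of (N_t x D) arrays, one per frame.
    """
    frame_descs = np.stack([gaussian_descriptor(f) for f in per_frame_features])
    return gaussian_descriptor(frame_descs)


# Toy usage: 10 frames, each with 500 pixel-level features of dimension 8.
rng = np.random.default_rng(0)
video = [rng.normal(size=(500, 8)) for _ in range(10)]
print(video_descriptor(video).shape)
```

In the full method, such descriptors would be computed per cell of the spatio-temporal pyramid and concatenated, so that spatial geometry and temporal order are preserved.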

  • Publication date: 2018-08