摘要

Although traditional bag-of-words model, together with local spatiotemporal features, has shown promising results for human action recognition, it ignores all structural information of features, which carries important information of motion structures in videos. Recent methods usually characterize the relationship of quantized spatiotemporal features to overcome this drawback. However, the propagation of quantization error leads to an unreliable representation. To alleviate the propagation of quantization error, we present a coding method, which considers not only the spatial similarity but also the reconstruction ability of visual words after giving a probabilistic interpretation of coding coefficients. Based on our coding method, a new type of feature called cumulative probability histogram is proposed to robustly characterize contextual structural information around interest points, which are extracted from multi-layered contexts and assumed to be complementary to local spatiotemporal features. The proposed method is verified on four benchmark datasets. Experiment results show that our method can achieve better performance than previous methods in action recognition.