Abstract

Action recognition is an important and active area of computer vision. Owing to the usefulness of skeleton data for action recognition and the development of related pose estimation techniques, skeleton-based action recognition has drawn considerable attention and has been widely studied in recent years. In this paper, we propose an attention-based multiview re-observation fusion model for skeleton-based action recognition. The proposed model focuses on the observation view of an action, a factor that greatly influences recognition, and exploits action information from multiple observation views to improve recognition performance. In this method, we re-observe the input skeleton data from several possible viewpoints, process these augmented observations separately with a long short-term memory (LSTM) network, and finally fuse the outputs to generate the recognition result. In the multiview fusion process, an attention mechanism regulates the fusion operation according to how helpful each view is for recognition. In this way, the model fuses information from multiple viewpoints to recognize actions and learns to evaluate observation views to improve fusion performance. We also propose a multilayer feature attention method to improve the performance of the LSTM in our model: an attention mechanism enhances the feature representation by identifying and focusing on informative feature dimensions according to contextual action information. Moreover, we stack multiple attention layers in a multilayer LSTM network to further improve performance. The final model is integrated into an end-to-end trainable network. Experiments on two popular datasets, NTU RGB+D and SBU Kinect Interaction, show that our model achieves state-of-the-art performance.
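
To make the fusion idea in the abstract concrete, the following is a minimal Python (PyTorch) sketch of attention-weighted fusion over per-view LSTM outputs. It is an illustrative assumption, not the paper's exact architecture: the class name MultiViewAttentionFusion, the use of a single shared LSTM, the last-time-step view feature, and all hyperparameters (hidden size, number of views, number of classes) are hypothetical choices for demonstration only.

import torch
import torch.nn as nn

class MultiViewAttentionFusion(nn.Module):
    """Per-view LSTM encoding followed by attention-weighted fusion of views (illustrative sketch)."""

    def __init__(self, in_dim, hidden_dim, num_views, num_classes):
        super().__init__()
        # A shared LSTM processes each re-observed (e.g., rotated) skeleton sequence.
        self.lstm = nn.LSTM(in_dim, hidden_dim, num_layers=2, batch_first=True)
        # Scores how helpful each view is for recognition.
        self.view_score = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)
        self.num_views = num_views

    def forward(self, views):
        # views: list of num_views tensors, each of shape (batch, time, in_dim),
        # obtained by re-observing the skeleton from different viewpoints.
        feats = []
        for v in views:
            out, _ = self.lstm(v)            # (batch, time, hidden_dim)
            feats.append(out[:, -1])         # last time step as the view feature
        feats = torch.stack(feats, dim=1)    # (batch, num_views, hidden_dim)

        # Attention over views: weight each view by its estimated helpfulness.
        scores = self.view_score(feats).squeeze(-1)          # (batch, num_views)
        weights = torch.softmax(scores, dim=1)               # weights sum to 1 across views
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)   # (batch, hidden_dim)
        return self.classifier(fused)

if __name__ == "__main__":
    # Hypothetical sizes: 25 joints x 3 coordinates = 75 inputs, 3 viewpoints, 60 classes.
    model = MultiViewAttentionFusion(in_dim=75, hidden_dim=128, num_views=3, num_classes=60)
    views = [torch.randn(4, 50, 75) for _ in range(3)]      # synthetic re-observed sequences
    logits = model(views)                                    # (4, 60)
    print(logits.shape)

In this sketch the softmax over per-view scores plays the role of the attention mechanism described above: views judged more helpful receive larger weights in the fused representation before classification.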