Bi-direction hierarchical LSTM with spatial-temporal attention for action recognition

Yang, Haodong; Zhang, Jun*; Li, Shuohao; Luo, Tingjin

doi:10.3233/JIFS-18209

Abstract

Human action recognition in naturalistic videos is an important task with a broad range of applications. Recently, the encoder-decoder framework based on the attention mechanism has been applied to action recognition. Although such conventional methods achieve state-of-the-art results, they often struggle to distinguish similar actions. To solve this problem, we propose a novel recurrent attention convolutional neural network (RACNN), which incorporates convolutional neural networks (CNNs), long short-term memory (LSTM), and an attention mechanism. Motivated by the composition of an action, in which the pre-action and the result of the action may both be important parts, we introduce a bi-directional LSTM with a hierarchical structure. Additionally, our method employs separate spatial and temporal attention. Furthermore, we find that combining spatio-temporal features extracted from three-dimensional CNNs (3DCNNs) with RGB features can strengthen the relationships mined in each frame. Comprehensive experiments on two benchmark datasets, HMDB51 and UCF101, verify the effectiveness of the proposed methods and show that they significantly outperform current state-of-the-art methods.
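The idea of separate spatial and temporal attention can be illustrated with a minimal sketch. It is not the authors' RACNN: the abstract does not give the scoring functions, so the dot-product scores and the fixed `spatial_query` and `temporal_query` vectors below are assumptions standing in for whatever the LSTM states would provide, and the bi-directional hierarchical LSTM itself is omitted. The sketch only shows the two-stage pooling: attention over regions within each frame, then attention over frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Combine equal-length vectors using the given weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def spatial_temporal_attention(video, spatial_query, temporal_query):
    """Two-stage attention pooling (illustrative assumption, not the paper's model).

    video: list of frames; each frame is a list of region feature vectors.
    spatial_query, temporal_query: hypothetical query vectors used for scoring.
    Returns (clip_vector, spatial_weights, temporal_weights).
    """
    frame_vectors = []
    spatial_weights = []
    for regions in video:
        # Spatial step: weight the regions inside one frame.
        alpha = softmax([dot(spatial_query, r) for r in regions])
        spatial_weights.append(alpha)
        frame_vectors.append(weighted_sum(alpha, regions))
    # Temporal step: weight the pooled frame vectors across time.
    beta = softmax([dot(temporal_query, f) for f in frame_vectors])
    clip_vector = weighted_sum(beta, frame_vectors)
    return clip_vector, spatial_weights, beta
```

In a full model the resulting clip vector would feed a classifier, and the queries would change at every time step rather than stay fixed.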

Publication date: 2019
Affiliation: National University of Defense Technology, Chinese People's Liberation Army

