Abstract

Deep convolutional neural networks (CNNs) have shown great potential for human action recognition. To obtain a compact and discriminative feature representation, this paper proposes multiple pooling strategies over CNN features. We explore three pooling strategies: space-time feature pooling (STFP), time filter pooling (TFP), and spatio-temporal pyramid pooling (STPP). STFP combines the advantages of hand-crafted features and deep ConvNet features. TFP captures how the elements of each CNN feature map change over time. STPP exploits the spatial and temporal pyramid structure of the feature maps. We aggregate these pooled features into a new discriminative video descriptor. Experimental results on the challenging YouTube, UCF50, and UCF101 datasets show that the three strategies are complementary, and that our video representation is comparable to previous state-of-the-art methods.