Abstract

Content-based video event detection currently faces great challenges due to the complex scenes and blurred actions typical of surveillance videos. To alleviate these challenges, we propose a novel spatial-temporal architecture of deep convolutional neural networks for this task. To exploit spatial-temporal information, we fine-tune two-stream networks and then fuse their spatial and temporal features at the convolutional layers using a 2D pooling fusion method, which enforces the consistency of spatial-temporal information. Combining the two-stream networks with this spatial-temporal fusion layer yields a triple-channel model. Furthermore, we apply trajectory-constrained pooling to both deep features and hand-crafted features to combine their merits. A fusion over the three channels produces the final detection result. Experiments on two benchmark surveillance video datasets, VIRAT 1.0 and VIRAT 2.0, which involve a suite of challenging events such as a person loading an object into a vehicle or a person opening a vehicle trunk, demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods on these event benchmarks.
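As a rough illustration of the convolutional-layer fusion idea described above (not the paper's exact implementation), the following PyTorch sketch stacks same-shaped feature maps from the spatial and temporal streams along the channel axis, mixes them with a 1x1 convolution, and applies a 2D pooling step. The module name, feature-map shapes, and the mix-then-pool ordering are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SpatialTemporalFusion(nn.Module):
    """Illustrative sketch: fuse spatial- and temporal-stream conv features.

    Assumes both streams produce feature maps of identical shape
    (N, C, H, W) at the chosen fusion layer.
    """
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution mixes the stacked spatial/temporal channels
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # 2D pooling applied to the mixed feature maps
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, spatial_feat: torch.Tensor,
                temporal_feat: torch.Tensor) -> torch.Tensor:
        # Stack the two streams along the channel axis ...
        fused = torch.cat([spatial_feat, temporal_feat], dim=1)
        # ... mix them, then pool spatially
        return self.pool(self.mix(fused))

# Hypothetical usage on conv5-like feature maps from both streams
fusion = SpatialTemporalFusion(channels=512)
spatial = torch.randn(4, 512, 14, 14)    # appearance-stream features
temporal = torch.randn(4, 512, 14, 14)   # motion-stream features
out = fusion(spatial, temporal)          # shape: (4, 512, 7, 7)
```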