A Spatio-Temporal CRF for Human Interaction Understanding

Wang, Zhenhua; Liu, Sheng; Zhang, Jianhua; Chen, Shengyong<sup>*</sup>; Guan, Qiu

doi:10.1109/TCSVT.2016.2539699

Abstract

A better understanding of human interactions in videos can be achieved by simultaneously considering the coarse interactions between people, the action of each individual, and the activity of all people as a whole. We divide the recognition task into two stages. The first stage distinguishes interactions from noninteractions and recognizes actions and activities based on local image information; in the second stage, actions and activities are recognized globally based on the local recognition results. A conditional random field (CRF) is designed to model human interactions in the spatio-temporal space. Unlike most existing global models, which cover either action variables or activity variables only, our model covers both by considering the interactions between the different types of variables. The graph structure of the CRF is predicted by a model learned from training data, in contrast to traditional graph construction methods, which typically rely on human heuristics. We learn the parameters of the CRF via a structured support vector machine. We propose an efficient inference algorithm to estimate labels in long videos containing many people. Our model provides both a semantic-level understanding of human interactions in videos and competitive action and activity recognition performance.

Publication date: 2017-08
Affiliations: Zhejiang University of Technology; Tianjin University of Technology

