Abstract

In this paper, we present a spatiotemporal co-location video pattern mining approach for robust action retrieval in YouTube videos. First, we introduce an attention shift scheme to detect and partition the focused human actions in YouTube videos, built upon visual saliency modeling [13] together with face [35] and body [32] detectors. From the segmented spatiotemporal human action regions, we extract interest points with the 3D-SIFT detector [17]. We then quantize all interest points detected in the reference YouTube videos into a vocabulary, based on which each interest point is assigned a word identity. An Apriori-based frequent itemset mining scheme is then applied over the spatiotemporally co-located words to discover co-location video patterns. Finally, we fuse visual words and patterns, and leverage a boosting-based feature selection, which incorporates the ranking distortion of conjunctive queries into the boosting objective, to output the final action descriptors. We carried out quantitative evaluations on both the KTH human motion benchmark [26] and 60 hours of YouTube videos, with comparisons to state-of-the-art methods.
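The Apriori-based co-location mining step can be sketched compactly. The following is a minimal, hypothetical Python illustration, not the paper's implementation: each "transaction" is assumed to be the set of visual-word identities whose interest points fall inside one spatiotemporal neighborhood, and the function name and thresholds are placeholders.

```python
# Hypothetical sketch of Apriori-style frequent itemset mining over
# spatiotemporally co-located visual words (illustration only; the
# transaction construction and parameters are assumptions).
from collections import Counter

def mine_colocation_patterns(transactions, min_support=0.05, max_size=3):
    """Return frequent word itemsets (frozensets) with their counts.

    transactions: list of sets of visual-word IDs, one set per
    spatiotemporal neighborhood.
    """
    n = len(transactions)
    min_count = min_support * n
    # Frequent 1-itemsets: count each word across all transactions.
    counts = Counter(w for t in transactions for w in t)
    prev = {frozenset([w]) for w, c in counts.items() if c >= min_count}
    patterns = {s: counts[next(iter(s))] for s in prev}
    for size in range(2, max_size + 1):
        # Candidate generation: join frequent (size-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        cand_counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:  # itemset fully contained in this transaction
                    cand_counts[c] += 1
        prev = {c for c, k in cand_counts.items() if k >= min_count}
        patterns.update({c: cand_counts[c] for c in prev})
        if not prev:
            break
    return patterns
```

Under these assumptions, the surviving itemsets would serve as the co-location video patterns that are later fused with individual visual words during the boosting-based feature selection.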