Abstract

Human actions in movies and sitcoms often carry semantic cues for story understanding, offering a novel search paradigm beyond traditional video search scenarios. However, action-level video search faces great challenges, such as global motion, concurrent actions, and variations in actor appearance. In this paper, we introduce a generalized action retrieval framework that achieves fully unsupervised, robust, and actor-independent action search in large-scale databases. First, an Attention Shift model is presented to extract human-focused foreground actions from videos containing global motion or concurrent actions. Subsequently, a spatiotemporal vocabulary is built from 3D-SIFT features extracted in these human-focused action regions; these features offer robustness against rotation and viewpoint changes. The spatiotemporal vocabulary ensures search efficiency, which is achieved by an inverted indexing structure with approximate nearest-neighbor search. In online ranking, we employ a dynamic time warping distance to handle variations in action duration as well as partial action matching. Finally, an appearance hashing strategy is presented to address the performance degradation caused by divergent actor appearances. For experimental validation, we deployed the actor-independent action retrieval framework on three seasons of the sitcom "Friends" (over 30 hours of video). On this database, we report the best performance (MAP@1 > 0.53) in comparison with alternative and state-of-the-art approaches.
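To make the online ranking step concrete, the following is a minimal sketch of a dynamic time warping (DTW) distance between two action descriptor sequences, assuming each sequence is an array of per-frame feature vectors (e.g., quantized 3D-SIFT descriptors). The function name, interface, and Euclidean local cost are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dtw_distance(query, candidate):
    """DTW distance between two action descriptor sequences.

    query, candidate: arrays of shape (T, D), one D-dim descriptor
    per frame. Local cost is Euclidean distance between descriptors
    (an assumption); the warping path absorbs duration differences.
    """
    n, m = len(query), len(candidate)
    # cost[i, j] = minimal accumulated cost aligning query[:i] with candidate[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(query[i - 1] - candidate[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # skip a query frame
                                 cost[i, j - 1],      # skip a candidate frame
                                 cost[i - 1, j - 1])  # match both frames
    return cost[n, m]

# Illustrative usage: rank database actions by ascending DTW distance.
query = np.random.rand(20, 128)
database = [np.random.rand(np.random.randint(10, 40), 128) for _ in range(5)]
ranking = sorted(range(len(database)), key=lambda k: dtw_distance(query, database[k]))
```

In this sketch, relaxing the endpoint so the path may terminate before the end of the candidate sequence would be one way to realize the partial action matching mentioned above.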