Automatic extraction of relevant video shots of specific actions exploiting Web data

Do Hang Nga<sup>*</sup>; Yanai Keiji

doi:10.1016/j.cviu.2013.03.009

Abstract

Video sharing websites have recently become a tremendous source of video that is freely and easily accessible. This has encouraged researchers in the action recognition field to build action databases from Web sources. However, Web sources are generally too noisy to be used directly as a recognition database, so building an action database from them has required extensive manual effort to select the video parts related to the specified actions. In this paper, we introduce a novel method that automatically extracts video shots related to given action keywords from Web videos, using their metadata and visual features. First, we select relevant videos among tagged Web videos based on the relevance between their tags and the given keyword. After segmenting the selected videos into shots, we rank these shots by their visual features so that the shots of interest appear at the top of the ranking. In particular, we propose to use Web images and a human pose matching method in the shot ranking step, and we show that this helps to move more relevant shots to the top. The method is unsupervised and requires only action keywords such as "surf wave" or "bake bread" as input. We conducted large-scale experiments on various kinds of human actions as well as non-human actions and obtained promising results.
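The two-stage flow described in the abstract (tag-based video selection, then shot ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Video`, `Shot`, `tag_relevance`, `select_videos`, `rank_shots`, and `extract_shots` are hypothetical, tag relevance is assumed to be the fraction of keyword words found among a video's tags, the `min_relevance` threshold is arbitrary, and each shot is assumed to arrive with a precomputed `visual_score` that stands in for the visual-feature, Web-image, and pose-matching ranking of the paper. Shot segmentation is assumed to have been done already.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Shot:
    video_id: str
    index: int
    visual_score: float  # assumption: precomputed visual relevance of the shot


@dataclass
class Video:
    video_id: str
    tags: Set[str]
    shots: List[Shot] = field(default_factory=list)  # assumption: already segmented


def tag_relevance(tags: Set[str], keyword: str) -> float:
    """Fraction of keyword words that appear among the tags (illustrative measure)."""
    words = set(keyword.lower().split())
    if not words:
        return 0.0
    lowered = {t.lower() for t in tags}
    return len(words & lowered) / len(words)


def select_videos(videos: List[Video], keyword: str, min_relevance: float) -> List[Video]:
    """Stage 1: keep videos whose tags are sufficiently related to the keyword."""
    return [v for v in videos if tag_relevance(v.tags, keyword) >= min_relevance]


def rank_shots(videos: List[Video], top_k: int) -> List[Shot]:
    """Stage 2: pool the shots of the selected videos and keep the top-scoring ones."""
    shots = [s for v in videos for s in v.shots]
    return sorted(shots, key=lambda s: s.visual_score, reverse=True)[:top_k]


def extract_shots(videos: List[Video], keyword: str,
                  top_k: int = 10, min_relevance: float = 0.5) -> List[Shot]:
    """Run both stages for a single action keyword such as "surf wave"."""
    return rank_shots(select_videos(videos, keyword, min_relevance), top_k)
```

In this sketch the only user input is the keyword, which mirrors the abstract's statement that the method needs nothing beyond action keywords; everything else about the scoring is a placeholder.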

Publication date: 2014-1

