Abstract

This paper proposes a generic framework for extracting key-frames from 2D or 3D video sequences, relying on a new method to compute 3D visual saliency. The framework comprises the following novel aspects that distinguish this work from previous ones: (i) the key-frame selection process is driven by an aggregated saliency map, computed from various feature maps, which in turn correspond to different visual attention models; (ii) a method for computing aggregated saliency maps in 3D video is proposed and validated using fixation density maps obtained from ground-truth eye-tracking data; (iii) 3D video content is processed within the same framework as 2D video by including a depth feature map in the aggregated saliency. A dynamic programming optimisation algorithm is used to find the best set of K frames that minimises the dissimilarity error (i.e., maximises the similarity) between the original video shots of size N > K and those reconstructed from the key-frames. Using different performance metrics and publicly available databases, the simulation results demonstrate that the proposed framework outperforms similar state-of-the-art methods and achieves performance comparable to that of quite different approaches. Overall, the proposed framework is validated for a wide range of visual content and has the advantage of being independent of any specific visual saliency model or similarity metric.
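
The abstract outlines, but does not formalise, the dynamic programming step. The sketch below illustrates one common contiguous-segment formulation of this selection problem: the shot is split into K segments, each represented by a single frame, and the partition with the lowest total reconstruction error is found. The function name `select_key_frames`, the dissimilarity matrix `dissim`, and the segment-based recurrence are illustrative assumptions rather than the paper's exact algorithm; any frame-level dissimilarity measure can be plugged into `dissim`, mirroring the framework's independence from a specific metric.

```python
import numpy as np

def select_key_frames(dissim, K):
    """Select K key frames from a shot of N frames by dynamic programming.

    dissim is an N x N matrix where dissim[i, j] scores the dissimilarity
    between frames i and j (any metric may be plugged in). This is a sketch
    of one possible formulation, kept simple rather than optimised.
    """
    N = len(dissim)
    # seg_cost[i, j]: error of covering frames i..j with their best
    # representative; seg_rep[i, j]: index of that representative frame
    seg_cost = np.full((N, N), np.inf)
    seg_rep = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i, N):
            # Cost of each candidate representative within the segment
            costs = dissim[i:j + 1, i:j + 1].sum(axis=1)
            r = int(np.argmin(costs))
            seg_cost[i, j] = costs[r]
            seg_rep[i, j] = i + r

    # dp[k, j]: minimum error of covering frames 0..j with k segments
    dp = np.full((K + 1, N), np.inf)
    split = np.zeros((K + 1, N), dtype=int)  # start of the last segment
    dp[1] = seg_cost[0]
    for k in range(2, K + 1):
        for j in range(k - 1, N):
            for i in range(k - 1, j + 1):
                cost = dp[k - 1, i - 1] + seg_cost[i, j]
                if cost < dp[k, j]:
                    dp[k, j] = cost
                    split[k, j] = i

    # Backtrack segment boundaries, collecting one key frame per segment
    keys, j = [], N - 1
    for k in range(K, 0, -1):
        i = split[k, j] if k > 1 else 0
        keys.append(int(seg_rep[i, j]))
        j = i - 1
    return sorted(keys), float(dp[K, N - 1])
```

As a usage example, `dissim[i, j]` could be one minus a normalised colour-histogram intersection between frames i and j, or a distance between the frames' saliency-weighted features; the DP itself is agnostic to that choice.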

  • Publication date: 2015-11