Abstract

Key frame extraction is an important technique for video summarization, browsing, searching, and understanding. In this paper, we propose a novel approach that extracts the most attractive key frames using a saliency-based visual attention model, which bridges the gap between low-level features and the semantic interpretation of the video. First, dynamic and static conspicuity maps are constructed from motion, color, and texture features. Then, by introducing a suppression factor and a motion priority scheme, the conspicuity maps are fused into a saliency map that includes only true attention regions, from which an attention curve is produced. Finally, after a time-constrained clustering algorithm groups frames with similar content, the frames with the maximum saliency value are selected as key frames. Experimental results demonstrate the effectiveness of our approach for video summarization in retrieving meaningful key frames.
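The final selection step can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper. It assumes that each frame already has a feature vector and a scalar saliency value (one point on the attention curve), that the time constraint means only consecutive frames may share a cluster, and that one key frame is taken per cluster. The function names, the L1 distance, the running-centroid test, and the `threshold` parameter are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def time_constrained_clusters(
    features: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Group consecutive frames with similar content.

    Assumption: a frame joins the current cluster when its L1 distance
    to that cluster's centroid is at most `threshold`; otherwise it
    starts a new cluster. Frames are never merged across a time gap.
    """
    clusters: List[List[int]] = []
    for idx, feat in enumerate(features):
        if clusters:
            members = clusters[-1]
            # Centroid of the current (most recent) cluster.
            centroid = [
                sum(features[i][d] for i in members) / len(members)
                for d in range(len(feat))
            ]
            if l1_distance(feat, centroid) <= threshold:
                members.append(idx)
                continue
        clusters.append([idx])
    return clusters


def select_key_frames(
    features: Sequence[Sequence[float]],
    saliency: Sequence[float],
    threshold: float,
) -> List[int]:
    """Return the index of the most salient frame in each cluster."""
    if len(features) != len(saliency):
        raise ValueError("features and saliency must have the same length")
    clusters = time_constrained_clusters(features, threshold)
    return [max(c, key=lambda i: saliency[i]) for c in clusters]


if __name__ == "__main__":
    # Toy data: one-dimensional features and made-up saliency values.
    toy_features = [[0.0], [0.1], [0.05], [1.0], [1.1], [0.0]]
    toy_saliency = [0.2, 0.9, 0.4, 0.3, 0.8, 0.5]
    print(select_key_frames(toy_features, toy_saliency, threshold=0.3))
```

In this toy run the last frame has the same content as the first three but forms its own cluster, because the assumed time constraint only allows adjacent frames to be grouped.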