Action Parsing-Driven Video Summarization Based on Reinforcement Learning

Authors: Lei, Jie; Luan, Qiao; Song, Xinhui; Liu, Xiao; Tao, Dapeng; Song, Mingli*
Source: IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(7): 2126-2137.
DOI: 10.1109/TCSVT.2018.2860797

Abstract

Managing, storing, and indexing large numbers of videos is an urgent problem. Although many video summarization models achieve good results, models based on low-level features cannot capture important semantic information, and models based on semantic analysis require related text descriptions that do not exist for most videos. Consequently, mining the semantic information contained in the video itself is a more feasible approach. In this paper, we propose an action parsing-driven video summarization model based on reinforcement learning. The model consists of two parts: video cutting by action parsing, and video summarization based on reinforcement learning. In the first part, a sequential multiple instance learning model is trained on weakly annotated data, which avoids the time cost of full annotation while addressing the ambiguity of weak annotation. In the second part, we design a deep recurrent neural network-based video summarization model that selects the frames that best distinguish an action from other actions. The quality of the extracted key frames can then be evaluated by categorization accuracy. Experiments and comparisons with state-of-the-art methods demonstrate the advantage of the proposed approach.