Abstract

Object detection is one of the most important tasks in computer vision and has achieved impressive performance thanks to deep convolutional neural networks. Detection in still images in particular has improved markedly over the past two years, with the R-CNN family of detectors playing a vital role. However, as the number of surveillance videos grows, current methods may not meet the increasing demand. In this paper, we propose a moving-object proposal generation and prediction framework (MPGP) that reduces the search space and generates accurate proposals, thereby lowering computational cost. In addition, we explore the relation between moving regions in the feature maps of different layers and predict candidates from the results of previous frames. Finally, we use spatial-temporal information to strengthen the detection scores and further refine the locations of the bounding boxes. The MPGP framework can be applied to different region-based networks. Experiments on the CUHK, XJTU, and AVSS data sets show that our approach outperforms state-of-the-art approaches.