Abstract

Moving object recognition (MOR) is an important but challenging problem in computer vision. Its aim is to recognize the moving objects in a given video dataset. Convolutional neural networks (CNNs) have been used extensively for image recognition and video analysis. Recently, a 3D-CNN, which contains 3D convolution layers, was proposed to address MOR by extracting spatiotemporal features. In this paper, a multi-view 3D-CNN (MV3D-CNN) is proposed for MOR. The model combines 3D-CNNs with multi-view (MV) learning. Because multi-view learning can obtain view-related features from videos captured by different cameras, the proposed model can extract more representative features. The model also contains a view-pooling layer that fuses the feature information from the preceding layers. The proposed MV3D-CNN is applied to real-world moving vehicle recognition and sign language recognition tasks, and the experimental results show that it performs well on both.
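
As a rough illustration of the view-pooling idea, the following Python sketch fuses the feature vectors produced for each camera view into a single vector, element by element. The abstract does not state which fusion operation the layer uses; the element-wise max (with the mean as an alternative) is an assumption made here, and the names `view_pool` and `views` and the sample values are hypothetical.

```python
from typing import List, Sequence


def view_pool(view_features: Sequence[Sequence[float]], mode: str = "max") -> List[float]:
    """Fuse per-view feature vectors into one vector, element by element.

    Assumption: the fusion is an element-wise max or mean across views.
    The source text does not specify the operation.
    """
    if not view_features:
        raise ValueError("need at least one view")
    length = len(view_features[0])
    if any(len(v) != length for v in view_features):
        raise ValueError("all views must have the same feature length")
    if mode == "max":
        # Keep the strongest response for each feature across all views.
        return [max(column) for column in zip(*view_features)]
    if mode == "mean":
        # Average each feature across all views.
        return [sum(column) / len(view_features) for column in zip(*view_features)]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical features from three cameras, four features per view.
views = [
    [0.1, 0.9, 0.3, 0.0],
    [0.4, 0.2, 0.8, 0.5],
    [0.7, 0.1, 0.2, 0.6],
]
fused = view_pool(views)
print(fused)
```

In a full network the inputs would be the feature maps produced by the per-view 3D convolution branches rather than short lists, but the fusion step has the same shape: many views in, one fused representation out.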