Abstract

As a typical deep learning model, convolutional neural networks (CNNs) have achieved state-of-the-art results in large-scale image classification. However, as the number of digital images keeps growing, the data contain more and more redundant, correlated, and noisy samples, which slow down CNN training and degrade classification accuracy. In this paper, we propose an effective sample selection method for large-scale images based on the condensed nearest neighbor rule (Condensed NN) improved by the k-means clustering algorithm. Condensed NN first condenses the large set of original samples, and then k-means clustering further selects high-quality samples according to their distribution; these samples serve as the new training inputs of the CNN. With the selected samples, CNN training can be sped up dramatically while the classification accuracy is no worse than that of a conventional CNN trained on all samples. Experimental results show that the proposed method effectively removes most useless samples and achieves better generalization performance.
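The two-stage selection described above can be illustrated with a minimal sketch. The following Python code is not the authors' implementation: it assumes Hart's classical condensed nearest neighbor rule for the condensing step and, as one possible reading of the k-means refinement, keeps from each cluster the sample closest to the cluster center. Function names such as `condensed_nn` and `kmeans_refine`, and parameters like `n_clusters`, are hypothetical.

```python
# Illustrative sketch (not the paper's code): Hart's condensed nearest
# neighbor rule followed by a k-means-based refinement of the retained set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier


def condensed_nn(X, y, max_passes=10, random_state=0):
    """Hart's condensed nearest neighbor rule: keep only the samples a 1-NN
    classifier needs in order to reproduce the labels of the full set."""
    rng = np.random.default_rng(random_state)
    # Seed the condensed set with one random sample per class.
    store = [rng.choice(np.flatnonzero(y == c)) for c in np.unique(y)]
    for _ in range(max_passes):
        changed = False
        knn = KNeighborsClassifier(n_neighbors=1).fit(X[store], y[store])
        for i in rng.permutation(len(X)):
            if i in store:
                continue
            # A sample misclassified by the current condensed set is absorbed.
            if knn.predict(X[i:i + 1])[0] != y[i]:
                store.append(i)
                knn = KNeighborsClassifier(n_neighbors=1).fit(X[store], y[store])
                changed = True
        if not changed:
            break
    return np.array(store)


def kmeans_refine(X, idx, n_clusters=100, random_state=0):
    """Refine the condensed set: cluster it with k-means and keep, from each
    cluster, the sample closest to the cluster center (assumed criterion)."""
    n_clusters = min(n_clusters, len(idx))
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(X[idx])
    kept = []
    for c in range(n_clusters):
        members = idx[labels == c]
        if len(members) == 0:
            continue
        d = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        kept.append(members[np.argmin(d)])
    return np.array(kept)
```

The indices returned by `kmeans_refine` would then index the reduced training set fed to the CNN; the feature space used for the distance computations (raw pixels or features from a pretrained network) is a design choice the abstract leaves open.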