Abstract

A heuristic method for accelerating support vector machine (SVM) training, based on a measure of similarity among samples, is presented in this paper. Training an SVM requires optimizing a quadratic function subject to linear constraints. The original formulation of the SVM objective function is efficient during the optimization phase, but the resulting discriminant function often contains redundant terms. The compactness of an SVM's discriminant function depends on a sparse subset of the training data, namely the support vectors selected by the optimization procedure. The motivation for a sparsity-controlled SVM is therefore practical: it reduces the computational cost of SVM testing and improves the interpretability of the model. Besides the existing approaches, an intuitive way to achieve this goal is to control support-vector sparsity by reducing the training data without sacrificing generalization performance. The most attractive feature of this idea is that it makes SVM training fast, especially for large training sets, because the size of the optimization problem can be decreased greatly. In this paper, a heuristic rule is used to reduce the training data for support vector regression (SVR). First, all the training data are divided into several groups; then, within each group, some training vectors are discarded according to the similarity measure. This reduction is carried out in the original data space before SVM training, so its extra computational cost is small. Even when the preprocessing cost is included, the total time is still less than that required to train the SVM on the complete training set. As a result, the number of vectors used for SVR training becomes small, and the training time can be decreased greatly without compromising the generalization capability of the SVM. Simulation results show the effectiveness of the presented method.
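The two-stage reduction described above (grouping, then similarity-based pruning within each group) can be sketched as follows. Note that the abstract does not specify the grouping rule or the similarity measure, so both choices here (slicing the data after sorting by the target value, and cosine similarity with a fixed threshold) are illustrative assumptions, not the paper's exact heuristic.

```python
import numpy as np

def reduce_by_similarity(X, y, n_groups=5, sim_threshold=0.95):
    """Similarity-based training-set reduction (illustrative sketch).

    1. Partition the samples into groups -- here, equal-size slices
       after sorting by the regression target y (an assumed rule).
    2. Within each group, greedily keep a sample only if its cosine
       similarity to every already-kept sample in that group is below
       the threshold; near-duplicates are discarded.
    """
    order = np.argsort(y)
    kept = []
    for group in np.array_split(order, n_groups):
        kept_in_group = []
        for i in group:
            xi = X[i] / (np.linalg.norm(X[i]) + 1e-12)
            redundant = any(
                float(xi @ (X[j] / (np.linalg.norm(X[j]) + 1e-12))) > sim_threshold
                for j in kept_in_group
            )
            if not redundant:
                kept_in_group.append(i)
        kept.extend(kept_in_group)
    idx = np.array(sorted(kept))
    return X[idx], y[idx]

# Toy data: each point is duplicated with tiny noise, so roughly half
# of the set is redundant and should be pruned before SVR training.
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 3))
X = np.vstack([base, base + 1e-3 * rng.normal(size=base.shape)])
y = X.sum(axis=1)
X_red, y_red = reduce_by_similarity(X, y)
print(len(X), "->", len(X_red))
```

The reduced set (`X_red`, `y_red`) would then be passed to a standard SVR solver in place of the full training set; because the pruning runs in the original input space, it adds only a small preprocessing cost relative to the quadratic-programming phase it shortens.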