Abstract

In many domains, positive and negative examples are not both available; only examples of one class can be obtained. This special case of binary classification is called PU (positive and unlabeled) learning. Many classification algorithms have been introduced for PU learning, such as BLSSVM and BSVM. However, all of these classical approaches measure distance with the Euclidean metric, which takes into account neither the correlation within each class nor the differing variability of the attributes. To capture this information, we propose a new Mahalanobis distance-based least squares support vector machine (MD-BLSSVM) classifier, in which two Mahalanobis distances are constructed from the covariance matrices of the two classes of data and used to optimize the hyperplanes. MD-BLSSVM reduces to BLSSVM as a special case when the covariance matrices degenerate to the identity matrix. The merits of MD-BLSSVM are: (1) the Mahalanobis distance of each class gives a more suitable measure of distance by weighting the attributes; (2) the kernel technique can be introduced in a reproducing kernel Hilbert space after a suitable linear transformation; (3) the solution is obtained simply by solving a system of linear equations. Overall, MD-BLSSVM is appropriate for many real problems, especially when the distributions and correlations of the two classes of data are clearly different. Experimental results on several artificial and benchmark datasets indicate that MD-BLSSVM not only learns faster but also generalizes better than BLSSVM and other methods.
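As a minimal illustration of the distance that the method builds on (a sketch under stated assumptions, not the authors' implementation), the Python snippet below computes the standard Mahalanobis distance sqrt((x - y)^T S^{-1} (x - y)) in two dimensions. It shows that the distance coincides with the Euclidean distance when the covariance matrix S is the identity, and that a larger variance along one attribute reduces that attribute's weight. The function name `mahalanobis_2d` and the sample points and matrices are hypothetical, chosen only for illustration.

```python
import math

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-D points x and y for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Quadratic form (x - y)^T * inv * (x - y).
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Illustrative (hypothetical) inputs.
identity = ((1.0, 0.0), (0.0, 1.0))
stretched = ((4.0, 0.0), (0.0, 1.0))  # variance 4 along the first attribute

p, origin = (3.0, 4.0), (0.0, 0.0)
d_identity = mahalanobis_2d(p, origin, identity)    # equals the Euclidean distance
d_stretched = mahalanobis_2d(p, origin, stretched)  # first attribute is down-weighted
```

With the identity covariance the result matches the Euclidean distance between the two points. With the stretched covariance the first coordinate contributes less, which is the kind of attribute weighting the abstract refers to.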