Abstract

In semi-supervised classification, the data are partially labeled and the task is to label the remaining data. Compared with unsupervised learning, the labeling accuracy is expected to improve because of the information carried by the given labels. However, since class labels are manually assigned by experts and data are sometimes difficult to collect, the assigned labels are noisy. As a result, the balance of classes in the labeled data can differ from that in the unlabeled data. To address this problem, a number of practical methods for modifying the class balance, such as instance re-weighting and resampling, have been proposed. Despite the increase in application studies, the effect of noisy labels on accuracy has not yet been thoroughly investigated. In the present paper, we theoretically analyze the accuracy of semi-supervised classification. In comparison with the case of balanced classes, we observe the loss of accuracy caused by label noise.

  • Publication date: 2015-7-21