Abstract

In supervised classification, we learn from a training set of labeled observations to form a decision rule for classifying unlabeled test cases. If the training sample is small, however, one may fail to extract enough information from it to develop a good classifier. This problem is more evident in nonparametric classification because of the statistical instability of nonparametric methods. In such cases, if one can also extract useful information from the unlabeled test cases and use it to modify the classification rule, the performance of the resulting classifier can improve substantially. In this article, we use a probabilistic framework to develop such methods for nearest neighbor classification. The resulting classifiers, called semi-supervised or transductive classifiers, usually perform better than supervised methods, especially when the training sample is small. Some benchmark data sets are analyzed to show the utility of the proposed methods.

  • Publication date: 2012-7-1