Abstract

A batch-mode active learning technique that exploits the cluster assumption is proposed for binary classification with support vector machines (SVMs). In each active learning iteration, the unlabeled instances lying within the SVM margin are first grouped into two clusters. From each cluster, the points most similar to the other cluster are then selected for labeling. Such points lie near the boundary between the clusters and are therefore expected, with high probability, to become support vectors in the final classification model. Clustering is performed in the same kernel space as the SVM, and a semi-supervised K-medoids algorithm additionally uses the labeled instances to improve clustering performance. Experiments show that the proposed method is efficient and robust to poor initial samples.
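The selection step described above can be illustrated with a minimal sketch; it is not the authors' implementation. Several choices are assumptions made only for illustration: the margin is taken as |f(x)| < 1 on decision values from an already-trained SVM, the kernel is an RBF kernel with an arbitrary `gamma`, the clustering is a plain unsupervised 2-medoids with farthest-pair initialization (the semi-supervised variant that uses labeled instances is omitted), and "similarity to the other cluster" is measured as kernel-space distance to the other cluster's medoid. All function names are hypothetical.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel; gamma is an illustrative value, not taken from the paper."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def kernel_dist2(K, i, j):
    """Squared distance between points i and j in the kernel-induced space."""
    return K[i][i] + K[j][j] - 2.0 * K[i][j]

def assign(K, medoids):
    """Label each point with the index (0 or 1) of its nearest medoid."""
    return [min((0, 1), key=lambda c: kernel_dist2(K, i, medoids[c]))
            for i in range(len(K))]

def two_medoids(K, max_iter=100):
    """Unsupervised 2-medoids on a precomputed kernel matrix (assumed variant)."""
    n = len(K)
    # Assumed initialization: the two points farthest apart in kernel space.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    medoids = list(max(pairs, key=lambda p: kernel_dist2(K, *p)))
    for _ in range(max_iter):
        labels = assign(K, medoids)
        updated = []
        for c in (0, 1):
            members = [i for i in range(n) if labels[i] == c]
            # The medoid minimizes the total distance to its cluster members.
            cost = lambda m: sum(kernel_dist2(K, m, i) for i in members)
            updated.append(min(members, key=cost) if members else medoids[c])
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(K, medoids)

def select_batch(X, decision_values, batch_size, kernel=rbf_kernel):
    """Return indices into X of the instances to query in this iteration."""
    # Unlabeled instances inside the SVM margin (assumed to mean |f(x)| < 1).
    margin = [i for i, f in enumerate(decision_values) if abs(f) < 1.0]
    if len(margin) < 2:
        return margin
    K = [[kernel(X[i], X[j]) for j in margin] for i in margin]
    medoids, labels = two_medoids(K)
    per_cluster = batch_size // 2
    chosen = []
    for c in (0, 1):
        other = medoids[1 - c]
        members = [i for i in range(len(margin)) if labels[i] == c]
        # Points closest to the other cluster's medoid lie near the boundary.
        members.sort(key=lambda i: kernel_dist2(K, i, other))
        chosen.extend(margin[i] for i in members[:per_cluster])
    return chosen
```

In a full active learning loop, the returned indices would be labeled, added to the training set, and the SVM retrained before the next iteration. Using the same kernel for clustering and classification keeps the notion of "near the boundary" consistent with the space in which the SVM separates the classes.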