Abstract

Although there are many excellent clustering algorithms, effective clustering remains very challenging for large datasets that contain many classes. Image clustering presents further problems because automatically computed image distances are often noisy. We address these challenges in two ways. First, we propose a new algorithm that clusters only a subset of the images, which we call subclustering; it produces a few examples from each class. Subclustering yields smaller but purer clusters. Second, we incorporate human input through an active subclustering algorithm to further improve results. Experiments on a face image dataset and a leaf image dataset show that our proposed algorithms outperform baseline methods.
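The abstract does not say how the subset of images is chosen. Purely as an illustration of the general idea, and not the authors' algorithm, one generic way to cluster only part of a dataset is to link items that are mutual k-nearest neighbours under a precomputed distance matrix, then keep the connected groups and leave every other item unclustered. This gives up coverage in exchange for purity. The function name `mutual_knn_subclusters` and its parameters `k` and `min_size` are hypothetical.

```python
from collections import defaultdict


def mutual_knn_subclusters(dist, k=2, min_size=2):
    """Hypothetical illustration of subclustering (not the paper's method).

    Links i and j only if each is among the other's k nearest neighbours,
    returns the connected groups of linked items, and leaves all other
    items unclustered.
    """
    n = len(dist)
    # k nearest neighbours of each item, excluding the item itself
    knn = [set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
           for i in range(n)]

    # Union-find over mutual kNN edges
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    linked = set()
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:  # keep only mutual neighbour pairs
                linked.update((i, j))
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in linked:
        groups[find(i)].append(i)
    # Discard groups that are too small; their items stay unclustered
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Toy 1-D "images": two tight groups plus one ambiguous item (index 6)
points = [0, 1, 2, 50, 51, 52, 26]
dist = [[abs(a - b) for b in points] for a in points]
print(mutual_knn_subclusters(dist, k=2))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy case the item at position 26 has no mutual neighbour, so it is left out. An active variant could, for example, ask a person to confirm or reject uncertain links, but that extension is likewise only an assumption here.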

  • Publication date: 2014-8
