Abstract

As the volume of data grows rapidly in areas such as text categorization, genomic microarray analysis, bioinformatics, and digital imaging, feature selection faces more and more challenges. Feature selection has recently been widely studied in supervised learning, but there is significantly less work in unsupervised learning because class information and explicit search criteria are absent. In this work, we introduce a new measure that assesses the importance of a feature in terms of its separability. We then introduce a clustering-based feature selection algorithm built on this measure. The proposed algorithm has nearly linear time complexity, selects the final feature subset through a ranking procedure based on the separabilities of the features, and is applicable to datasets of mixed nature. Experimental results on UCI datasets show that our method, by retaining the relevant features, obtains similar or even better classification and clustering results on most datasets, and that it outperforms other traditional supervised and unsupervised feature selection methods in terms of dimensionality reduction and classification accuracy.