Abstract

In high-dimensional data clustering, the cluster structure is commonly assumed to be confined to a small number of relevant features rather than the entire feature set. For such data, however, identifying the relevant features and discovering the cluster structure remain challenging problems. To address them, this paper proposes a novel fuzzy c-means (FCM) model with sparse regularization (l_q-norm regularization with 0 < q <= 1). The model is obtained by reformulating the FCM objective function as a weighted between-cluster sum of squares and imposing the sparse regularization on the feature weights. An algorithm is also developed to solve the proposed model explicitly. Compared with existing clustering models, the proposed model can shrink the weights of irrelevant (noisy) features to exactly zero, and it can be solved efficiently in analytic form when q = 1 or q = 1/2. Experiments on both synthetic and real-world data sets show that the proposed approach outperforms existing clustering approaches.
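
To illustrate why an l_1 penalty (the q = 1 case) can drive weights to exactly zero in closed form, here is a minimal sketch of the standard soft-thresholding operator. It is not the paper's weight update, which the abstract does not spell out. The function name `soft_threshold`, the threshold `lam`, and the per-feature `scores` are hypothetical and chosen only for illustration.

```python
def soft_threshold(values, lam):
    """Per-entry closed-form minimizer of 0.5 * (w - v)**2 + lam * abs(w)."""
    result = []
    for v in values:
        magnitude = max(abs(v) - lam, 0.0)  # shrink toward zero by lam
        if magnitude == 0.0:
            result.append(0.0)              # small entries become exactly zero
        elif v > 0:
            result.append(magnitude)
        else:
            result.append(-magnitude)
    return result


# Hypothetical per-feature scores (assumption: larger means more relevant).
scores = [2.0, 0.25, 0.1, 1.5]
weights = soft_threshold(scores, lam=0.5)
print(weights)  # [1.5, 0.0, 0.0, 1.0]
```

Entries whose magnitude falls below `lam` are set to exactly zero rather than merely made small, which is the kind of behaviour that lets a sparse penalty discard noisy features.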