An integrated K-means - Laplacian cluster ensemble approach for document datasets

作者:Xu, Sen*; Chan, Kung-Sik; Gao, Jun; Xu, Xiufang; Li, Xianfeng; Hua, Xiaopeng; An, Jing
来源:Neurocomputing, 2016, 214: 495-507.
DOI:10.1016/j.neucom.2016.06.034

摘要

Cluster ensemble has become an important extension to traditional clustering algorithms, yet the cluster ensemble problem is very challenging due to the inherent difficulty in resolving the label correspondence problem. We adapted the integrated K-means - Laplacian clustering approach to solve the cluster ensemble problem by exploiting both the attribute information embedded in the cluster labels and the pairwise relations among the objects. The optimal solution of the proposed approach requires computing the pseudo inverse of the normalized Laplacian matrix and the eigenvalue decomposition of a large matrix, which can be computationally burdensome for large scale document datasets. We devised an effective algebraic transformation method for efficiently carrying out the aforementioned computations and proposed an integrated K-means - Laplacian cluster ensemble approach (IKLCEA). Experimental results with benchmark document datasets demonstrate that IKLCEA outperforms other cluster ensemble techniques on most cases. In addition, IKLCEA is computationally efficient and can be readily employed in large scale document applications.