摘要

As an important clustering algorithm, graph clustering can be effectively applied to protein interaction networks and microarray data clustering. In this paper, to overcome the shortcomings of the existing graph clustering methods for gene microarray data, a global graph clustering method based on the modularity and the subgraph smoothness is proposed. In this algorithm, subgraph smoothness is introduced to avoid the local optimal solution, subgraphs with low smoothness values in the clustering results are split into singletons, and those newly-generated singletons are used in the next clustering step. After several iterations, the global optimal clustering result can be obtained. The proposed method is then compared with four commonly-used clustering methods (the classic graph clustering, the k-means algorithm, the SOM algorithm, and the Fuzzy algorithm) on a group of genome expression data, and the results show that (1) the proposed method is superior to the other four methods in terms of average non-overlap proportion and FOM'value;(2) when the dataset is divided into 39 clusters, the FOM'value of the proposed method is respectively 28.41%, 19.21%, 9.84% and 24.67% lower than those of the other four methods;and (3) the proposed method is of a classification accuracy, which is higher than that of the Fuzzy algorithm and the SOM algorithm, with an execution efficiency similar to that of the SOM algorithm but 5.94% higher than that of the Fuzzy method.

全文