Abstract

Clustering ensemble aims to combine multiple base partitions into a single robust, stable, and accurate partition. A key problem is how to exploit the cluster structure information contained in each base partition. Evidence accumulation is an effective framework that converts the base partitions into a co-association matrix. This matrix records how frequently each pair of points is assigned to the same cluster, but it ignores some information hidden in the base partitions. In this paper, we reveal some of this information by refining the co-association matrix at both the data point level and the base cluster level. At the data point level, pairs of points in the same base cluster may have different similarities, so their contributions to the co-association matrix can differ. At the cluster level, base clusters may vary in quality, so the contribution of a base cluster as a whole can also differ from that of other clusters. After refinement, the co-association matrix is transformed into a path-based similarity matrix so that more global information about the cluster structure is incorporated. Finally, spectral clustering is applied to this matrix to generate the final clustering result. Experimental results on 8 synthetic and 8 real data sets demonstrate that the clustering ensemble based on the refined co-association matrix outperforms some state-of-the-art clustering ensemble schemes.
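
To make two of the building blocks concrete, the following is a minimal sketch, not the method proposed in the paper. It builds the plain (unrefined) co-association matrix from a set of base partitions and then applies a path-based transform, assuming the common max-min definition in which the similarity of two points is the largest, over all connecting paths, of the weakest link along the path. The function names `co_association` and `path_based_similarity` and the toy partitions are hypothetical, chosen only for illustration. The point-level and cluster-level refinements and the final spectral clustering step are omitted.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of points shares a cluster.

    partitions: list of m label lists, each of length n (hypothetical input format).
    """
    m = len(partitions)
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def path_based_similarity(S):
    """Max-min path similarity (an assumed definition, see the lead-in).

    Uses a Floyd-Warshall style update: a path through k is as strong as
    its weaker leg, and the strongest path found so far is kept.
    """
    n = len(S)
    P = [row[:] for row in S]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(P[i][k], P[k][j])
                if via_k > P[i][j]:
                    P[i][j] = via_k
    return P


# Toy example: three base partitions of four points.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
C = co_association(partitions)
P = path_based_similarity(C)
```

In this toy example, points 0 and 3 never share a base cluster, so `C[0][3]` is 0, yet they are linked through the chain 0, 1, 2, 3 whose weakest link is 2/3. The path-based matrix therefore raises their similarity to 2/3, which illustrates how the transform brings in more global structure than the pairwise counts alone.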