Abstract

Distributed data collection and analysis over networks are ubiquitous, especially in wireless sensor networks (WSNs). Distributed clustering, one of the most important topics in distributed data analysis, aims to uncover the hidden structure of data collected or stored at geographically distributed nodes. In recent years, several distributed clustering techniques have been developed based on the K-means algorithm or the Gaussian mixture model. These methods capture data structure using measures based only on first- and second-order statistics. When the structure of the cluster data is complicated, these statistics are insufficient and may lead to unsatisfactory clustering results. In such cases, information theoretic measures can achieve better clustering performance because they take the whole distribution of the cluster data into account. In this work, we incorporate an information theoretic measure into the cost function of distributed clustering and present a linear and a kernel distributed clustering algorithm. In both algorithms, each node solves a local clustering problem through diffusion cooperation with its neighboring nodes. To preserve privacy and reduce communication costs, nodes exchange only a few parameters, rather than the original data, with their one-hop neighbors. Simulation results show that the proposed distributed algorithms achieve clustering results almost as good as those of the corresponding centralized information theoretic clustering algorithms on both synthetic and real data.