Abstract
Clustering is one of the most widely studied topics in data mining and pattern recognition, and k-means is one of the most popular, simple, and fast clustering algorithms. However, the right value of k is usually unknown, and selecting effective initial points is also difficult. In view of this, much work has been done on variants of k-means that refine the initial points and detect the number of clusters. In this paper, we present a new algorithm, an efficient k-means clustering based on influence factors, which is divided into two stages and can automatically determine the actual value of k and select suitable initial points based on the characteristics of the dataset. We propose an influence factor to measure the similarity of two clusters and use it to decide whether the two clusters should be merged into one. To obtain a faster algorithm, we state and prove a theorem and use it to accelerate the computation. Experimental results on Gaussian datasets generated as in Pelleg and Moore (2000) [11] show that the algorithm produces high-quality, satisfactory results.