An Automatic K-Means Clustering Algorithm of GPS Data Combining a Novel Niche Genetic Algorithm with Noise and Density

作者:Zhou, Xiangbing*; Gu, Jianggang; Shen, Shaopeng; Ma, Hongjiang; Miao, Fang; Zhang, Hua; Gong, Huaming
来源:ISPRS International Journal of Geo-Information, 2017, 6(12): 392.
DOI:10.3390/ijgi6120392

摘要

Rapidly growing Global Positioning System (GPS) data plays an important role in trajectory and their applications (e.g., GPS-enabled smart devices). In order to employ K-means to mine the better origins and destinations (OD) behind the GPS data and overcome its shortcomings including slowness of convergence, sensitivity to initial seeds selection, and getting stuck in a local optimum, this paper proposes and focuses on a novel niche genetic algorithm (NGA) with density and noise for K-means clustering (NoiseClust). In NoiseClust, an improved noise method and K-means++ are proposed to produce the initial population and capture higher quality seeds that can automatically determine the proper number of clusters, and also handle the different sizes and shapes of genes. A density-based method is presented to divide the number of niches, with its aim to maintain population diversity. Adaptive probabilities of crossover and mutation are also employed to prevent the convergence to a local optimum. Finally, the centers (the best chromosome) are obtained and then fed into the K-means as initial seeds to generate even higher quality clustering results by allowing the initial seeds to readjust as needed. Experimental results based on taxi GPS data sets demonstrate that NoiseClust has high performance and effectiveness, and easily mine the city's situations in four taxi GPS data sets.