A New Approach for Large-Scale Scene Image Retrieval Based on Improved Parallel k-Means Algorithm in MapReduce Environment

作者:Cao, Jianfang*; Wang, Min; Shi, Hao; Hu, Guohua; Tian, Yun
来源:Mathematical Problems in Engineering, 2016, 2016: 3593975.
DOI:10.1155/2016/3593975

摘要

The rapid growth of digital images has caused the traditional image retrieval technology to be faced with new challenge. In this paper we introduce a new approach for large-scale scene image retrieval to solve the problems of massive image processing using traditional image retrieval methods. First, we improved traditional k-Means clustering algorithm, which optimized the selection of the initial cluster centers and iteration procedure. Second, we presented a parallel design and realization method for improved k-Means algorithm applied it to feature clustering of scene images. Finally, a storage and retrieval scheme for large-scale scene images was put forward using the large storage capacity and powerful parallel computing ability of the Hadoop distributed platform. The experimental results demonstrated that the proposed method achieved good performance. Compared with the traditional algorithms with single node architecture and parallel k-Means algorithm, the proposed method has obvious advantages for use in large-scale scene image data retrieval in terms of retrieval accuracy, retrieval time overhead, and computational performance (speedup and efficiency, sizeup, and scaleup), which is a significant improvement from applying parallel processing to intelligent algorithms with large-scale datasets.