Abstract

The rapid development of the Internet, especially the mobile Internet, has made it much easier for people to socialize online. People now spend more and more time on social network services and produce large numbers of image files. This poses a challenge to traditional standalone frameworks, which struggle to handle the continually increasing volume of image files, so a new approach is needed. Hadoop is a notable, widely used project for distributed storage and computation that offers high efficiency, data integrity, reliability, and fault tolerance. Its two primary subprojects, the Hadoop Distributed File System and MapReduce, handle big data storage and computation, respectively. However, Hadoop does not provide any interface for image processing. Moreover, both the Hadoop Distributed File System and MapReduce have trouble processing large numbers of small files, which reduces the efficiency of file access and distributed computation. These limitations prevent us from performing image processing on Hadoop. In view of this, this paper proposes a new method to optimize the storage of small image files on Hadoop and defines a custom input/output format that enables Hadoop to process image files.