Abstract

The MapReduce programming model, in which the data nodes perform both data storage and computation, was introduced for big-data processing. We therefore need to understand the different resource requirements of data-storage and computation tasks and schedule them efficiently on multi-core processors. In particular, providing high-performance data storage has become increasingly critical because of the continuously growing volume of data uploaded to distributed file systems and database servers. However, analyzing the performance characteristics of the processes that store uploaded data is intricate, because their operation involves heavy network and disk input/output (I/O). In this paper, we analyze the impact of core affinity on both network and disk I/O performance and propose a novel dynamic core-affinity approach for high-throughput file upload. We consider the dynamic changes in processor load and the intensiveness of the file upload at run time, and accordingly decide the core affinity of service threads, with the objective of maximizing parallelism, data locality, and resource efficiency. We apply the dynamic core-affinity scheme to the Hadoop Distributed File System (HDFS). Measurement results show that our implementation improves the file upload throughput of end applications by more than 30% compared with the default HDFS and provides better scalability.
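The abstract describes deciding the core affinity of service threads at run time from the current processor load. As a rough illustration only, not the paper's HDFS implementation, the following C sketch pins the calling thread to the least-loaded core using the Linux CPU-affinity API; the per-core load values and the selection policy are hypothetical placeholders for the run-time load monitoring the abstract refers to.

```c
/*
 * Minimal sketch, NOT the paper's implementation: bind the calling
 * service thread to the least-loaded core with the Linux CPU-affinity
 * API.  The per-core load figures and the selection policy below are
 * hypothetical placeholders for run-time load monitoring.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-core load estimates (e.g., sampled from /proc/stat). */
static const double core_load[] = { 0.80, 0.35, 0.90, 0.20 };
#define NUM_CORES ((int)(sizeof(core_load) / sizeof(core_load[0])))

/* Pick the least-loaded core and pin the calling thread to it. */
static int bind_to_least_loaded_core(void)
{
    int best = 0;
    for (int c = 1; c < NUM_CORES; c++)
        if (core_load[c] < core_load[best])
            best = c;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(best, &set);

    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        return -1;
    }
    return best;
}

int main(void)
{
    int core = bind_to_least_loaded_core();
    if (core >= 0)
        printf("service thread pinned to core %d\n", core);
    return 0;
}
```

In the paper's setting, the affinity decision would be revisited as processor load and upload intensiveness change, rather than made once as in this sketch.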

  • Publication date: 2014-12