Abstract

A heterogeneous cloud system, such as a Hadoop 2.6.0 platform, provides distributed yet cohesive services with rich features for large-scale management, reliability, and fault tolerance. In big data processing, newly built cloud clusters face the challenge of performance optimization, which centers on faster task execution and more efficient use of computing resources. Existing approaches concentrate on temporal improvement, that is, shortening MapReduce time, but seldom consider storage occupation; however, unbalanced cloud storage strategies can exhaust nodes that run heavy MapReduce cycles and thereby threaten the security and stability of the entire cluster. This paper presents an adaptive method aimed at spatial-temporal efficiency in a heterogeneous cloud environment. A prediction model based on an optimized Kernel-based Extreme Learning Machine algorithm is proposed to forecast job execution duration and space occupation more quickly; these forecasts in turn facilitate task scheduling through a multi-objective algorithm called time and space optimized NSGA-II (TS-NSGA-II). Experimental results show that, compared with the original load-balancing scheme, our approach saves approximately 47-55 s on average per task execution. At the same time, the difference in hard disk occupation among all scheduled reducers was 1.254 parts per thousand, a 26.6% improvement over the original scheme.