Scheduling parallel jobs with tentative runs and consolidation in the cloud

Authors: Liu, Xiaocheng*; Zha, Yabing; Yin, Quanjun; Peng, Yong; Qin, Long
Source: Journal of Systems and Software, 2015, 104: 141-151.
DOI: 10.1016/j.jss.2015.03.007

Abstract

With the success of cloud computing, a growing number of high-performance computing parallel applications run in the cloud. Careful scheduling of parallel jobs is essential for cloud providers to maintain their quality of service. Existing parallel job scheduling mechanisms do not take parallel workload consolidation into account as a way to improve scheduling performance. In this paper, after introducing a prioritized two-tier virtual machine architecture for parallel workload consolidation, we propose a consolidation-based parallel job scheduling algorithm. The algorithm employs tentative runs and workload consolidation under this two-tier virtual machine architecture to enhance the popular FCFS (first-come, first-served) algorithm. Extensive experiments on well-known traces show that our algorithm significantly outperforms FCFS and can even achieve performance comparable to the runtime-estimation-based EASY algorithm, although it does not require users to provide runtime estimates for their jobs. Moreover, our algorithm tolerates inaccurate CPU usage estimates and requires only trivial modifications to FCFS. It is effective and robust for scheduling parallel workloads in the cloud.
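The abstract does not give the algorithm's details, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a hypothetical model in which each node offers one high-priority "foreground" slot and one low-priority "background" slot, and in which a job blocked at the head of the FCFS queue may be started tentatively on background slots. The names `Job`, `fcfs_dispatch`, and `two_tier_dispatch` are illustrative and do not come from the paper.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int  # number of nodes the job requests


def fcfs_dispatch(queue, free_nodes):
    """Strict FCFS: start jobs in arrival order and stop at the
    first job that does not fit in the remaining free nodes."""
    started = []
    while queue and queue[0].nodes <= free_nodes:
        job = queue.popleft()
        free_nodes -= job.nodes
        started.append(job.name)
    return started, free_nodes


def two_tier_dispatch(queue, free_fg, free_bg):
    """Assumed two-tier variant: run FCFS on the foreground tier,
    then let still-waiting jobs start tentatively (in arrival order)
    on background slots. Tentative jobs stay in the queue so that
    they can later be moved to the foreground tier."""
    foreground, _ = fcfs_dispatch(queue, free_fg)
    tentative = []
    for job in queue:
        if job.nodes <= free_bg:
            free_bg -= job.nodes
            tentative.append(job.name)
    return foreground, tentative


if __name__ == "__main__":
    waiting = deque([Job("A", 2), Job("B", 4), Job("C", 1)])
    print(two_tier_dispatch(waiting, free_fg=3, free_bg=4))
```

In this sketch, strict FCFS leaves job B waiting behind the foreground tier's limited free nodes, whereas the assumed two-tier variant lets B make progress on background capacity without any user-supplied runtime estimate. How the paper actually prioritizes the two tiers, and how it uses CPU usage estimates, is not specified in the abstract.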