A hierarchical reliability-driven scheduling algorithm in grid systems

作者:Tang, Xiaoyong; Li, Kenli*; Qiu, Meikang; Sha, Edwin H. -M.
来源:Journal of Parallel and Distributed Computing, 2012, 72(4): 525-535.
DOI:10.1016/j.jpdc.2011.12.004

摘要

In a Grid computing system, many distributed scientific and engineering applications often require multi-institutional collaboration, large-scale resource sharing, wide-area communication, etc. Applications executing in such systems inevitably encounter different types of failures such as hardware failure, program failure, and storage failure. One way of taking failures into account is to employ a reliable scheduling algorithm. However, most existing Grid scheduling algorithms do not adequately consider the reliability requirements of an application. In recognition of this problem, we design a hierarchical reliability-driven scheduling architecture that includes both a local scheduler and a global scheduler. The local scheduler aims to effectively measure task reliability of an application in a Grid virtual node and incorporate the precedence constrained tasks' reliability overhead into a heuristic scheduling algorithm. In the global scheduler, we propose a hierarchical reliability-driven scheduling algorithm based on quantitative evaluation of independent application reliability. Our experiments, based on both randomly generated graphs and the graphs of some real applications, show that our hierarchical scheduling algorithm performs much better than the existing scheduling algorithms in terms of system reliability, schedule length, and speedup.