Abstract

Provisioning fault-tolerant scheduling in computational grids is a challenging task. Most existing fault-tolerant scheduling schemes are either proactive or reactive. Proactive schemes focus on the causes of faults, whereas reactive mechanisms take effect after a failure is detected. Unlike most existing mechanisms, we present a novel, dynamic, adaptive, hybrid fault-tolerant scheduling scheme that combines proactive and reactive approaches. In the proactive approach, the resource filtration algorithm picks resources based on resource location, availability, and reliability. Unlike most existing schemes, which rely on remotely connected resources, the proposed algorithm prefers locally available resources because they may be less prone to failure. To cope with frequent resource turnover, the proposed scheme calculates resource availability time from several newly identified parameters (e.g., mean time between availability) and picks highly available nodes for task execution. Resource reliability is an indispensable consideration in the proposed scheme and is calculated from parameters such as the job success or failure ratio and the types of failures encountered. We employ an optimal resource identification algorithm to determine and select optimal resources for job execution. The performance of the proposed scheme is validated using the GridSim toolkit. Compared with contemporary approaches, experimental results demonstrate the effectiveness and efficiency of the proposed scheme in terms of various performance metrics, such as wall clock time, throughput, waiting and turnaround time, number of checkpoints, and energy consumption.
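As a rough illustration of filtering resources by location, availability, and reliability, the following minimal sketch may help. It is not the paper's algorithm: the `Resource` fields, the threshold values, the plain success-ratio reliability estimate, and the ordering rule (local first, then higher availability, then higher reliability) are all assumptions made for this example, and the failure-type and mean-time-between-availability parameters mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # All fields are hypothetical; the abstract does not define a data model.
    name: str
    is_local: bool        # True if the resource is locally available
    availability: float   # assumed fraction of time available, in [0, 1]
    reliability: float    # assumed fraction of jobs completed, in [0, 1]


def reliability_from_history(succeeded: int, failed: int) -> float:
    """Estimate reliability as a plain job success ratio (an assumption)."""
    total = succeeded + failed
    return succeeded / total if total else 0.0


def filter_resources(resources, min_availability=0.8, min_reliability=0.7):
    """Drop resources below the assumed thresholds, then rank the rest.

    Ranking prefers local resources, then higher availability,
    then higher reliability.
    """
    eligible = [
        r for r in resources
        if r.availability >= min_availability
        and r.reliability >= min_reliability
    ]
    return sorted(
        eligible,
        key=lambda r: (not r.is_local, -r.availability, -r.reliability),
    )


if __name__ == "__main__":
    pool = [
        Resource("local-a", True, 0.90, reliability_from_history(9, 1)),
        Resource("remote-b", False, 0.99, reliability_from_history(99, 1)),
        Resource("local-c", True, 0.50, 0.95),   # too rarely available
        Resource("remote-d", False, 0.85, reliability_from_history(3, 7)),
    ]
    for r in filter_resources(pool):
        print(r.name, r.availability, r.reliability)
```

In this sketch a local resource outranks a remote one even when the remote resource has better availability and reliability figures, which mirrors the stated preference for locally available resources. A real scheduler would likely weigh these factors rather than sort on them strictly.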

  • Publication date: 2017