Abstract

As cloud computing systems grow in functionality and complexity, resource failures, including unpredictable crashes and degraded resource availability, become inevitable. A failure-aware resource provisioning algorithm that can offer a suitable strategy immediately after a failure occurs is therefore essential. In this paper, a rescheduling algorithm inspired by immunological mechanisms is proposed for workflows in cloud systems (IRW). The IRW algorithm imitates the immune system with four units. The surveillance unit monitors each Virtual Machine (VM) in the resource pool for possible faults. Once a resource fault is detected, the response unit is triggered to search for an appropriate strategy, either in the memory unit or in the learning unit, for rescheduling the available resources. In the learning unit, the available resources are grouped into multiple clusters to narrow the search scope. If none of the available VMs can meet the Quality of Service (QoS) requirements, a new VM is created to replace the faulty resource. To verify the effectiveness of the proposed IRW, a series of simulation experiments is conducted on both real-world workflows with different structures and randomly generated workflows. The results demonstrate that IRW effectively provides rescheduling strategies for resource failures, and that it outperforms the compared algorithms under different situations.
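The interaction of the four units can be illustrated with a minimal Python sketch. It is not the IRW implementation: the names `VM`, `Rescheduler`, `speed`, `qos_min_speed`, and `tier_width` are hypothetical, QoS is reduced to a single minimum-speed threshold, and the clustering step is replaced by simple bucketing of VMs into speed tiers. The order in which clusters are searched (nearest tier first) is likewise an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VM:
    name: str
    speed: float          # hypothetical capacity measure
    healthy: bool = True


@dataclass
class Rescheduler:
    pool: List[VM]
    qos_min_speed: float                 # assumed stand-in for the QoS constraint
    tier_width: float = 2.0              # assumed bucket width for clustering
    memory: Dict[str, str] = field(default_factory=dict)  # faulty VM -> replacement

    def surveil(self) -> List[VM]:
        """Surveillance unit: report VMs currently marked as faulty."""
        return [vm for vm in self.pool if not vm.healthy]

    def _tier(self, vm: VM) -> int:
        return int(vm.speed // self.tier_width)

    def learn(self, faulty: VM) -> Optional[VM]:
        """Learning unit: cluster healthy VMs, then search nearest clusters first."""
        clusters: Dict[int, List[VM]] = defaultdict(list)
        for vm in self.pool:
            if vm.healthy:
                clusters[self._tier(vm)].append(vm)
        target = self._tier(faulty)
        for tier in sorted(clusters, key=lambda t: abs(t - target)):
            ok = [vm for vm in clusters[tier] if vm.speed >= self.qos_min_speed]
            if ok:
                return min(ok, key=lambda vm: abs(vm.speed - faulty.speed))
        return None

    def respond(self, faulty: VM) -> VM:
        """Response unit: try the memory unit, then the learning unit, then a new VM."""
        by_name = {vm.name: vm for vm in self.pool}
        remembered = by_name.get(self.memory.get(faulty.name, ""))
        if remembered is not None and remembered.healthy:
            return remembered
        chosen = self.learn(faulty)
        if chosen is None:
            # No available VM meets the QoS threshold: create a replacement.
            chosen = VM(f"{faulty.name}-new", max(faulty.speed, self.qos_min_speed))
            self.pool.append(chosen)
        self.memory[faulty.name] = chosen.name
        return chosen
```

In this sketch, a caller would loop over `surveil()` and pass each faulty VM to `respond()`. The memory unit is modeled as a plain dictionary, so a previously chosen replacement is reused only while it remains healthy.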