Abstract

Reliability is widely recognized as an increasingly relevant issue in heterogeneous service-oriented systems because processor failures affect the quality of service delivered to users. Replication-based fault tolerance is a common approach to satisfying an application's reliability requirement. This study addresses the problem of minimizing redundancy while satisfying the reliability requirement of a directed acyclic graph (DAG)-based parallel application on heterogeneous service-oriented systems. We first propose the enough replication for redundancy minimization (ERRM) algorithm to satisfy the application's reliability requirement, and then propose the heuristic replication for redundancy minimization (HRRM) algorithm to satisfy the same requirement with low time complexity. Experimental results on real and randomly generated parallel applications with different scales, degrees of parallelism, and degrees of heterogeneity verify that ERRM generates the least redundancy, followed by HRRM and then by the state-of-the-art MaxRe and RR algorithms. In addition, HRRM achieves approximately minimum redundancy with a short computation time.