Abstract

Energy is one of the primary design constraints in heterogeneous distributed systems ranging from small embedded devices to large-scale data centers. In such systems, a parallel application with precedence-constrained tasks is represented by a directed acyclic graph (DAG). Dynamic voltage and frequency scaling (DVFS) has become an important energy control technology: it simultaneously scales down a processor's supply voltage and frequency while tasks are running. However, recent studies show that dynamically scaling down the chip's voltage may lead to a sharp rise in transient processor failures, thereby reducing the reliability of the system. This study addresses the problem of maximizing the reliability of an energy-constrained parallel application on heterogeneous distributed systems based on DVFS. The problem is decomposed into two sub-problems: satisfying the energy constraint and maximizing reliability. The first sub-problem is solved by transferring the energy constraint of the application to each task, and the second sub-problem is solved by heuristically scheduling each task with the maximum reliability value while satisfying its energy constraint. Experiments with real parallel applications show that the proposed MREC algorithm obtains larger reliability values than the state-of-the-art reliability maximum energy conservation (RMEC) algorithm while satisfying the energy constraints.
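
The two-step decomposition can be illustrated with a minimal Python sketch. It is not the MREC algorithm itself, only one plausible reading of the idea under stated assumptions: tasks arrive already ordered by priority, each task has a hypothetical precomputed list of (energy, reliability) candidates (one per processor and frequency pair), a task's energy budget is what remains after reserving the minimum energy for every task not yet scheduled, and application reliability is the product of task reliabilities. The function name `schedule` and the data layout are illustrative, not taken from the paper.

```python
from math import prod


def schedule(options, energy_budget):
    """Greedy sketch: options[i] lists (energy, reliability) candidates
    for task i, with tasks already sorted by scheduling priority."""
    # Cheapest possible energy for each task (assumed reservation rule).
    min_energy = [min(e for e, _ in opts) for opts in options]
    if sum(min_energy) > energy_budget:
        raise ValueError("energy budget is infeasible")

    spent = 0.0
    chosen = []
    for i, opts in enumerate(options):
        # Sub-problem 1: turn the application budget into a task budget by
        # reserving the minimum energy for all tasks not yet scheduled.
        reserve = sum(min_energy[i + 1:])
        task_budget = energy_budget - spent - reserve
        # Sub-problem 2: among candidates within the task budget,
        # pick the one with the highest reliability.
        feasible = [o for o in opts if o[0] <= task_budget]
        best = max(feasible, key=lambda o: o[1])
        chosen.append(best)
        spent += best[0]

    # Application reliability modeled as the product of task reliabilities.
    return chosen, spent, prod(r for _, r in chosen)
```

With two tasks and a budget of 7, the first task can afford its most reliable candidate, which leaves only the cheapest candidate for the second. This shows how a greedy per-task choice can favor earlier tasks, so the order in which tasks are considered matters in a sketch like this.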