Abstract

Scientific workflows have become the primary mechanism for conducting analyses on distributed computing infrastructures such as grids and clouds. In recent years, optimization of scientific workflows has focused primarily on computational tasks and workflow makespan. However, as workflow-based analysis becomes ever more data intensive, data optimization is becoming a prime concern. Moreover, scientific workflows can scale along several dimensions: (i) the number of computational tasks, (ii) the heterogeneity of computational resources, and (iii) the size and type (static versus streamed) of the data involved. Adapting workflow structure in response to these scalability challenges remains an important research objective. This work explores how a workflow graph can be restructured automatically (through task merging, for instance) to address the constraints of a particular execution environment, using a multi-objective evolutionary approach. Our approach attempts to adapt the workflow structure to achieve both compute and data optimization. The question of when to terminate the evolutionary search in order to conserve computation is tackled with a novel termination criterion. The results presented in this article demonstrate the feasibility of the termination criterion and show that significant optimization can be achieved with a multi-objective approach.

  • Publication date: 2013-4