A New Algorithm for Intermediate Dataset Storage in a Cloud-Based Dataflow

Authors: Cheng Jie; Zhu Daming*; Zhu Binhai
Source: 9th International Frontiers of Algorithmics Workshop (FAW), 2015-07-03 to 2015-07-05.
DOI:10.1007/978-3-319-19647-3_4

Abstract

Running a dataflow in a cloud environment usually generates many useful intermediate datasets. A strategy for running a dataflow decides which of these datasets should be stored, while the rest are regenerated when they are needed. The intermediate dataset storage (IDS) problem asks for a strategy that minimizes the total cost. For linear-structure IDS, where "linear-structure" means that the datasets in the dataflow form a pipeline, the current best algorithm takes O(n^4) time, where n is the number of datasets in the pipeline. In this paper, we present a new algorithm for this problem that improves the time complexity to O(n^3).
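To make the problem statement concrete, here is a minimal Python sketch of linear-structure IDS under a deliberately simplified, assumed cost model. Each dataset k has a hypothetical generation cost gen[k] (computing it from its predecessor), storage cost store[k], and usage count freq[k]. A deleted dataset is recomputed from the nearest stored predecessor (or from the original input) every time it is used, and the total cost is storage cost plus regeneration cost. These names and this model are illustrative assumptions, not the paper's definitions, and the dynamic program below, which branches on where the most recently stored dataset sits, is not the paper's O(n^3) algorithm. Because the model is simplified, its running time should not be compared with the bounds stated above.

```python
from itertools import product

# Assumed, simplified cost model (hypothetical; not the paper's definitions):
#   gen[k]   cost of computing dataset k from its predecessor
#            (dataset 0 is computed from the original input, which is always available)
#   store[k] cost of keeping dataset k stored
#   freq[k]  number of times dataset k is used; a deleted dataset is rebuilt
#            from the nearest stored predecessor on every use


def strategy_cost(stored, gen, store, freq):
    """Total cost of one strategy; stored[k] is True if dataset k is kept."""
    total, regen = 0.0, 0.0
    for k in range(len(gen)):
        regen += gen[k]  # cost of rebuilding dataset k from the last stored point
        if stored[k]:
            total += store[k]
            regen = 0.0  # later datasets can be rebuilt starting from dataset k
        else:
            total += freq[k] * regen
    return total


def min_cost_brute_force(gen, store, freq):
    """Reference answer: try all 2^n strategies (only practical for tiny n)."""
    return min(strategy_cost(s, gen, store, freq)
               for s in product([False, True], repeat=len(gen)))


def min_cost_linear(gen, store, freq):
    """Dynamic program over the position of the most recently stored dataset.

    best[i] is the minimum cost of datasets 0..i-1 given that dataset i-1 is
    stored (best[0] = 0: only the original input is available).
    Returns (minimum cost, sorted indices of stored datasets).
    """
    n = len(gen)
    best = [float("inf")] * (n + 1)
    best[0] = 0.0
    prev = [-1] * (n + 1)  # back-pointers for recovering the stored set
    answer, tail_start = float("inf"), 0
    for i in range(n + 1):  # best[i] is final here: it is only updated from j < i
        seg, regen = 0.0, 0.0  # cost of deleting datasets i..k-1; rebuild cost of k
        for k in range(i, n):
            # Option 1: store dataset k; the next run of deleted datasets starts at k + 1.
            cand = best[i] + seg + store[k]
            if cand < best[k + 1]:
                best[k + 1], prev[k + 1] = cand, i
            # Option 2: delete dataset k and extend the current run.
            regen += gen[k]
            seg += freq[k] * regen
        # Option 3: every dataset from i to the end of the pipeline is deleted.
        if best[i] + seg < answer:
            answer, tail_start = best[i] + seg, i
    stored, j = [], tail_start
    while j > 0:  # walk back through the chain of stored datasets
        stored.append(j - 1)
        j = prev[j]
    return answer, sorted(stored)
```

For example, with gen = [4, 1, 3], store = [2, 5, 1] and freq = [1, 3, 2], min_cost_linear returns (6.0, [0, 2]): storing the first and last datasets costs 2 + 1, and regenerating the middle one costs gen[1] * freq[1] = 3, for a total of 6, which min_cost_brute_force confirms is optimal among all eight strategies. The brute-force function is kept only as a correctness check for small inputs.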
