摘要
Running a dataflow in a cloud environment usually generates many useful intermediate datasets. A strategy for running a dataflow is to decide which datasets should be stored, while the rest of them are regenerated. The intermediate dataset storage (IDS) problem asks to find a strategy for running a dataflow, such that the total cost is minimized. The current best algorithm for linear-structure IDS takes O(n(4)) time, where "linear-structure" means that the structure of the datasets in the dataflow is a pipeline. In this paper, we present a new algorithm for this problem, and improve the time complexity to O(n(3)), where n is the number of datasets in the pipeline.
- 出版日期2015
- 单位山东大学