Abstract

With the development of cloud computing, an increasing number of enterprises and organizations have migrated their applications to the cloud. Because of privacy concerns, however, a company may keep sensitive data on a local server or in a private cloud. As a result, data analytics tasks on that data must be performed in the local environment, even when the analysis results will be shared externally. In this paper, we design a decentralized workflow system to speed up the execution of a Privacy-Awareness Data Analytics application (PADA). Specifically, we propose a workflow system with an actor layer and an engine layer for the distributed execution of a PADA. The actors perform the actual data analysis tasks on the data side, while the engines are deployed in different regions of the public cloud and communicate with the actors to execute the whole workflow. Since engines in different regions experience different communication latencies when invoking an actor, an optimal engine deployment scheme must be found to minimize the total execution time of a PADA. We design a path search-based heuristic algorithm that selects suitable cloud regions in which to place the engines. The experimental results indicate that the proposed algorithm is effective in reducing application makespan.