Accelerating R-based analytics on the cloud

Authors: Patel Ishan; Rau-Chaplin Andrew; Varghese Blesson*
Source: Concurrency and Computation-Practice & Experience, 2016, 28(4): 977-994.
DOI: 10.1002/cpe.3026

Abstract

This paper addresses how the benefits of cloud-based infrastructure can be harnessed for analytical workloads. Often, the software handling analytical workloads is developed not by professional programmers but on an ad hoc basis by analysts in high-level programming environments such as R or MATLAB. The goal of this research is to allow analysts to take an analytical job that executes on their personal workstations and, with minimum effort, execute it on cloud infrastructure while managing both the resources and the data required by the job. If this can be facilitated gracefully, the analyst benefits from the on-demand resources, low maintenance cost and scalability of computing resources that the cloud offers. To this end, a Platform for Parallel R-based Analytics on the Cloud (P2RAC), placed between an analyst and a cloud infrastructure, is proposed and implemented. P2RAC offers a set of command-line tools for managing the resources (such as instances and clusters), the data, and the execution of the software on the Amazon Elastic Compute Cloud (EC2) infrastructure. Experimental studies using two parallel problems confirm the feasibility of employing P2RAC for solving large-scale analytical problems on the cloud.

  • Publication date: 2016-3-25