A data replication algorithm for groups of files in data grids

Authors: Azari Leila; Rahmani Amir Masoud; Daniel Helder A; Qader Nooruldeen Nasih
Source: Journal of Parallel and Distributed Computing, 2018, 113: 115-126.
DOI: 10.1016/j.jpdc.2017.10.008

Abstract

Data grids are emerging as a main part of the infrastructure for large-scale, data-intensive applications such as high-energy physics and bioinformatics. The deployment of such infrastructures has given users of a grid site access to large amounts of distributed data. Data replication is a key issue in a data grid; applied intelligently, it reduces data access time and bandwidth consumption for each grid site. In this paper, we introduce a new dynamic data replication algorithm named Popular Groups of Files Replication (PGFR). The algorithm rests on one assumption: users in a Virtual Organization have similar interests in groups of files. Based on this assumption and on file access history, PGFR builds a connectivity graph to recognize groups of dependent files in each grid site and replicates the most popular groups of files to each grid site, thus increasing local availability. We used the OptorSim simulator to evaluate the efficiency of the PGFR algorithm. The simulation results show that PGFR achieves better performance than the existing algorithm: it minimizes the mean job execution time and bandwidth consumption and avoids unnecessary replication.
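
The idea outlined in the abstract (access history, then a connectivity graph, then groups of dependent files, then replication of the most popular groups) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how edges are weighted, how groups are extracted, or how popularity is measured. The sketch assumes that an edge counts how often two files are requested by the same job, that a group is a connected component of edges at or above a hypothetical threshold `min_weight`, and that a group's popularity is the total number of accesses to its files. All function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_connectivity_graph(history):
    """Count how often each pair of files is requested by the same job.

    Assumption: co-access within one job signals a dependency between files.
    """
    edge_weight = Counter()
    for files in history:
        for a, b in combinations(sorted(set(files)), 2):
            edge_weight[(a, b)] += 1
    return edge_weight


def find_groups(edge_weight, min_weight):
    """Return connected components over edges with weight >= min_weight."""
    adjacency = defaultdict(set)
    for (a, b), weight in edge_weight.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(frozenset(component))
    return groups


def most_popular_groups(history, groups, top_k):
    """Rank groups by total accesses to their files and keep the top_k."""
    access_count = Counter(f for files in history for f in files)
    ranked = sorted(
        groups,
        key=lambda g: (-sum(access_count[f] for f in g), sorted(g)),
    )
    return ranked[:top_k]


# Hypothetical access history for one grid site: one list of file IDs per job.
history = [
    ["f1", "f2", "f3"],
    ["f1", "f2"],
    ["f4", "f5"],
    ["f1", "f2", "f3"],
    ["f4", "f5"],
    ["f6"],
]

graph = build_connectivity_graph(history)
groups = find_groups(graph, min_weight=2)
to_replicate = most_popular_groups(history, groups, top_k=1)
print(to_replicate)
```

With this toy history, `f1`, `f2`, and `f3` form the most frequently accessed group, so a site following this sketch would fetch those three files together instead of one at a time.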

  • Publication date: 2018-3