Accessing data from many servers simultaneously and adaptively in data grids

Authors: Chang Ruay Shiung*; Lin Chun Fu; Hsi Shih Chun
Source: Future Generation Computer Systems, 2010, 26(1): 63-71.
DOI:10.1016/j.future.2009.07.005

Abstract

In data grid environments, data replication services increase the reliability and availability of data sets. Using replicas, a data set can be downloaded from many servers simultaneously to improve the performance of the data transfer. However, the selected servers may become slow or congested during the transfer, so they should be replaced dynamically.
In this paper, we propose a method that improves download efficiency by reevaluating, while the download is in progress, the conditions of all servers that can supply the data set. Servers currently used for downloading that perform unsatisfactorily are replaced by others that meet the performance criteria. In all previous parallel downloading schemes, the load of a poorly performing server is simply transferred to a more powerful server that is already taking part in the download. Intuitively, however, if idle servers that also hold the data are available, why not use them? Our method therefore monitors every server that can supply the data, even one that is not initially involved in the download. Load may be transferred from a working server to a non-working server, not only to another working server. Furthermore, to decide whether a server is suitable, we consider not only the connection bandwidth but also the status within the server, such as CPU and memory usage.
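The reselection idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula or the performance criteria, so the weighted score, the weights, the 100 Mbps normalization, the 0.5 threshold, and the names `ServerStatus`, `score`, and `reselect` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerStatus:
    """Snapshot of one replica server (hypothetical fields)."""
    name: str
    bandwidth_mbps: float  # measured connection bandwidth
    cpu_usage: float       # fraction in [0, 1]
    mem_usage: float       # fraction in [0, 1]


def score(s, w_bw=0.6, w_cpu=0.2, w_mem=0.2, max_bw=100.0):
    """Assumed suitability score in [0, 1]; higher is better.

    Combines normalized bandwidth with free CPU and free memory.
    The weights and max_bw are illustrative, not from the paper.
    """
    bw = min(s.bandwidth_mbps / max_bw, 1.0)
    return w_bw * bw + w_cpu * (1.0 - s.cpu_usage) + w_mem * (1.0 - s.mem_usage)


def reselect(active, statuses, threshold=0.5):
    """Return the new list of working servers.

    Every server in `statuses` is evaluated, including idle ones.
    A working server scoring below `threshold` is replaced by the
    best idle server, provided that server meets the threshold.
    Assumes every name in `active` also appears in `statuses`.
    """
    scores = {s.name: score(s) for s in statuses}
    idle = sorted((n for n in scores if n not in active),
                  key=scores.get, reverse=True)
    result = list(active)
    for i, name in enumerate(result):
        if scores[name] < threshold and idle and scores[idle[0]] >= threshold:
            result[i] = idle.pop(0)
    return result


# Example: B is congested, C is idle but healthy.
statuses = [
    ServerStatus("A", bandwidth_mbps=80, cpu_usage=0.2, mem_usage=0.3),
    ServerStatus("B", bandwidth_mbps=10, cpu_usage=0.9, mem_usage=0.8),
    ServerStatus("C", bandwidth_mbps=60, cpu_usage=0.1, mem_usage=0.2),
]
new_active = reselect(["A", "B"], statuses)  # B's load moves to idle server C
```

In a real download, a function like `reselect` would be called periodically with fresh measurements, and the unfinished portion assigned to a replaced server would be handed to its replacement.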
We implement our method in a real grid environment. Compared with the recursive co-allocation scheme, our method decreases the completion time by 1.63%-13.45% in the real grid environment and by 6.28%-30.56% in the grid environment with additional injected load. These results show that the proposed scheme adapts well to the dynamic environment and effectively decreases the total download time.