Abstract

Data-intensive scientific applications running on high-end computing systems depend on parallel file systems for high-speed data input/output. In most parallel file systems, a file is partitioned into multiple subfiles so that it can be accessed concurrently. An important factor in this partitioning is the stripe size. However, although they work well for certain applications, most existing schemes for determining a file's stripe size still lack the ability to handle highly concurrent data accesses, which are typical of most parallel scientific applications. To address this problem, this paper first presents an analytic model for assessing the performance of highly concurrent data accesses and then describes how to apply the model to select the stripe size of a file. Experimental results demonstrate that the accuracy of the analytic model is around 87.89% and that the stripe size selected with it can improve the aggregate I/O bandwidth of FLASH I/O by up to 5.8 times compared with well-known methods. The paper also discusses how to incorporate the method into real-world parallel file systems.
