Abstract

Obtaining good performance measurements is a critical first step in effectively modeling and predicting the performance of distributed file systems. In this paper, we study methods for selecting the performance factors of distributed file systems. We propose a relevance- and redundancy-aware feature selection approach that selects a subset of the most informative performance factors. We measure the accuracy of performance prediction based on the proposed approach and analyze the prediction accuracy for three performance metrics: bandwidth, throughput, and latency. The results indicate that our approach effectively removes irrelevant performance factors, eliminates redundant features, and improves prediction accuracy for distributed file systems.
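To make the idea of relevance- and redundancy-aware selection concrete, the following is a minimal sketch and not the method of this paper. It assumes absolute Pearson correlation as both the relevance measure (factor versus metric) and the redundancy measure (factor versus already selected factors), and uses a greedy relevance-minus-redundancy score in the style of mRMR. The factor names (`block_size_kb`, `block_size_bytes`, `num_clients`, `stripe_count`), the sample values, and the `min_relevance` threshold are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_factors(factors, target, k, min_relevance=0.1):
    """Greedily pick up to k factors (assumed mRMR-style scoring).

    factors: dict mapping factor name -> list of measured values
    target:  list of measured values of the performance metric
    """
    # Relevance: how strongly each factor correlates with the metric.
    relevance = {n: abs(pearson(v, target)) for n, v in factors.items()}
    # Drop irrelevant factors before the greedy search.
    candidates = [n for n in factors if relevance[n] >= min_relevance]
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        # Redundancy: mean correlation with factors already chosen.
        redundancy = mean(
            [abs(pearson(factors[name], factors[s])) for s in selected]
        )
        return relevance[name] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical measurements, for illustration only.
factors = {
    "block_size_kb": [4, 8, 16, 32, 64, 128],
    "block_size_bytes": [4096, 8192, 16384, 32768, 65536, 131072],  # redundant
    "num_clients": [1, 4, 2, 8, 3, 6],
    "stripe_count": [4, 4, 4, 4, 4, 4],  # constant, hence irrelevant
}
bandwidth = [14, 48, 36, 112, 94, 188]  # hypothetical metric values

selected = select_factors(factors, bandwidth, k=2)
print(selected)
```

On this toy data the constant factor is filtered out as irrelevant, and only one of the two block-size factors is kept because the second is fully redundant with the first. Mutual information or another dependence measure could be substituted for Pearson correlation without changing the structure of the loop.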

Full text