Abstract

In a distributed storage system, the performance of the storage nodes is essential. Data de-duplication, a data reduction technology, improves storage efficiency by removing redundant data from storage nodes, but it also introduces problems that degrade node performance, such as the management of massive amounts of metadata and frequent disk I/O operations. This paper focuses on the organization and management of metadata in storage nodes that use data de-duplication. It presents a hash-based metadata organization method with an elimination mechanism and a container allocation algorithm based on SISL (Stream-Informed Segment Layout). Experiments and performance analysis show that these methods significantly improve the memory utilization of storage nodes, reduce disk I/O, and improve the overall efficiency and performance of the storage system.

Full text