A Differentiated Caching Mechanism to Enable Primary Storage Deduplication in Clouds

Authors: Wu, Huijun; Wang, Chen; Fu, Yinjin; Sakr, Sherif; Lu, Kai*; Zhu, Liming
Source: IEEE Transactions on Parallel and Distributed Systems, 2018, 29(6): 1202-1216.
DOI: 10.1109/TPDS.2018.2790946

Abstract

Existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use post-processing deduplication to avoid the negative impact on I/O performance. However, neither approach works well on cloud servers running multiple services, for two reasons. First, the temporal locality of duplicate data writes varies among primary storage workloads, which makes it challenging to allocate the inline cache space efficiently and achieve a good deduplication ratio. Second, post-processing deduplication does not eliminate duplicate I/O operations that write to the same logical block address, because it runs only after the duplicate blocks have been written. A hybrid deduplication mechanism is a promising way to address these problems, and inline fingerprint caching is essential to making hybrid deduplication efficient. In this paper, we present a detailed analysis of the limitations of using existing caching algorithms for primary deduplication in the cloud. We show that existing caching algorithms either perform poorly or incur significant memory overhead in fingerprint cache management. To address this, we propose a novel fingerprint caching mechanism that estimates the temporal locality of duplicates in different data streams and prioritizes cache allocation based on that estimate. We build a hybrid deduplication system that integrates this caching mechanism. Our experimental results show that the proposed mechanism significantly improves the deduplication ratio and reduces overhead.
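
The idea of prioritizing fingerprint cache space by per-stream temporal locality can be illustrated with a minimal sketch. This is not the paper's algorithm: the locality estimate used here (the fraction of recent writes in a stream whose fingerprint already appeared in a sliding window) is a simple stand-in chosen for illustration, and all names (`StreamState`, `DifferentiatedFingerprintCache`, `rebalance`, the window and capacity values) are hypothetical.

```python
import hashlib
from collections import Counter, OrderedDict, deque


class StreamState:
    """Per-stream LRU fingerprint cache plus a sliding-window locality estimate."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()       # (fingerprint, was_duplicate_in_window)
        self.counts = Counter()     # multiplicity of each fingerprint in the window
        self.cache = OrderedDict()  # LRU order: oldest entry first
        self.capacity = 0

    def observe(self, fp):
        # Assumption: "temporal locality" is approximated by how often a
        # fingerprint repeats within the last `window_size` writes.
        is_dup = 1 if self.counts[fp] > 0 else 0
        self.window.append((fp, is_dup))
        self.counts[fp] += 1
        if len(self.window) > self.window_size:
            old_fp, _ = self.window.popleft()
            self.counts[old_fp] -= 1
            if self.counts[old_fp] == 0:
                del self.counts[old_fp]

    def locality(self):
        if not self.window:
            return 0.0
        return sum(flag for _, flag in self.window) / len(self.window)


class DifferentiatedFingerprintCache:
    """Splits a fixed fingerprint budget across streams by estimated locality."""

    def __init__(self, total_entries, window_size=256):
        self.total_entries = total_entries
        self.window_size = window_size
        self.streams = {}

    def _stream(self, stream_id):
        if stream_id not in self.streams:
            self.streams[stream_id] = StreamState(self.window_size)
            self.rebalance()
        return self.streams[stream_id]

    def rebalance(self):
        scores = {sid: s.locality() for sid, s in self.streams.items()}
        total_score = sum(scores.values())
        for sid, s in self.streams.items():
            # With no locality signal yet, fall back to an even split.
            share = scores[sid] / total_score if total_score > 0 else 1 / len(self.streams)
            s.capacity = int(self.total_entries * share)
            while len(s.cache) > s.capacity:
                s.cache.popitem(last=False)  # evict least recently used

    def write(self, stream_id, block):
        """Return True if the write was deduplicated inline (fingerprint cache hit)."""
        fp = hashlib.sha1(block).digest()
        s = self._stream(stream_id)
        s.observe(fp)
        if fp in s.cache:
            s.cache.move_to_end(fp)
            return True
        # Miss: the block is written as-is; in a hybrid design, a later
        # post-processing pass would be responsible for catching the duplicate.
        if s.capacity > 0:
            s.cache[fp] = True
            if len(s.cache) > s.capacity:
                s.cache.popitem(last=False)
        return False


# Two streams: "hot" rewrites the same four blocks, "cold" writes unique blocks.
cache = DifferentiatedFingerprintCache(total_entries=100, window_size=64)
for i in range(200):
    cache.write("hot", bytes([i % 4]))
    cache.write("cold", i.to_bytes(4, "big"))
cache.rebalance()
```

After rebalancing, the stream that keeps rewriting the same blocks receives most of the cache budget, while the stream of unique blocks receives little or none and is left to post-processing. A real system would need a cheaper and more robust estimator than this window scan, and would rebalance periodically rather than on demand.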