A data-check based distributed storage model for storing hot temporary data

Authors: Li, Jianjiang; Zhang, Peng; Li, Yuance; Chen, Wei; Liu, Yajun; Wang, Lizhe*
Source: Future Generation Computer Systems-The International Journal of eScience, 2017, 73: 13-21.
DOI: 10.1016/j.future.2017.03.019

Abstract

To ensure data security, traditional systems have widely used redundancy backup, storing multiple copies of the data. Multi-copy technology is highly reliable, but it has the disadvantages of high storage redundancy and low space utilization. By contrast, EC (Erasure Coding) technology uses storage space efficiently, but it incurs high overhead for encoding, decoding, and data reconstruction. This paper therefore presents a data backup method based on an XOR checksum that is suited to storing hot temporary data. The method first splits the data into two parts and then XORs the two parts to generate a third part, the checksum. Finally, the three parts are stored on different nodes. The checksum not only ensures the security of the data but also saves storage space, thus improving read and write performance. This strategy achieves mutual backup among the three nodes in order to ensure data security. Because the system holds only one copy of the original data, the model resolves the data inconsistency problem and simplifies the data version control required in the redundancy backup model. Test results on real data show that, compared with the current mainstream Cassandra redundant backup model, the XOR-checksum-based backup model proposed and implemented in this paper performs significantly better: read performance improves by 10% on average, and write performance improves by 30% on average.
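
The split-and-XOR step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`split_with_parity`, `recover`), the zero-padding of the second half, and the need to keep the original length alongside the parts are all assumptions made for the example. It only shows why any two of the three parts are enough to rebuild the data.

```python
from typing import Optional, Tuple


def xor_bytes(p: bytes, q: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(p, q))


def split_with_parity(data: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split data into two parts and derive a third part as their XOR.

    Hypothetical helper: the second part is zero-padded so that both
    parts have the same length (the padding scheme is an assumption).
    """
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\x00")
    return a, b, xor_bytes(a, b)


def recover(a: Optional[bytes], b: Optional[bytes],
            parity: Optional[bytes], length: int) -> bytes:
    """Rebuild the original data when at most one part (None) is missing."""
    if [a, b, parity].count(None) > 1:
        raise ValueError("at least two of the three parts are required")
    if a is None:
        a = xor_bytes(b, parity)   # a = b XOR parity
    elif b is None:
        b = xor_bytes(a, parity)   # b = a XOR parity
    # Drop the padding by truncating to the original length.
    return (a + b)[:length]


data = b"hot temporary data"
a, b, parity = split_with_parity(data)
# Simulate the loss of the node holding the first part.
restored = recover(None, b, parity, len(data))
```

Under these assumptions the three parts together occupy about 1.5 times the original size, compared with 3 times for three full replicas, which is the kind of space saving the abstract refers to.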