Using Low Cost Erasure and Error Correction Schemes to Improve Reliability of Commodity DRAM Systems

Authors: Chen Hsing Min*; Jeloka Supreet; Arunkumar Akhil; Blaauw David; Wu Carole Jean; Mudge Trevor; Chakrabarti Chaitali
Source: IEEE Transactions on Computers, 2016, 65(12): 3766-3779.
DOI:10.1109/TC.2016.2550455

Abstract

Most server-grade systems provide Chipkill-Correct error protection at the expense of power and performance. In this paper we present a low-overhead solution for improving the reliability of commodity DRAM systems with no change to the existing memory architecture. Specifically, we propose five erasure and error correction (E-ECC) schemes that provide at least Chipkill-Correct protection for x4 (Schemes 1, 2 and 3), x8 (Scheme 4) and x16 (Scheme 5) DRAM systems. All schemes have superior error correction performance due to the use of strong symbol-based codes. Synthesis results at the 28 nm node show that the decoding latency of these codes is negligible compared to the DRAM access latency. In addition, we make use of erasure codes to extend the lifetime of the DRAM systems. Specifically, once a chip is marked faulty due to persistent errors, all E-ECC schemes correct erasures due to that faulty chip and also correct an additional random error in a second chip. Evaluation with SPEC2006 workloads shows that, compared to x4 Chipkill-Correct schemes, Scheme 5 has the highest IPC improvement (mean of 7 percent) and Scheme 4 has the largest power reduction (mean of 18 percent) and the largest increase in energy efficiency (mean of 25 percent).
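
The erasure-plus-error claim in the abstract can be illustrated with the classical decoding bound for a code of minimum distance d: e symbol errors and s symbol erasures are correctable together when 2e + s <= d - 1. The sketch below is only an illustration of that bound. The function names and the (n, k) parameters are hypothetical, it assumes a maximum-distance-separable symbol code (such as Reed-Solomon) in which each chip contributes one symbol per codeword, and it does not reproduce the codes actually used in Schemes 1 to 5.

```python
def mds_min_distance(n: int, k: int) -> int:
    """Minimum distance of an MDS code with n total and k data symbols."""
    return n - k + 1


def can_correct(d_min: int, errors: int, erasures: int) -> bool:
    """True if `errors` unknown-location symbol errors and `erasures`
    known-location symbol erasures satisfy 2e + s <= d_min - 1."""
    return 2 * errors + erasures <= d_min - 1


# Hypothetical parameters (not taken from the paper): 16 data symbols
# protected by 2 or 3 check symbols, one symbol per chip.
d2 = mds_min_distance(18, 16)  # 2 check symbols
d3 = mds_min_distance(19, 16)  # 3 check symbols

# One failed chip at an unknown location counts as a single symbol error.
print(can_correct(d2, errors=1, erasures=0))
# A chip already marked faulty (erasure) plus a random error in a second chip.
print(can_correct(d2, errors=1, erasures=1))
print(can_correct(d3, errors=1, erasures=1))
```

Under these assumptions, marking a persistently faulty chip as an erasure halves its cost in the bound, so one extra check symbol is enough to also cover a random error in a second chip.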

  • Publication date: 2016-12-1
