Accurate Model for Application Failure Due to Transient Faults in Caches

Authors: Manoochehri Mehrtash*; Dubois Michel
Source: IEEE Transactions on Computers, 2016, 65(8): 2397-2410.
DOI:10.1109/TC.2015.2488642

Abstract

To select an appropriate level of error protection in caches, the impact of various protection schemes on the cache Failure In Time (FIT) rate must be evaluated for a target benchmark suite. However, while many simulation tools exist to evaluate area, power and performance for a set of benchmark programs, there is a dearth of such tools for reliability. This paper introduces PARMA+, a new cache reliability model with features that distinguish it from previous models. PARMA+ estimates a cache's FIT rate in the presence of spatial multi-bit faults, single-bit faults and temporal multi-bit faults, and under different error protection schemes, including parity, ECC, early write-back and bit-interleaving. We first develop the model formally and then demonstrate its accuracy. We have run reliability simulations for many distributions of large and small fault patterns and compared them with accelerated fault injection simulations. PARMA+ has high accuracy and low computational complexity.
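As a rough illustration of what a per-word FIT estimate involves, and why ECC interacts with temporal multi-bit faults, here is a toy sketch. It is explicitly not PARMA+: it assumes independent single-bit faults arriving as a Poisson process at a hypothetical per-bit FIT rate, and back-to-back exposure intervals after each of which the word is assumed to be cleaned. A word fails if more than `tolerated` faults land in one interval (0 for an unprotected word, 1 for SECDED ECC). The names `prob_more_than` and `word_fit` are illustrative only.

```python
import math

HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def prob_more_than(mu: float, tolerated: int) -> float:
    """P(N > tolerated) for N ~ Poisson(mu)."""
    if mu <= 0.0:
        return 0.0
    if mu < 1.0:
        # Sum the upper tail directly; 1 - CDF would cancel badly for tiny mu.
        term = math.exp(-mu)
        for k in range(1, tolerated + 2):
            term *= mu / k  # term = e^-mu * mu^k / k!
        total, k = 0.0, tolerated + 1
        while term > total * 1e-17:  # stop once further terms are negligible
            total += term
            k += 1
            term *= mu / k
        return total
    # For mu >= 1 the complement of the CDF is numerically safe.
    term, cdf = math.exp(-mu), 0.0
    for k in range(tolerated + 1):
        cdf += term
        term *= mu / (k + 1)
    return max(0.0, 1.0 - cdf)


def word_fit(bits: int, fit_per_bit: float, exposure_hours: float,
             tolerated: int) -> float:
    """Toy FIT rate of one cache word (hypothetical model, not PARMA+).

    Assumes independent single-bit faults (Poisson, fit_per_bit per bit) and
    back-to-back exposure intervals of exposure_hours, after each of which the
    word is cleaned. The word fails if more than `tolerated` faults hit it in
    one interval: 0 for an unprotected word, 1 for SECDED ECC.
    """
    faults_per_bit_hour = fit_per_bit / HOURS_PER_FIT
    mu = bits * faults_per_bit_hour * exposure_hours  # expected faults/interval
    p_fail = prob_more_than(mu, tolerated)
    return p_fail / exposure_hours * HOURS_PER_FIT  # failures per 1e9 hours


if __name__ == "__main__":
    for name, t in (("unprotected", 0), ("SECDED", 1)):
        print(name, word_fit(64, 1e-3, 1.0, t))
```

With these hypothetical numbers (64-bit word, 10^-3 FIT per bit, 1-hour exposure), the unprotected word comes out at about 0.064 FIT and the SECDED word at about 2e-12 FIT. The SECDED rate grows linearly with exposure time because two faults must accumulate within one interval, which is one reason schemes that shorten how long dirty data stays in the cache, such as early write-back, can matter. The toy model ignores spatial multi-bit faults and bit-interleaving, which the abstract lists among the effects PARMA+ accounts for.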

  • Publication date: 2016-8-1