Algorithm-Based Recovery for HPL

Davies Teresa*; Chen Zizhong; Karlsson Christer; Liu Hui

doi:10.1145/2038037.1941600

Source: ACM SIGPLAN Notices, 2011, 46(8): 303-304.

Abstract

When more processors are used for a calculation, the probability that one will fail during the calculation increases. Fault tolerance is a technique for allowing a calculation to survive a failure, and includes recovering lost data. A common method of recovery is diskless checkpointing. However, it has high overhead when a large amount of data is involved, as is the case with matrix operations. A checksum-based method allows fault tolerance of matrix operations with lower overhead. This technique is applicable to the LU decomposition in the benchmark HPL.
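The checksum idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a toy layout in which each column of the matrix is held by one hypothetical process and one extra process holds the element-wise sum of all columns. It ignores pivoting, HPL's block-cyclic data distribution, and the in-place storage of multipliers. The function names `checksum_column`, `eliminate`, and `recover` are made up for this illustration. The point it shows is that the row operations of Gaussian elimination, applied to the checksum column like any other column, keep the checksum valid, so a lost column can be rebuilt by subtraction without a separate checkpoint.

```python
from fractions import Fraction


def checksum_column(columns):
    """Element-wise sum of all columns (held by a hypothetical checksum process)."""
    return [sum(vals) for vals in zip(*columns)]


def eliminate(column, multipliers, k):
    """Apply elimination step k to one column: row_i -= multipliers[i] * row_k for i > k."""
    return [x - multipliers[i] * column[k] if i > k else x
            for i, x in enumerate(column)]


def recover(columns, checksum, lost):
    """Rebuild column `lost` as checksum minus the sum of the surviving columns."""
    survivors = [c for j, c in enumerate(columns) if j != lost]
    return [s - sum(vals) for s, vals in zip(checksum, zip(*survivors))]


# Toy 3x3 matrix stored by columns, one column per hypothetical process.
# Fractions keep the arithmetic exact so the comparison below is exact.
cols = [[Fraction(v) for v in col] for col in ([2, 4, -2], [1, 3, 1], [1, 5, 4])]
chk = checksum_column(cols)

# One elimination step (k = 0), applied to every column and to the checksum.
mult = [a / cols[0][0] for a in cols[0]]  # multipliers a_i0 / a_00
cols = [eliminate(c, mult, 0) for c in cols]
chk = eliminate(chk, mult, 0)

# The checksum relationship survives the update.
assert chk == checksum_column(cols)

# Simulate losing the process that holds column 1, then rebuild its data.
lost = 1
expected = cols[lost]
cols[lost] = None
cols[lost] = recover(cols, chk, lost)
assert cols[lost] == expected
```

In this toy layout a single checksum column tolerates the loss of one column at a time. With floating-point data the recovered values would match only up to rounding error.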

Publication date: 2011-08

Record updated: 2018-02-10 00:03
