Error resilience of three GMRES implementations under fault injection

Cited by: 3
Authors
Morinigo, Jose A. [1]
Bustos, Andres [1]
Mayo-Garcia, Rafael [1]
Affiliations
[1] CIEMAT, Dept Tecnol, Avda Complutense 40, Madrid 28040, Spain
Keywords
Randomized SVD; Preconditioned GMRES; LLFI; Fault injection; Iterative solvers; LINEAR-SYSTEMS;
DOI
10.1007/s11227-021-04148-x
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The resilience of three prototype GMRES implementations (with Incomplete LU, Flexible, and randomized-SVD-based preconditioners) has been analyzed with a soft-error injection approach. A low-level fault injector is inserted into the GMRES solvers and randomly selects locations in the program at which to inject a fault across multiple executions. This fault injection approach combines the configurability of high-level techniques with the accuracy of low-level ones, so the effect of faults can be closely emulated. To gather enough statistical data, a set of eighteen sparse-matrix linear systems Ax = b has been solved and monitored with these GMRES implementations in the injection experiments. The results of this prototype-based fault injection suggest improved error resilience of the randomized-SVD-based preconditioned GMRES version for many of the analyzed matrices, which points to its potential interest for supercomputing applications, where silent errors are more prominent.
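The injection idea described in the abstract (pick a random location, corrupt it, repeat over many executions, and tally the outcomes) can be illustrated with a minimal sketch. It is not the authors' setup: a plain Jacobi iteration stands in for GMRES, the fault is a single bit flip applied to the solution vector in Python rather than at a low level in the compiled program, and every name here (`flip_bit`, `jacobi`, `classify`, `run_campaign`) as well as the three outcome labels are hypothetical choices made for illustration.

```python
import math
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of an IEEE-754 double (0 = lowest mantissa bit, 63 = sign)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def jacobi(A, b, max_iters=200, tol=1e-10, fault=None):
    """Jacobi iteration for Ax = b (a stand-in solver, not GMRES).

    `fault` is an optional (iteration, index, bit) triple: at the start of
    that iteration, one bit of x[index] is flipped to emulate a soft error.
    Returns (x, iterations_used).
    """
    n = len(b)
    x = [0.0] * n
    for k in range(max_iters):
        if fault is not None and fault[0] == k:
            _, index, bit = fault
            x[index] = flip_bit(x[index], bit)
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, k + 1
    return x, max_iters


def classify(x, x_ref, tol=1e-8):
    """Label a faulty run by comparing it with the fault-free solution."""
    if not all(math.isfinite(v) for v in x):
        return "non_finite"
    error = max(abs(xi - ri) for xi, ri in zip(x, x_ref))
    return "benign" if error <= tol else "wrong_result"


def run_campaign(A, b, trials=100, seed=0, max_iters=200):
    """Inject one random bit flip per execution and count the outcomes."""
    rng = random.Random(seed)
    x_ref, clean_iters = jacobi(A, b, max_iters=max_iters)
    counts = {"benign": 0, "wrong_result": 0, "non_finite": 0}
    for _ in range(trials):
        # Random iteration, vector entry, and bit position for this run.
        fault = (rng.randrange(clean_iters), rng.randrange(len(b)), rng.randrange(64))
        x, _ = jacobi(A, b, max_iters=max_iters, fault=fault)
        counts[classify(x, x_ref)] += 1
    return counts


if __name__ == "__main__":
    # Small diagonally dominant test system whose exact solution is [1, 2, 3].
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [6.0, 12.0, 14.0]
    print(run_campaign(A, b, trials=200, seed=42))
```

In a sketch like this, flips in low mantissa bits tend to be absorbed by the remaining iterations, while flips in high exponent bits are more likely to leave the run unconverged or non-finite; how those proportions shift between solver variants is the kind of statistic an injection campaign gathers.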
Pages: 7158-7185
Page count: 28