Soft Error Effects on Arm Microprocessors: Early Estimations versus Chip Measurements

Cited by: 35
Authors
Bodmann, Pablo R. [1]
Papadimitriou, George [2]
Rech Junior, Rubens L. [1]
Gizopoulos, Dimitris [2]
Rech, Paolo [3]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, PPGC, BR-90040060 Porto Alegre, RS, Brazil
[2] Univ Athens, Athens 15772, Greece
[3] Politecn Torino, Dept Control & Comp Engn, I-10129 Turin, Italy
Funding
European Union Horizon 2020
Keywords
Reliability; Microarchitecture; Central Processing Unit; Hardware; Circuit faults; Reliability engineering; Error analysis; CPU reliability; soft errors; failures in time; neutron beam; microarchitecture-level fault injection; performance models; RELIABILITY; SYSTEM;
DOI
10.1109/TC.2021.3128501
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Extensive research efforts are being carried out to evaluate and improve the reliability of computing devices either through beam experiments or simulation-based fault injection. Unfortunately, it is still largely unclear to what extent fault injection can provide an accurate error rate estimation at early stages and whether beam experiments can be used to identify the weakest resources in a device. The importance and challenges of a timely yet realistic reliability evaluation grow with increasing complexity in both the hardware domain, with the integration of different types of cores in an SoC (System-on-Chip), and the software domain, with the OS (operating system) required to take full advantage of the available resources. In this paper, we combine and analyze data gathered from extensive beam experiments (on the final physical CPU hardware) and microarchitectural fault injections (on early microarchitectural CPU models). We target a standalone Arm Cortex-A5 CPU and an Arm Cortex-A9 CPU integrated into an SoC and evaluate their reliability in bare-metal and Linux-based configurations. Combining experimental data that covers more than 18 million years of device time with the results of more than 176,000 injections, we find that both the SoC integration and the presence of the OS increase the system DUE (Detected Unrecoverable Error) rate (for different reasons) but do not significantly impact the SDC (Silent Data Corruption) rate, which is solely attributed to the CPU core. Our reliability analysis demonstrates that, even considering SoC integration and OS inclusion, early, pre-silicon microarchitecture-level fault injection delivers accurate SDC rate estimates and lower bounds for the DUE rates.
Pages: 2358-2369
Number of pages: 12