Combining Architectural Fault-injection and Neutron Beam Testing Approaches Toward Better Understanding of GPU Soft-error Resilience

Cited by: 0
Authors
Previlon, Fritz G. [1]
Egbantan, Babatunde [1]
Tiwari, Devesh [1]
Rech, Paolo [2]
Kaeli, David R. [1]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
[2] Univ Fed Rio Grande do Sul, Porto Alegre, RS, Brazil
Source
2017 IEEE 60TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS) | 2017
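Keywords
RADIATION;
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract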
Transient faults continue to be a critical concern in a range of computing domains, including high-performance computing (HPC), scientific computing, and the automotive industry. While radiation-induced faults have been well studied and understood in microprocessors, their impact on computations on Graphics Processing Units (GPUs) has received less attention. GPUs are now widely used in both HPC and automotive markets. Mitigating the effects of transient faults requires a thorough understanding of the interaction between applications, system software, and the underlying hardware. Developing this understanding is challenging, mainly because of our limited ability to capture and study cross-layer reliability interactions. In this paper, we consider combining neutron beam testing experiments with architectural fault-injection experiments to gain a deeper understanding of the relationship between the vulnerability of GPUs and the underlying workload characteristics of applications targeted for GPU devices.
Pages: 898-901
Page count: 4
Related papers
2 items in total
  • [1] A Program-Aware Fault-Injection Method for Dependability Evaluation Against Soft-Error Using Genetic Algorithm
    Arasteh, Bahman
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2018, 27 (09)
  • [2] Demystifying Soft Error Assessment Strategies on ARM CPUs: Microarchitectural Fault Injection vs. Neutron Beam Experiments
    Chatzidimitriou, Athanasios
    Bodmann, Pablo
    Papadimitriou, George
    Gizopoulos, Dimitris
    Rech, Paolo
    2019 49TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2019), 2019, : 26 - 38