Silent Data Corruptions: Microarchitectural Perspectives

Cited by: 15
Authors
Papadimitriou, George [1]
Gizopoulos, Dimitris [1]
Affiliations
[1] Natl & Kapodistrian Univ Athens, Dept Informat & Telecommun, Athens 15784, Greece
Keywords
Hardware; Circuit faults; Microarchitecture; Software; Redundancy; Error correction codes; Computer bugs; Silent data corruptions; faults; errors; microarchitecture; microprocessor; fault injection; ARCHITECTURAL VULNERABILITY FACTOR; ERROR; PROPAGATION;
DOI
10.1109/TC.2023.3285094
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Today more than ever before, academia, manufacturers, and hyperscalers acknowledge the major challenge of silent data corruptions (SDCs) and seek solutions that minimize their impact by avoiding, detecting, and mitigating SDCs. Recent studies of large-scale datacenters conducted by Meta and Google report an unexpected rate of silent data corruption incidents that are attributed to modern microprocessor generations. Despite the acknowledged severity of the phenomenon, particularly at datacenter scale, there is no in-depth analysis of which microarchitectural locations in a complex microprocessor are more likely to generate an SDC at the program outputs. In this paper, we present a detailed analysis of the faulty behavior of many critical microarchitectural structures of a modern out-of-order microprocessor that leads to silent data corruptions. Our analysis unveils several observations, including: (i) the magnitude of silent data corruptions attributed to different hardware structures, (ii) the instruction-related parameters that are more likely to result in a silent data corruption, (iii) the extent to which the operating system affects the occurrence of silent data corruptions, and (iv) the byte positions of a word that are more likely to result in silent data corruptions. Collectively, these findings can inform decisions about hardware and software schemes that reduce the likelihood of silent data corruptions.
Pages: 3072-3085
Page count: 14