Silent Data Corruptions: Microarchitectural Perspectives

Cited by: 15
Authors
Papadimitriou, George [1]
Gizopoulos, Dimitris [1]
Affiliation
[1] Natl & Kapodistrian Univ Athens, Dept Informat & Telecommun, Athens 15784, Greece
Keywords
Hardware; Circuit faults; Microarchitecture; Software; Redundancy; Error correction codes; Computer bugs; Silent data corruptions; faults; errors; microarchitecture; microprocessor; fault injection; ARCHITECTURAL VULNERABILITY FACTOR; ERROR; PROPAGATION
DOI
10.1109/TC.2023.3285094
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Today more than ever before, academia, manufacturers, and hyperscalers acknowledge the major challenge of silent data corruptions (SDCs) and aim for solutions that minimize their impact by avoiding, detecting, and mitigating SDCs. Recent studies on large-scale datacenters conducted by Meta and Google report an unexpected rate of silent data corruption incidents that are attributed to modern microprocessor generations. Despite the acknowledged severity of the phenomenon, particularly at the datacenter scale, there is no in-depth analysis of the microarchitectural locations in a complex microprocessor that are more likely to generate an SDC at the program outputs. In this paper, we present a detailed analysis of the faulty behavior of many critical microarchitectural structures of a modern out-of-order microprocessor generating silent data corruptions. Our analysis unveils several observations, including: (i) the magnitude of silent data corruptions attributed to different hardware structures, (ii) the instruction-related parameters that are more likely to result in a silent data corruption, (iii) the extent to which the operating system affects the occurrence of silent data corruptions, and (iv) the byte positions of a word that are more likely to result in silent data corruptions. Collectively, such findings can inform decisions on hardware and software schemes that reduce the likelihood of silent data corruptions.
Pages: 3072-3085
Number of pages: 14