ERrOR: Improving Performance and Fault Tolerance using Early Execution

Cited by: 1
Authors
Choudhary, Raj Kumar [1]
Patel, Janeel [1]
Singh, Virendra [1]
Affiliations
[1] Indian Inst Technol, Comp Architecture & Dependable Syst Lab, Mumbai, India
Source
2023 IEEE 29TH INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN, IOLTS | 2023
Keywords
reliability; soft errors; fault tolerance; instruction re-execution; CORE;
DOI
10.1109/IOLTS59296.2023.10224863
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
Contemporary integrated circuits are becoming increasingly susceptible to soft errors caused by single-event upsets, which reduces the reliability of operation. In this paper, we propose the ERrOR microarchitecture, which detects soft errors in processor operation using temporal redundancy with minimal hardware overhead. Previous proposals have explored introducing an Early Execution Unit (EXU) at the processor frontend to execute dynamic instructions with short dependency chains early and so improve performance. However, we observe that the functional units in the EXU are idle for a significant fraction of program execution time. ERrOR uses these idle frontend functional units to re-execute dynamic instructions for error detection. A lightweight verifier introduced at the backend uses idle resources for redundant execution by interleaving program execution with re-execution. ERrOR provides exhaustive transient fault coverage while improving performance by 7.5% over an existing restricted OoO microarchitecture, Freeflow Core.
Pages: 3