ERrOR: Improving Performance and Fault Tolerance using Early Execution

Cited by: 1
Authors
Choudhary, Raj Kumar [1]
Patel, Janeel [1]
Singh, Virendra [1]
Affiliation
[1] Indian Inst Technol, Comp Architecture & Dependable Syst Lab, Mumbai, India
Source
2023 IEEE 29TH INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN, IOLTS | 2023
Keywords
reliability; soft errors; fault tolerance; instruction re-execution; CORE;
DOI
10.1109/IOLTS59296.2023.10224863
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Contemporary integrated circuits are becoming increasingly susceptible to soft errors due to single-event upsets, which decreases the reliability of operation. In this paper, we propose the ERrOR microarchitecture, which detects soft errors in processor operation using temporal redundancy with minimal hardware overhead. Previous proposals have explored the idea of introducing an Early Execution Unit (EXU) at the processor frontend in order to expeditiously execute dynamic instructions with short dependency chains for performance improvement. However, we observe that the functional units in the EXU are idle for a significant fraction of the program execution duration. ERrOR leverages these inactive frontend functional units to re-execute dynamic instructions for the purpose of error detection. A lightweight verifier introduced at the backend makes use of idle resources for redundant execution by interleaving program execution with re-execution for error detection. ERrOR provides exhaustive transient fault coverage while improving performance by 7.5% over an existing restricted OoO microarchitecture, Freeflow Core.
Pages: 3