ERrOR: Improving Performance and Fault Tolerance using Early Execution

Cited by: 1
Authors
Choudhary, Raj Kumar [1]
Patel, Janeel [1]
Singh, Virendra [1]
Affiliations
[1] Indian Inst Technol, Comp Architecture & Dependable Syst Lab, Mumbai, India
Source
2023 IEEE 29TH INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN, IOLTS | 2023
Keywords
reliability; soft errors; fault tolerance; instruction re-execution; CORE
DOI
10.1109/IOLTS59296.2023.10224863
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Contemporary integrated circuits are increasingly susceptible to soft errors caused by single-event upsets, which reduces the reliability of operation. In this paper, we propose the ERrOR microarchitecture, which detects soft errors in processor operation using temporal redundancy with minimal hardware overhead. Previous proposals introduced an Early Execution Unit (EXU) at the processor frontend to execute dynamic instructions with short dependency chains early and thereby improve performance. However, we observe that the functional units in the EXU are idle for a significant fraction of program execution time. ERrOR uses these idle frontend functional units to re-execute dynamic instructions for error detection. A lightweight verifier introduced at the backend uses idle resources for redundant execution, interleaving program execution with re-execution for error detection. ERrOR provides exhaustive transient fault coverage while improving performance by 7.5% over an existing restricted out-of-order (OoO) microarchitecture, Freeflow Core.
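The temporal-redundancy idea in the abstract (run an instruction, run it again later, and compare the two results) can be illustrated with a small software sketch. This is not the ERrOR hardware design: the names `Instr`, `execute`, and `detect_by_reexecution` are hypothetical, and the sketch assumes that a transient fault corrupts only the first execution of an instruction and has disappeared by the time the re-execution happens.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MASK32 = 0xFFFFFFFF  # model 32-bit results (an assumption, not from the paper)


@dataclass(frozen=True)
class Instr:
    """Hypothetical two-operand instruction used only for this illustration."""
    op: str
    a: int
    b: int


OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "xor": lambda a, b: (a ^ b) & MASK32,
}


def execute(instr: Instr, flip_bit: Optional[int] = None) -> int:
    """Compute the result; optionally flip one result bit to model a soft error."""
    result = OPS[instr.op](instr.a, instr.b)
    if flip_bit is not None:
        result ^= 1 << flip_bit
    return result


def detect_by_reexecution(program: List[Instr], faults: Dict[int, int]) -> List[int]:
    """Return indices of instructions whose two executions disagree.

    `faults` maps an instruction index to the bit flipped in its first
    execution. The re-execution is assumed to be fault-free because the
    transient fault has passed by then.
    """
    mismatches = []
    for index, instr in enumerate(program):
        first = execute(instr, faults.get(index))
        second = execute(instr)  # temporally redundant re-execution
        if first != second:
            mismatches.append(index)
    return mismatches


program = [Instr("add", 1, 2), Instr("xor", 5, 3), Instr("sub", 10, 4)]
print(detect_by_reexecution(program, {}))      # no injected fault
print(detect_by_reexecution(program, {1: 7}))  # bit 7 flipped in instruction 1
```

In this toy model every single-bit flip in a first execution is caught, because the comparison is bit-exact. It says nothing about how the real design schedules re-execution on idle functional units or what that scheduling costs.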
Pages: 3