Lockstep Replacement: Fault-Tolerant Design

Times cited: 0
Authors
Mach, Jan [1]
Kohutka, Lukas [1, 2]
Affiliations
[1] Slovak Univ Technol Bratislava, Inst Informat Informat Syst & Software Engn, Bratislava 81107, Slovakia
[2] DrAS Sro, Bratislava 85101, Slovakia
Keywords
Circuit faults; Protection; Registers; Software; Pipelines; Logic gates; Hardware; Fault tolerant systems; Fault tolerance; Microarchitecture; Safety; reliability; space; automotive; CPU; RISC-V
DOI
10.1109/ACCESS.2025.3573684
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
System-level lockstep, which interconnects two original cores, is currently the state-of-the-art approach for protecting processor systems against random hardware faults. However, the lack of information outside the cores necessitates many compromises in dependability, software complexity, power consumption, performance, and system cost. We propose a redundancy-based protection scheme integrated into the microarchitecture of the core. The scheme separates the execution pipeline into sections with fault detection and sections with fault tolerance, while the large predictor components remain unprotected. This approach results in fault-tolerant hardware with only an 8% area penalty compared to dual-core lockstep, which provides only fault detection. Because the hardware recovers from faults automatically, it requires neither software support nor additional memory for checkpoints. The scheme comprises independent protection approaches, resulting in negligible frequency impact. It also provides interface and memory protection and is designed to be applicable to most embedded-class cores. Simulation-based fault-injection campaigns that take physical synthesis data into account were performed to assess fault tolerance. We also analyze how software compilation affects dependability and provide a hardware solution to mitigate the undesired effects.
Pages: 94302-94318
Number of pages: 17