On Providing Scalable Self-healing Adaptive Fault-tolerance to RTR SoCs

Authors
Navas, Byron [1, 2]
Oberg, Johnny [1]
Sander, Ingo [1]
Affiliations
[1] KTH Royal Inst Technol, Dept Elect Syst, Stockholm, Sweden
[2] ESPE Univ Fuerzas Armadas, Dept Elect & Elect Engn, Sangolqui, Ecuador
Source
2014 International Conference on Reconfigurable Computing and FPGAs (ReConFig), 2014
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
The dependability of heterogeneous many-core FPGA-based systems is threatened by higher failure rates caused by disruptive scales of integration, increased design complexity, and radiation sensitivity. Triple-modular redundancy (TMR) and run-time reconfiguration (RTR) are traditional fault-tolerant (FT) techniques used to increase dependability. However, hardware redundancy is expensive, and most approaches offer poor scalability, flexibility, and programmability. Innovative solutions are therefore needed that reduce the cost of redundancy while still preserving acceptable levels of dependability. In this context, this paper presents the implementation of a self-healing adaptive fault-tolerant SoC that reuses RTR IP-cores to self-assemble different TMR schemes at run-time. The presented system demonstrates the feasibility of the Upset-Fault-Observer concept, which provides a run-time self-test and recovery strategy that delivers fault tolerance to functions accelerated in RTR cores while reducing the scalability cost of redundancy by running periodic reconfigurable TMR scan-cycles. In addition, this paper experimentally evaluates the trade-offs of the implemented reconfigurable TMR schemes by characterizing important fault-tolerance metrics, i.e., recovery time (self-repair and self-replicate), detection latency, self-assembly latency, throughput reduction, and increase in physical resources.
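To make the TMR idea in the abstract concrete, the following is a minimal software sketch of a bitwise 2-of-3 majority voter and a periodic "scan cycle" in which each core under test is temporarily grouped with two reference replicas and checked against the voted output. This is a generic illustration under stated assumptions, not the authors' hardware implementation of the Upset-Fault-Observer: the `Core` model (a function from an input word to an output word), the reference replicas, and the names `majority` and `tmr_scan_cycle` are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical model: a core maps an integer input word to an integer output word.
Core = Callable[[int], int]


def majority(a: int, b: int, c: int) -> int:
    # Bitwise 2-of-3 majority: each output bit takes the value held by at least two replicas.
    return (a & b) | (a & c) | (b & c)


def tmr_scan_cycle(cores: Sequence[Core], ref_a: Core, ref_b: Core,
                   test_inputs: Sequence[int]) -> List[int]:
    """Hypothetical scan cycle (an assumption, not the paper's method).

    Each core under test is grouped with two reference replicas to form a
    TMR triplet. A core whose output differs from the voted word on any test
    input is reported as faulty; the indices of faulty cores are returned.
    """
    faulty: List[int] = []
    for idx, core in enumerate(cores):
        for x in test_inputs:
            out = core(x)
            if out != majority(out, ref_a(x), ref_b(x)):
                faulty.append(idx)
                break  # one mismatch is enough to flag this core
    return faulty
```

For example, with three cores that should all compute `x * 3`, where the second one has a stuck bit modeled as `(x * 3) ^ 0x4`, calling `tmr_scan_cycle(cores, ref, ref, [1, 2, 3])` with `ref = lambda x: x * 3` reports `[1]`. Scanning one core at a time, rather than triplicating every core permanently, is the kind of redundancy-cost reduction the abstract refers to; in hardware, a flagged core would then be repaired, for instance by reconfiguring it.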
Pages: 6