A Multi-Layer Software-Based Fault-Tolerance Approach for Heterogenous Multi-Core Systems

Times cited: 0
Authors
Mueller, S. [1]
Koal, T. [1]
Scharoba, S. [1]
Vierhaus, H. T. [1]
Schoelzel, M. [2, 3]
Affiliations
[1] Brandenburg Tech Univ Cottbus, Cottbus, Germany
[2] IHP, Frankfurt (Oder), Germany
[3] Univ Potsdam, Potsdam, Germany
Source
2015 16th Latin-American Test Symposium (LATS), 2015
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
This paper describes a software-based technique for building heterogeneous, fault-tolerant multi-core systems that can handle temporary and permanent hardware faults autonomously at two system layers. The technique relies on a single concept: adapting the binary code of the user application to the current fault state of an individual core. This scheme is used both for local repair within each core and for global repair across cores. In global repair, the task assigned to a faulty core may be rescheduled to another core that provides sufficient resources to execute it, and the local repair scheme is then reused to adapt the rescheduled task to its new core. The paper shows that the reliability of a multi-core system can be improved significantly when global repair is used together with local repair, compared with using local repair alone.
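The abstract describes a two-layer decision: first try to repair locally by adapting the task's binary to the core's fault state, and only if that fails, reschedule the task to another core with enough resources and reuse the local scheme there. The Python sketch below illustrates only that control flow, under a deliberately simplified and hypothetical resource model. The names `Task`, `Core`, `local_repair`, `repair`, `inject_permanent_fault` and the unit types `"alu"`/`"mul"` are illustrative assumptions, not taken from the paper. In particular, `local_repair` merely checks whether at least one fault-free unit of each required type remains; it does not perform any real binary adaptation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    name: str
    # Hypothetical model: unit types (e.g. "alu", "mul") the task's binary uses.
    required_units: frozenset[str]


@dataclass
class Core:
    name: str
    # Hypothetical model: number of fault-free units per unit type.
    healthy_units: dict[str, int]
    tasks: list[str] = field(default_factory=list)


def inject_permanent_fault(core: Core, unit_type: str) -> None:
    """Mark one unit of the given type as permanently faulty."""
    core.healthy_units[unit_type] = max(0, core.healthy_units.get(unit_type, 0) - 1)


def local_repair(task: Task, core: Core) -> bool:
    """Stand-in for adapting the task's binary to the core's fault state.

    Assumption: adaptation succeeds as long as at least one fault-free unit
    of every required type remains, so the code can be re-mapped onto it.
    """
    return all(core.healthy_units.get(u, 0) > 0 for u in task.required_units)


def repair(task: Task, home: Core, cores: list[Core]) -> Core | None:
    """Try local repair first; fall back to global repair (rescheduling)."""
    if local_repair(task, home):
        return home
    for target in cores:
        # Global repair reuses the local scheme to adapt the task to the target core.
        if target is not home and local_repair(task, target):
            home.tasks.remove(task.name)
            target.tasks.append(task.name)
            return target
    return None  # no core can host the task under this model


# Usage: one task on core0, a spare core1.
t0 = Task("t0", frozenset({"alu", "mul"}))
c0 = Core("core0", {"alu": 2, "mul": 1}, ["t0"])
c1 = Core("core1", {"alu": 1, "mul": 1})
cores = [c0, c1]

inject_permanent_fault(c0, "alu")
print(repair(t0, c0, cores).name)  # core0: one ALU still works, local repair suffices
inject_permanent_fault(c0, "mul")
print(repair(t0, c0, cores).name)  # core1: core0 has no multiplier left, task is rescheduled
```

In this toy model, local-only repair would leave `t0` unserviceable after the second fault, whereas combining it with global repair keeps the task running on `core1`. This mirrors, qualitatively, the abstract's argument for using both layers; it says nothing about the magnitude of the reliability gain reported in the paper.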
Pages: 6