Self-Adaptive Fault Tolerance in Multi-/Many-Core Systems

Cited by: 21
Authors
Bolchini, Cristiana [1]
Carminati, Matteo [1]
Miele, Antonio [1]
Affiliations
[1] Politecn Milan, Dipartimento Elettron Informat & Bioingn, I-20133 Milan, Italy
Source
JOURNAL OF ELECTRONIC TESTING-THEORY AND APPLICATIONS | 2013 / Vol. 29 / Issue 02
Keywords
Tunable fault tolerance; Adaptive systems; Multi-/many-core architectures;
DOI
10.1007/s10836-013-5367-y
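Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code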
0808 ; 0809 ;
Abstract
This paper presents a novel approach to the design of multi-/many-core systems with an adaptive level of reliability. The approach defines a layer at the operating system level that achieves fault detection/tolerance/diagnosis properties by means of thread replication and re-execution mechanisms. The layer applies the most convenient hardening mechanism to achieve the desired trade-off between reliability and performance by adapting at run-time to the changes of the working scenario. The proposed strategy has been applied in a set of experimental sessions considering a real-world parallel application, to evaluate its benefits on the final system with respect to various strategies selected at design time.
Pages: 159 - 175
Page count: 17