Adaptive Fault Tolerance through Invasive Computing

Cited by: 0
Authors
Witterauf, Michael [1]
Tanase, Alexandru [1]
Teich, Juergen [1]
Lari, Vahid [1]
Zwinkau, Andreas [2]
Snelting, Gregor [2]
Affiliations
[1] Univ Erlangen Nurnberg, Erlangen, Germany
[2] Karlsruhe Inst Technol, D-76021 Karlsruhe, Germany
Source
2015 NASA/ESA CONFERENCE ON ADAPTIVE HARDWARE AND SYSTEMS (AHS) | 2015
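Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract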
Fault tolerance is a basic necessity to make today's complex systems reliable. Adequate fault tolerance, however, demands a high degree of redundancy, possibly wasting resources when the fault probability is low or when some applications do not require fault tolerance. Under the term adaptive fault tolerance, we investigate means to instead provide on-demand fault tolerance on multi-core systems dynamically and according to application and environmental needs. Such means are provided on a per-application basis by invasive computing, a recent paradigm for resource-aware programming and design of parallel systems: applications request resources in an invade phase, infect the acquired resources with code and data, and finally release them in a retreat phase. We show how to use these simple but powerful constructs to adaptively tolerate faults and that invasive computing harmonizes well with many existing fault tolerance approaches. Finally, a case study on adaptively providing fault tolerance for loops demonstrates how effective invasive computing is for adapting to a varying soft error rate and handling of faults.
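The invade, infect, and retreat phases described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ResourcePool`, `replicas_for`, `run_adaptive`, and the error-rate thresholds are hypothetical names and values, not the paper's API or its case study. The replicas run sequentially here, whereas a real system would place each one on a separately claimed core.

```python
from collections import Counter
from typing import Callable


class ResourcePool:
    """Hypothetical pool of identical cores (not an actual invasive-computing API)."""

    def __init__(self, cores: int) -> None:
        self.free = cores

    def invade(self, wanted: int) -> int:
        """Claim up to `wanted` cores and return how many were granted."""
        granted = min(wanted, self.free)
        self.free -= granted
        return granted

    def retreat(self, claimed: int) -> None:
        """Give the claimed cores back to the pool."""
        self.free += claimed


def replicas_for(soft_error_rate: float) -> int:
    """Assumed policy; the thresholds are illustrative, not taken from the paper."""
    if soft_error_rate < 1e-6:
        return 1  # no redundancy
    if soft_error_rate < 1e-3:
        return 2  # duplication: a mismatch reveals a fault
    return 3      # triplication: a majority vote masks a single fault


def run_adaptive(pool: ResourcePool, task: Callable[[int], int],
                 arg: int, soft_error_rate: float) -> int:
    # Invade: request as many cores as the current error rate calls for.
    claimed = pool.invade(replicas_for(soft_error_rate))
    if claimed == 0:
        raise RuntimeError("no cores available")
    try:
        # Infect: run one replica of the task per claimed core.
        results = [task(arg) for _ in range(claimed)]
    finally:
        # Retreat: always release the cores, even if a replica raised.
        pool.retreat(claimed)
    value, votes = Counter(results).most_common(1)[0]
    if claimed > 1 and votes <= claimed // 2:
        raise RuntimeError("replicas disagree without a majority")
    return value


pool = ResourcePool(cores=4)
print(run_adaptive(pool, lambda x: x * x, 7, soft_error_rate=1e-2))
```

Because the degree of redundancy is chosen at each invade, an application only pays for replication while the observed error rate warrants it.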
Pages: 8