Rapid Recovery for Systems with Scarce Faults

被引:149
作者
Huang, Chung-Hao [1 ]
Peled, Doron [2 ]
Schewe, Sven [3 ]
Wang, Farn [4 ,5 ]
机构
[1] Natl Taiwan Univ, Dept Elect Engn, Taipei, Taiwan
[2] Bar Ilan Univ, Dept Comp Sci, IL-52900 Ramat Gan, Israel
[3] Univ Liverpool, Dept Comp Sci, Liverpool, Merseyside, England
[4] Natl Taiwan Univ, Dept EE, Taipei, Taiwan
[5] Acad Sinica, CITI, Taipei, Taiwan
基金
英国工程与自然科学研究理事会;
关键词
D O I
10.4204/EPTCS.96.2
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Our goal is to achieve a high degree of fault tolerance through the control of a safety critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller who tries to establish a correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results.
引用
收藏
页码:15 / 28
页数:14
相关论文
共 35 条
[1]  
Abd-El-Malek M, 2005, P 20 ACM S OP SYST P, P59, DOI [DOI 10.1145/1095810.1095817, 10.1145/1095810.1095817]
[2]   CLOSURE AND CONVERGENCE - A FOUNDATION OF FAULT-TOLERANT COMPUTING [J].
ARORA, A ;
GOUDA, M .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1993, 19 (11) :1015-1027
[3]  
Asarin E, 1995, LECT NOTES COMPUT SC, V999, P1
[4]   Synthesis of fault-tolerant concurrent programs [J].
Attie, PC ;
Arora, A ;
Emerson, EA .
ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 2004, 26 (01) :125-185
[5]   THE CCITT-SPECIFICATION AND DESCRIPTION LANGUAGE SDL [J].
BELINA, F ;
HOGREFE, D .
COMPUTER NETWORKS AND ISDN SYSTEMS, 1989, 16 (04) :311-341
[6]  
Bloem Roderick, 2009, Proceedings of the 2009 9th International Conference Formal Methods in Computer-Aided Design (FMCAD), P85, DOI 10.1109/FMCAD.2009.5351139
[7]   SOLVING SEQUENTIAL CONDITIONS BY FINITE-STATE STRATEGIES [J].
BUCHI, JR ;
LANDWEBER, LH .
TRANSACTIONS OF THE AMERICAN MATHEMATICAL SOCIETY, 1969, 138 (APR) :295-+
[8]   Practical byzantine fault tolerance and proactive recovery [J].
Castro, M ;
Liskov, B .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2002, 20 (04) :398-461
[9]  
Church A, 1962, J SYMBOLIC LOGIC, P23, DOI DOI 10.2307/2270398
[10]  
Clement A., 2009, P 6 USENIX S NETW SY, P153