On the effectiveness of a message-driven confidence-driven protocol for guarded software upgrading

Cited by: 7
Authors
Tai, AT
Tso, KS
Alkalai, L
Chau, SN
Sanders, WH
Affiliations
[1] IA Tech Inc, Los Angeles, CA 90024 USA
[2] CALTECH, Jet Prop Lab, Pasadena, CA 91109 USA
[3] Univ Illinois, ECE Dept, Urbana, IL 61801 USA
Keywords
guarded software upgrading; inherent resource redundancy; error containment and recovery; checkpointing; stochastic activity networks; reliability improvement; model-based evaluation
DOI
10.1016/S0166-5316(00)00054-7
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A methodology called guarded software upgrading (GSU) is proposed to accomplish dependable onboard evolution for long-life deep-space missions. The core of the methodology is a low-cost error containment and recovery protocol that escorts an upgraded software component through onboard validation and guarded operation, mitigating the effect of residual faults in the upgraded component. The message-driven confidence-driven (MDCD) nature of the protocol eliminates the need for costly process coordination or atomic action, yet guarantees that the system will reach a consistent global state upon the completion of the rollback or roll-forward actions carried out by individual processes during error recovery. To validate the ability of the MDCD protocol to enhance system reliability when a software component undergoes onboard upgrading in a realistic, non-ideal environment, we conduct a stochastic activity network model-based analysis. The results confirm the effectiveness of the protocol as originally surmised. Moreover, a comparative study reveals that the dynamic confidence-driven approach is superior to static approaches and is the key to the attainment of cost-effectiveness. (C) 2001 Elsevier Science B.V. All rights reserved.
Pages: 211-236
Page count: 26