SOFTWARE DEPENDABILITY IN THE TANDEM GUARDIAN SYSTEM

Cited by: 62
Authors
LEE, IW [1]
IYER, RK [1]
Affiliation
[1] UNIV ILLINOIS,CTR RELIABLE & HIGH PERFORMANCE COMP,COORDINATED SCI LAB,URBANA,IL 61801
Funding
National Aeronautics and Space Administration (NASA);
Keywords
MEASUREMENT; FAULT CATEGORIZATION; SOFTWARE FAULT TOLERANCE; RECURRENCE; SOFTWARE RELIABILITY; OPERATIONAL PHASE; TANDEM GUARDIAN SYSTEM;
DOI
10.1109/32.387474
Chinese Library Classification number
TP31 [Computer Software];
Subject classification code
081202 ; 0835 ;
Abstract
Based on extensive field failure data for Tandem's GUARDIAN operating system, this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.
Pages: 455-467
Page count: 13