FINE - A FAULT INJECTION AND MONITORING ENVIRONMENT FOR TRACING THE UNIX SYSTEM BEHAVIOR UNDER FAULTS

Cited by: 77
Authors
KAO, WL [1 ]
IYER, RK [1 ]
TANG, D [1 ]
Affiliation
[1] UNIV ILLINOIS,DEPT COMP SCI,URBANA,IL 61801
Funding
National Aeronautics and Space Administration (NASA);
Keywords
FAULT ERROR INJECTION; FAULT MODELING; SOFTWARE MONITOR; FAULT ERROR PROPAGATION; FAULT PROPAGATION MODELING; TRANSIENT REWARD ANALYSIS; UNIX KERNEL;
DOI
10.1109/32.256857
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Fault injection has been used to evaluate the dependability of computer systems, but most fault-injection studies concentrate on the final impact of faults on the system with an emphasis on fault latency and coverage issues. What really happens after a fault is injected and how a fault propagates in a software system are not well understood. This paper presents a fault injection and monitoring environment (FINE) as a tool to study fault propagation in the UNIX kernel. FINE injects hardware-induced software errors and software faults into the UNIX kernel and traces the execution flow and key variables of the kernel. It consists of a fault injector, a software monitor, a workload generator, a controller, and several analysis utilities. Experiments on SunOS 4.1.2 are conducted by applying FINE to investigate fault propagation and to evaluate the impact of various types of faults. Fault propagation models are built for both hardware and software faults. Transient Markov reward analysis is performed based on the models to evaluate the loss of performance due to an injected fault. Experimental results show that memory faults and software faults usually have a very long latency while bus faults and CPU faults tend to crash the system immediately. About half of the detected errors are data faults, which are detected when the system tries to access an unauthorized memory location. Only about 8% of faults propagate to other UNIX subsystems. Markov reward analysis shows that the performance loss incurred by bus faults and CPU faults is much higher than that incurred by software and memory faults. Among software faults, the impact of pointer faults is higher than that of nonpointer faults.
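The transient Markov reward analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the state names, transition probabilities, and reward rates below are hypothetical values chosen only for illustration, and a discrete-time chain is assumed for simplicity. The idea is that each state carries a reward rate (the fraction of normal performance delivered there), and the performance loss is the gap between the reward a fault-free system would accumulate and the expected reward accumulated after a fault is injected.

```python
def transient_reward(P, reward, start, steps):
    """Expected reward accumulated over `steps` steps of a discrete-time
    Markov chain with transition matrix P, starting in state `start`."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    total = 0.0
    for _ in range(steps):
        # Reward earned in this step, weighted by the current state distribution.
        total += sum(p * r for p, r in zip(dist, reward))
        # Advance the distribution by one step: dist' = dist * P.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total, dist


def performance_loss(P, reward, start, steps):
    """Reward lost relative to a fault-free system earning 1.0 per step."""
    earned, _ = transient_reward(P, reward, start, steps)
    return steps * 1.0 - earned


# Hypothetical states and numbers (assumptions, not taken from the paper).
STATES = ["latent", "error", "crashed", "recovered"]
P = [
    [0.90, 0.10, 0.00, 0.00],  # latent fault: usually stays dormant
    [0.00, 0.60, 0.30, 0.10],  # active error: may crash or be recovered
    [0.00, 0.00, 1.00, 0.00],  # crashed: absorbing, no useful work
    [0.00, 0.00, 0.00, 1.00],  # recovered: absorbing, full performance
]
REWARD = [1.0, 0.5, 0.0, 1.0]

if __name__ == "__main__":
    loss = performance_loss(P, REWARD, start=0, steps=50)
    print(f"expected performance loss over 50 steps: {loss:.3f}")
```

Under these assumed numbers, a fault that starts in a long-latency state loses reward slowly, while one that starts in the crashed state loses the full reward at every step. That matches the qualitative contrast the abstract draws between memory or software faults and bus or CPU faults.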
Pages: 1105-1118
Page count: 14