Fault injection techniques and tools

Cited by: 458
Authors
Hsueh, MC
Tsai, TK
Iyer, RK
Affiliations
[1] AT&T BELL LABS, LUCENT TECHNOL, TECH STAFF, MURRAY HILL, NJ 07954 USA
[2] UNIV ILLINOIS, DEPT ELECT & COMP ENGN, URBANA, IL 61801 USA
[3] UNIV ILLINOIS, DEPT COMP SCI, URBANA, IL 61801 USA
[4] NASA, CTR EXCELLENCE AEROSP COMP, ILLINOIS COMP LAB AEROSP SYST & SOFTWARE, BATAVIA, IL USA
[5] NASA, CTR RELIABLE & HIGH PERFORMANCE COMP, BATAVIA, IL USA
Funding
National Aeronautics and Space Administration (NASA)
DOI
10.1109/2.585157
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Dependability evaluation involves the study of failures and errors. The destructive nature of a crash and long error latency make it difficult to identify the causes of failures in the operational environment, and it is particularly hard to recreate a failure scenario for a large, complex system. To identify and understand potential failures, the authors use an experiment-based approach for studying system dependability, which can be applied during the conception, design, prototype, and operational phases. To take an experiment-based approach, you must first understand a system's architecture, structure, and behavior. You need to know its tolerance for faults and failures, including its built-in detection and recovery mechanisms, and you need specific instruments and tools to inject faults, create failures or errors, and monitor their effects. Engineers most often use low-cost, simulation-based fault injection to evaluate the dependability of a system that is in the conceptual and design phases. At this point, the system under study is only a series of high-level abstractions; implementation details have yet to be determined. Thus the system is simulated on the basis of simplified assumptions. Simulation-based fault injection, which assumes that errors or failures occur according to a predetermined distribution, is useful for evaluating the effectiveness of fault-tolerant mechanisms and a system's dependability, and it provides timely feedback to system engineers. However, it requires accurate input parameters, which are difficult to supply: design and technology changes often complicate the use of past measurements. Testing a prototype, on the other hand, allows you to evaluate the system without any assumptions about system design. Instead of injecting faults, engineers can also directly measure operational systems as they handle real workloads. This measurement-based analysis uses actual data, which contains much information about naturally occurring errors and failures and sometimes about recovery attempts. Although these three experimental methods have limitations, their unique values complement one another and allow for a wide spectrum of dependability studies.
Pages: 75 / +