ERROR LOG ANALYSIS - STATISTICAL MODELING AND HEURISTIC TREND ANALYSIS

Cited by: 89
Authors
LIN, TTY [1]
SIEWIOREK, DP [1]
Affiliation
[1] CARNEGIE MELLON UNIV,SCH COMP SCI,DEPT ELECT & COMP ENGN,PITTSBURGH,PA 15213
Keywords
Error log; Failure prediction; Hard failures; Intermittent and transient faults; Weibull distribution
DOI
10.1109/24.58720
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Most error log analysis studies perform a statistical fit to the data assuming a single underlying error process. This paper presents the results of an analysis that demonstrates the log is composed of at least two error processes: transient and intermittent. The mixing of data from multiple processes requires many more events to verify a hypothesis using traditional statistical analysis. Based on the shape of the interarrival time function of the intermittent errors observed from actual error logs, a failure prediction heuristic, the Dispersion Frame Technique (DFT), is developed. The DFT was implemented in a distributed on-line monitoring and predictive diagnostic system for the campus-wide Andrew file system at Carnegie Mellon University. Data collected from 13 file servers over a 22 month period were analyzed using both the DFT and conventional statistical methods. It is shown that the DFT can extract intermittent errors from the error log and uses only one fifth of the error log entry points required by statistical methods for failure prediction. The DFT achieved a 93.7% success rate in failure prediction of both electromechanical and electronic devices. © 1990 IEEE
Pages: 419-432
Page count: 14