Quantifying temporal and spatial correlation of failure events for proactive management

Times cited: 22
Authors
Fu, Song [1]
Xu, Cheng-Zhong [1]
Affiliation
[1] Wayne State University, Department of Electrical and Computer Engineering, Detroit, MI 48202, USA
Source
SRDS 2007: 26th IEEE International Symposium on Reliable Distributed Systems, Proceedings | 2007
DOI
10.1109/SRDS.2007.18
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Networked computing systems continue to grow in scale and in the complexity of their components and interactions. In these environments, component failures become the norm rather than the exception. Moreover, failure events exhibit strong correlations in both the time and space domains. In this paper, we develop a spherical covariance model with an adjustable timescale parameter to quantify temporal correlation, and a stochastic model to characterize spatial correlation. The models are further extended to take application allocation information into account, in order to discover more correlations among failure instances. We cluster failure events based on their correlations and predict their future occurrences. Experimental results on a production coalition system, the Wayne State Grid, show that the offline and online predictions made by our prediction system can forecast 72.7% to 85.3% of failure occurrences and capture failure correlations in a cluster coalition environment.
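The abstract does not state the exact form of the temporal model. As a minimal sketch, assuming the textbook spherical covariance from geostatistics, where the adjustable timescale parameter plays the role of the range θ, the correlation between two failures separated by a time lag h is ρ(h) = 1 − 1.5(h/θ) + 0.5(h/θ)³ for h < θ and 0 otherwise. The functions `spherical_correlation` and `cluster_failures` and the `threshold` parameter below are hypothetical illustrations, not the authors' implementation; the clustering step is a simple greedy grouping chosen for brevity, and the spatial model and application-allocation extension are not modeled.

```python
from typing import List


def spherical_correlation(lag: float, timescale: float) -> float:
    """Standard spherical correlation (assumed form): equals 1 at lag 0 and
    decays smoothly to 0 once the lag reaches the timescale (range) parameter."""
    if timescale <= 0:
        raise ValueError("timescale must be positive")
    h = abs(lag) / timescale
    if h >= 1.0:
        return 0.0
    return 1.0 - 1.5 * h + 0.5 * h ** 3


def cluster_failures(timestamps: List[float], timescale: float,
                     threshold: float) -> List[List[float]]:
    """Hypothetical greedy temporal clustering: after sorting, a failure joins
    the current cluster if its correlation with the cluster's most recent
    failure is at least `threshold`; otherwise it starts a new cluster."""
    clusters: List[List[float]] = []
    for t in sorted(timestamps):
        if clusters and spherical_correlation(t - clusters[-1][-1], timescale) >= threshold:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


if __name__ == "__main__":
    # Illustrative failure times in seconds (made-up values, not measured data).
    events = [0, 10, 25, 400, 410, 2000]
    print(cluster_failures(events, timescale=100, threshold=0.5))
```

With a timescale of 100 and a threshold of 0.5, the example groups the events into `[[0, 10, 25], [400, 410], [2000]]`: failures a few seconds apart are strongly correlated, while gaps longer than the timescale have zero correlation and open a new cluster. Enlarging the timescale parameter widens the window over which failures are treated as related.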
Pages: 175+
Page count: 2