Availability modeling and analysis on high performance cluster computing systems

Cited by: 0
Authors
Song, Hertong [1]
Leangsuksun, Chokchai 'box' [1]
Nassar, Raja [2]
Gottumukkala, Narasimha Raju [2]
Scott, Stephen [2]
Affiliations
[1] Louisiana Tech Univ, Coll Engn & Sci, Ruston, LA 71272 USA
[2] Oak Ridge Natl Lab, Oak Ridge, TN USA
Source
FIRST INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY, PROCEEDINGS | 2006
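Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract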
Cluster computing has attracted growing attention from both industry and academia for its enormous computing power, cost-effectiveness, and scalability. Availability is a key system attribute that must be considered at the system design stage and must also reflect how the system actually behaves in operation. System monitoring and logging make it possible to identify unplanned events, so that the measured availability reflects the actual system. This paper proposes a single framework that coordinates event monitoring, filtering, data analysis, and dynamic availability modeling. The availability model is abstracted and categorized based on functionality. We describe the proposed architecture and present a sample analysis of real-time event logs from a 512-node cluster at Lawrence Livermore National Laboratory.
Pages: 305 / +
Page count: 4