Reliability analysis of clustered computing systems

Cited by: 19
Authors
Mendiratta, VB [1]
Affiliation
[1] AT&T Bell Labs, Lucent Technol, Naperville, IL 60566 USA
Source
NINTH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, PROCEEDINGS | 1998
Keywords
DOI
10.1109/ISSRE.1998.730890
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Clustered computing systems, using commercially available computers networked in a loosely-coupled fashion, can provide high levels of reliability if appropriate levels of error detection and recovery software are implemented in the middleware and application layers. In this paper we present a modeling approach for analyzing the hardware and software reliability of clustered computing systems. The clustered system is modeled as an irreducible Markov chain with working and failed states, and intermediate recovery states. The failure and recovery behavior is characterized in terms of the frequency and duration of fault recoveries and outages for a single processor in the cluster and for the entire clustered system. We apply the model to a telecommunication switching system application that uses the Lucent Technologies Reliable Clustered Computing product. The model results are presented for a range of values of the processor failure rate and the fault recovery coverage factor.
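As a rough illustration of the kind of model the abstract describes, the sketch below solves a three-state Markov chain for a single processor (up, recovering, down) and sweeps the failure rate and the coverage factor. It is not the paper's model: the state structure, the function names, and every numeric rate are assumptions chosen only to show how coverage and failure rate feed into outage frequency and downtime.

```python
HOURS_PER_YEAR = 8760.0


def steady_state(failure_rate, coverage, recovery_rate, repair_rate):
    """Steady-state probabilities of a hypothetical 3-state chain.

    up -> recovering at rate failure_rate * coverage (fault detected, automatic recovery)
    up -> down       at rate failure_rate * (1 - coverage) (uncovered fault, outage)
    recovering -> up at rate recovery_rate; down -> up at rate repair_rate.
    All rates are per hour and are illustrative assumptions.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    # Balance equations: flow from 'up' into a state equals the flow back out of it.
    recovering_ratio = failure_rate * coverage / recovery_rate
    down_ratio = failure_rate * (1.0 - coverage) / repair_rate
    p_up = 1.0 / (1.0 + recovering_ratio + down_ratio)
    return {
        "up": p_up,
        "recovering": p_up * recovering_ratio,
        "down": p_up * down_ratio,
    }


def outage_stats(failure_rate, coverage, recovery_rate, repair_rate):
    """Outage frequency (per year) and expected downtime (minutes per year)."""
    p = steady_state(failure_rate, coverage, recovery_rate, repair_rate)
    # Frequency of entering 'down' = P(up) * rate of uncovered faults.
    outages_per_year = p["up"] * failure_rate * (1.0 - coverage) * HOURS_PER_YEAR
    downtime_minutes = p["down"] * HOURS_PER_YEAR * 60.0
    return outages_per_year, downtime_minutes


if __name__ == "__main__":
    # Hypothetical parameters: 1-minute automatic recovery, 1-hour manual repair.
    recovery_rate, repair_rate = 60.0, 1.0
    for failure_rate in (1e-4, 1e-3):
        for coverage in (0.90, 0.99, 0.999):
            freq, minutes = outage_stats(failure_rate, coverage, recovery_rate, repair_rate)
            print(f"lambda={failure_rate:g}/h c={coverage}: "
                  f"{freq:.4f} outages/yr, {minutes:.2f} min/yr down")
```

The chain is irreducible because every state can reach every other state through `up`, so a unique steady-state distribution exists. Whether time spent in the recovering state should count as service loss depends on the application; here it is reported separately from the outage state.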
Pages: 268-272
Page count: 5
Related papers
5 in total
[1] Bouricius W. G., 1969, P 24 NAT C, P295
[2] HOEFLIN DA, 1995, P 15 INT SWITCH S BE
[3] *LUC TECHN, 1997, REL CLUST COMP TECHN
[4] *RADC, 1987, METH SOFTW REL PRED
[5] Sahner R. A., 1996, PERFORMANCE RELIABIL