Fault tolerance analysis of mesh networks with uniform versus nonuniform node failure probability

被引:10
作者
Wang, Gaocai [1 ]
Wang, Guojun [2 ]
Shan, Zhiguang [3 ]
机构
[1] Guangxi Univ, Sch Comp & Elect Informat, Nanning 530004, Peoples R China
[2] Cent S Univ, Sch Informat Sci & Engn, Changsha 410083, Peoples R China
[3] State Informat Ctr, Dept Informatizat Res, Beijing 100045, Peoples R China
关键词
Fault tolerance; Mesh network; k-submesh; Node failure probability; Connectivity probability; ROUTING ALGORITHMS;
D O I
10.1016/j.ipl.2011.12.013
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Mesh networks have been applied to build large scale multicomputer systems and Network-on-Chips (NoCs). Mesh networks perform poorly in tolerating faults in the view of worst-case analysis, so it is practically important for multicomputer systems and NoCs manufactures to determine the lower bound for the mesh network connectivity probability when the node failure probability and the network size are given. In this paper, we study the topic based on k-submesh model under two fault models: Each node has uniform or nonuniform failure probability. We develop a novel technique to formally derive lower bound on the connectivity probability for mesh networks. Our study shows that mesh networks of practical size can tolerate a large number of faulty nodes and maintain higher connectivity probability, thus are reliable and trustworthy enough for multicomputer systems and NoCs. For example, suppose we are building a mesh network of 40 000 nodes (e.g., M-200x200) and require a network connectivity probability 99%, we only need to bound the uniform node failure probability by 0.25%. On the other hand, for the same size network M-200 x 200, the mesh network connectivity probability can maintain 95.88% even the network runs one million seconds uninterruptedly under exponential distribution node failure probability with failure rate 10(-9) level. (C) 2012 Elsevier B.V. All rights reserved.
引用
收藏
页码:396 / 401
页数:6
相关论文
共 22 条
[1]  
AGARWAL A, 1995, ACM COMP AR, P2, DOI 10.1109/ISCA.1995.524544
[2]   Blue Gene: A vision for protein science using a petaflop supercomputer [J].
Allen, F ;
Almasi, G ;
Andreoni, W ;
Beece, D ;
Berne, BJ ;
Bright, A ;
Brunheroto, J ;
Cascaval, C ;
Castanos, J ;
Coteus, P ;
Crumley, P ;
Curioni, A ;
Denneau, M ;
Donath, W ;
Eleftheriou, M ;
Fitch, B ;
Fleischer, B ;
Georgiou, CJ ;
Germain, R ;
Giampapa, M ;
Gresh, D ;
Gupta, M ;
Haring, R ;
Ho, H ;
Hochschild, P ;
Hummel, S ;
Jonas, T ;
Lieber, D ;
Martyna, G ;
Maturu, K ;
Moreira, J ;
Newns, D ;
Newton, M ;
Philhower, R ;
Picunko, T ;
Pitera, J ;
Pitman, M ;
Rand, R ;
Royyuru, A ;
Salapura, V ;
Sanomiya, A ;
Shah, R ;
Sham, Y ;
Singh, S ;
Snir, M ;
Suits, F ;
Swetz, R ;
Swope, WC ;
Vishnumurthy, N ;
Ward, TJC .
IBM SYSTEMS JOURNAL, 2001, 40 (02) :310-327
[3]   Tera computer system [J].
Alverson, Robert ;
Callahan, David ;
Cummings, Daniel ;
Koblenz, Brian ;
Porterfield, Allan ;
Smith, Burton .
Conference Proceedings - International Conference on Supercomputing, 1990,
[4]  
Bell S., 2008, P 2008 IEEE INT SOL, DOI DOI 10.1109/ISSCC.2008.4523070
[5]  
Blank T., 1990, COMPCON Spring '90: Thirty-Fifth IEEE Computer Society International Conference. Intellectual Leverage. Digest of Papers. (Cat. No.90CH2843-1), P20, DOI 10.1109/CMPCON.1990.63648
[6]   FAULT-TOLERANT WORMHOLE ROUTING ALGORITHMS FOR MESH NETWORKS [J].
BOPPANA, RV ;
CHALASANI, S .
IEEE TRANSACTIONS ON COMPUTERS, 1995, 44 (07) :848-864
[7]   A fault-tolerant routing scheme for meshes with nonconvex faults [J].
Chen, CL ;
Chiu, GM .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2001, 12 (05) :467-475
[8]  
Cormen T., 2001, Introduction to Algorithms
[9]  
Fick D, 2009, DES AUT TEST EUROPE, P21
[10]   A Hardware-Oriented Fault-Tolerant Routing Algorithm for Irregular 2D-Mesh Network-on-Chip without Virtual Channels [J].
Fukushima, Yusuke ;
Fukushi, Masaru ;
Yairi, Ikuko Eguchi ;
Hattori, Takeshi .
2010 IEEE 25TH INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI SYSTEMS (DFT 2010), 2010, :52-59