Systematic design of fault-tolerant multiprocessors with shared buses

Cited by: 8
Authors
Ku, HK [1]
Hayes, JP [1]
Affiliations
[1] University of Michigan, Department of Electrical Engineering and Computer Science, Advanced Computer Architecture Laboratory, Ann Arbor, MI 48109, USA
Funding
National Science Foundation (USA)
Keywords
fault tolerance; graph model; interconnection method; multiple-bus architecture; point-to-point connection; VLSI design
DOI
10.1109/12.588058
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A multiprocessor system is fault-tolerant (FT) if it preserves a fault-free subsystem of a predetermined interconnection structure when faults appear. We present a new method for designing FT multiprocessors that can efficiently tolerate both processor and interconnection faults. The approach is general, in that it can be applied to any multiprocessor topology. Shared buses serve as the main interconnection mechanism to minimize the switching logic needed for reconfiguration. We employ processor-bus-link (PBL) graphs to model multiprocessors with either dedicated or shared buses. Both processors and buses are represented as nodes so that bus faults can be considered explicitly and tolerated efficiently by spare buses instead of by spare processors. A minimum number of spare processors and buses are used to reduce hardware overhead. The node covering concept and the maximum-weight spanning tree algorithm are then employed to construct FT systems that have lower interconnection cost than most previous designs. We also present a cost-effective implementation method which is suitable for both static and dynamic reconfiguration techniques. The FT systems obtained have the advantages of no critical single point of failure, low redundancy, local replacement, and simple circuitry for fast reconfiguration.
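The abstract names two ingredients, a graph in which processors and buses are both nodes, and a maximum-weight spanning tree. The sketch below illustrates only those two ideas and is not the authors' construction: the node names, the link weights, and the function `maximum_weight_spanning_tree` are hypothetical, and Kruskal's algorithm run over edges in descending weight order stands in for whichever spanning tree procedure the paper uses.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (node_a, node_b, weight); weights are made up


def maximum_weight_spanning_tree(nodes: List[str], edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm, visiting edges from heaviest to lightest."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[Edge] = []
    for a, b, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # keep the edge only if it joins two components
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree


# Hypothetical PBL-style graph: processors and buses are both nodes,
# and each link connects one processor to one bus.
processors = ["P0", "P1", "P2", "P3"]
buses = ["B0", "B1"]
links: List[Edge] = [
    ("P0", "B0", 4),
    ("P1", "B0", 3),
    ("P2", "B0", 1),
    ("P1", "B1", 2),
    ("P2", "B1", 5),
    ("P3", "B1", 6),
]

tree = maximum_weight_spanning_tree(processors + buses, links)
total_weight = sum(w for _, _, w in tree)
print(tree, total_weight)
```

Because buses appear as ordinary nodes, a bus fault can be modeled by deleting a bus node and its links, in the same way as a processor fault.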
Pages: 439-455
Number of pages: 17