A new adaptive fault-tolerant routing methodology for direct networks

Cited by: 0
Authors
Gómez, ME
Duato, J
Flich, J
López, P
Robles, A
Nordbotten, NA
Skeie, T
Lysne, O
Affiliations
[1] Univ Politecn Valencia, Dept Comp Engn, Valencia 46071, Spain
[2] Simula Res Lab, N-1325 Lysaker, Norway
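Source
HIGH PERFORMANCE COMPUTING - HIPC 2004 | 2004 / Vol. 3296
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract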
Interconnection networks play a key role in the fault tolerance of massively parallel computers, since faults may isolate a large fraction of the machine containing many healthy nodes. In this paper, we present a methodology to design fully adaptive fault-tolerant routing algorithms for direct interconnection networks that can be applied to different regular topologies. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, from this node, they are adaptively forwarded to their destination. This methodology requires only one additional virtual channel, even for tori. Evaluation results show that the methodology is 7-fault tolerant and that, for up to 14 faults, more than 99% of the fault combinations are tolerated, without significantly degrading performance in the presence of faults.
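The intermediate-node idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a 2D mesh (not a torus), node faults only, and a simplified safety rule under which a segment is usable by minimal adaptive routing only if no faulty node lies in the rectangle spanned by its endpoints. Virtual channels are not modeled, and the names `box_is_fault_free` and `plan_route` are hypothetical.

```python
from itertools import product


def box_is_fault_free(a, b, faulty):
    """True if no faulty node lies in the rectangle spanned by a and b.

    Assumption: minimal adaptive routing may use any node in this
    rectangle, so the segment is treated as safe only if all are healthy.
    """
    (ax, ay), (bx, by) = a, b
    xs = range(min(ax, bx), max(ax, bx) + 1)
    ys = range(min(ay, by), max(ay, by) + 1)
    return all((x, y) not in faulty for x, y in product(xs, ys))


def distance(a, b):
    """Hop count between two nodes of a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def plan_route(src, dst, faulty, width, height):
    """Return the waypoints for a packet, or None if this sketch finds none.

    [src, dst] means no intermediate node is needed; [src, mid, dst] means
    the packet is routed adaptively to mid and then on to dst.
    """
    if box_is_fault_free(src, dst, faulty):
        return [src, dst]
    candidates = [
        node
        for node in product(range(width), range(height))
        if node not in faulty
        and node not in (src, dst)
        and box_is_fault_free(src, node, faulty)
        and box_is_fault_free(node, dst, faulty)
    ]
    if not candidates:
        return None  # a single intermediate node is not enough here
    # Prefer the intermediate node that adds the fewest hops.
    best = min(candidates, key=lambda n: distance(src, n) + distance(n, dst))
    return [src, best, dst]
```

Under these assumptions, a fault lying directly between a source and destination in the same row leaves no valid single intermediate node, so the sketch returns `None`. That is consistent with the abstract describing the methodology as only "mainly" based on intermediate-node selection.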
Pages: 462 - 473
Number of pages: 12
Related papers
50 in total
  • [1] An effective fault-tolerant routing methodology for direct networks
    Gómez, ME
    Flich, J
    López, P
    Robles, A
    Duato, J
    Nordbotten, NA
    Lysne, O
    Skeie, T
    2004 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS, 2004, : 222 - 231
  • [2] Fault-Tolerant Adaptive Routing in Dragonfly Networks
    Xiang, Dong
    Li, Bing
    Fu, Yi
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2019, 16 (02) : 259 - 271
  • [3] Fault-Tolerant Routing Methodology for Networks-on-Chip
    Savva, S.
    2017 27TH INTERNATIONAL SYMPOSIUM ON POWER AND TIMING MODELING, OPTIMIZATION AND SIMULATION (PATMOS), 2017,
  • [4] Adaptive fault-tolerant wormhole routing for torus networks
    Shih, JD
    1998 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, PROCEEDINGS, 1998, : 558 - 565
  • [5] Adaptive Stochastic Routing in Fault-tolerant On-chip Networks
    Song, Wei
    Edwards, Doug
    Nunez-Yanez, Jose Luis
    Dasgupta, Sohini
    2009 3RD ACM/IEEE INTERNATIONAL SYMPOSIUM ON NETWORKS-ON-CHIP, 2009, : 32 - +
  • [6] A FAMILY OF FAULT-TOLERANT ROUTING PROTOCOLS FOR DIRECT MULTIPROCESSOR NETWORKS
    GAUGHAN, PT
    YALAMANCHILI, S
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1995, 6 (05) : 482 - 497
  • [7] A fully adaptive fault-tolerant routing methodology based on intermediate nodes
    Nordbotten, NA
    Gómez, ME
    Flich, J
    López, P
    Robles, A
    Skeie, T
    Lysne, O
    Duato, J
    NETWORK AND PARALLEL COMPUTING, PROCEEDINGS, 2004, 3222 : 341 - 356
  • [8] A New Fault-Tolerant Routing Methodology for KNS Topologies.
    Penaranda, Roberto
    Gran, Ernst Gunnar
    Skeie, Tor
    Engracia Gomez, Maria
    Lopez, Pedro
    2016 2ND IEEE INTERNATIONAL WORKSHOP ON HIGH-PERFORMANCE INTERCONNECTION NETWORKS IN THE EXASCALE AND BIG-DATA ERA (HIPINEB), 2016, : 1 - 8
  • [9] Compressionless routing: A framework for adaptive and fault-tolerant routing
    Kim, JH
    Liu, ZQ
    Chien, AA
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1997, 8 (03) : 229 - 244
  • [10] An Adaptive Learning Approach for Fault-Tolerant Routing in Ad Hoc Networks
    Misra, Sudip
    Krishna, P. Venkata
    Bhiwal, Akhil
    Chawla, Amardeep Singh
    Wolfinger, Bernd E.
    E-TECHNOLOGIES AND NETWORKS FOR DEVELOPMENT, 2011, 171 : 15 - 25