Fault-Tolerant Routing for Exascale Supercomputer: The BXI Routing Architecture

Cited by: 2
Authors
Quintin, Jean-Noel [1]
Vigneras, Pierre [1]
Affiliation
[1] Atos, Campus Teratec, 2 Rue Piquetterie, F-91680 Bruyeres Le Chatel, France
Source
2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015 | 2015
Keywords
Fabric Management; Routing; Fault-Tolerant Routing; BXI; Interconnect Management; High Performance Computing; Topology
DOI
10.1109/CLUSTER.2015.135
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
BXI (Bull eXascale Interconnect) is the new interconnection network developed by Atos for High Performance Computing. It has been designed to meet the requirements of exascale supercomputers. At such scale, faults have to be expected and dealt with transparently so that applications remain unaffected by them. BXI features various mechanisms for this purpose, one of which is the BXI routing component presented in this paper. The BXI routing module computes the full routing tables for a 64k-node fat-tree in a few minutes. Moreover, through partial re-computation, it can withstand numerous inter-router link failures without any noticeable impact on running applications.
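The abstract does not describe the BXI algorithm itself, so the following is only a minimal sketch of the general idea of partial re-computation, under loudly stated assumptions: a hypothetical two-level fat-tree in which every leaf switch links to every spine switch, destination-based leaf tables, and a D-mod-k style spine choice (a well-known fat-tree heuristic, not necessarily what BXI uses). All names (`FatTree2`, `route`, `full_compute`, `fail_link`) are hypothetical. The point it illustrates is that a failed leaf-spine link only invalidates the entries whose path crosses that link, so only those entries need to be recomputed.

```python
from dataclasses import dataclass, field


@dataclass
class FatTree2:
    """Hypothetical two-level fat-tree: every leaf switch links to every spine."""
    leaves: int
    spines: int
    nodes_per_leaf: int
    failed: set = field(default_factory=set)  # broken (leaf, spine) links

    def link_up(self, leaf, spine):
        return (leaf, spine) not in self.failed

    def leaf_of(self, dest):
        return dest // self.nodes_per_leaf

    def pick_spine(self, src_leaf, dest):
        """D-mod-k style choice: start at dest % spines, skip unhealthy spines."""
        dst_leaf = self.leaf_of(dest)
        for i in range(self.spines):
            s = (dest + i) % self.spines
            if self.link_up(src_leaf, s) and self.link_up(dst_leaf, s):
                return s
        return None  # no healthy up/down path through any spine


def route(ft, src_leaf, dest):
    """Leaf-table entry: deliver locally, or go up to the chosen spine."""
    if ft.leaf_of(dest) == src_leaf:
        return ("node", dest % ft.nodes_per_leaf)
    return ("spine", ft.pick_spine(src_leaf, dest))


def full_compute(ft):
    """Recompute every entry of every leaf routing table from scratch."""
    total = ft.leaves * ft.nodes_per_leaf
    return {l: {d: route(ft, l, d) for d in range(total)} for l in range(ft.leaves)}


def fail_link(ft, tables, leaf, spine):
    """Mark a leaf-spine link as failed and patch only the entries that used it.

    A path src_leaf -> spine s -> dst_leaf uses links (src_leaf, s) and
    (dst_leaf, s), so only entries routed via `spine` whose source or
    destination leaf is `leaf` are affected.
    """
    ft.failed.add((leaf, spine))
    updated = 0
    for src, table in tables.items():
        for dest, (kind, port) in table.items():
            if kind == "spine" and port == spine and leaf in (src, ft.leaf_of(dest)):
                table[dest] = route(ft, src, dest)  # existing key: safe while iterating
                updated += 1
    return updated


if __name__ == "__main__":
    ft = FatTree2(leaves=4, spines=3, nodes_per_leaf=2)
    tables = full_compute(ft)
    print(fail_link(ft, tables, leaf=0, spine=1))
```

With 4 leaves, 3 spines and 2 nodes per leaf, failing the link between leaf 0 and spine 1 rewrites 5 of the 32 leaf-table entries, and because failures only ever remove candidate spines, the patched tables are identical to a full recomputation. The cost of a repair thus scales with the number of routes crossing the failed link rather than with the size of the whole fabric.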
Pages: 793-800
Number of pages: 8