Fault-Tolerant Routing for Exascale Supercomputer: The BXI Routing Architecture

Cited by: 2
Authors
Quintin, Jean-Noel [1]
Vigneras, Pierre [1]
Affiliation
[1] Atos, Campus Teratec, 2 Rue Piquetterie, F-91680 Bruyeres Le Chatel, France
Source
2015 IEEE International Conference on Cluster Computing (CLUSTER 2015), 2015
Keywords
Fabric Management; Routing; Fault-Tolerant Routing; BXI; Interconnect Management; High Performance Computing; Topology
DOI
10.1109/CLUSTER.2015.135
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
BXI (Bull eXascale Interconnect) is the new interconnection network developed by Atos for High Performance Computing. It has been designed to meet the requirements of exascale supercomputers. At such scale, faults must be expected and handled transparently so that applications remain unaffected by them. BXI provides several mechanisms for this purpose, one of which is the BXI routing component presented in this paper. The BXI routing module computes the full routing tables for a 64k-node fat-tree in a few minutes, and, through partial re-computation, it can withstand numerous inter-router link failures without any noticeable impact on running applications.
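The abstract distinguishes two phases: a full routing-table computation for the whole fat-tree, and a partial re-computation when inter-router links fail. The Python sketch below illustrates that distinction on a tiny hypothetical two-level fat-tree. The class and method names, the topology, and the D-mod-k-style spine selection are all assumptions made for illustration; this is not the BXI routing algorithm described in the paper.

```python
from collections import defaultdict

# Illustrative sketch only: a hypothetical two-level fat-tree (leaf and spine
# switches, every leaf linked to every spine). This is NOT the BXI routing
# algorithm; it only shows a full table computation followed by partial
# re-computation after an inter-router link failure.


class TwoLevelFatTree:
    def __init__(self, n_leaves, n_spines, nodes_per_leaf):
        self.n_leaves = n_leaves
        self.n_spines = n_spines
        self.nodes_per_leaf = nodes_per_leaf
        self.failed = set()  # failed (leaf, spine) links
        # tables[switch][dest] -> next hop, or None if dest is unreachable.
        # A switch is ("leaf", i) or ("spine", j).
        self.tables = defaultdict(dict)

    def leaf_of(self, dest):
        return dest // self.nodes_per_leaf

    def link_up(self, leaf, spine):
        return (leaf, spine) not in self.failed

    def route_entry(self, switch, dest):
        """Compute a single routing-table entry from the current link state."""
        kind, idx = switch
        dst_leaf = self.leaf_of(dest)
        if kind == "spine":
            return ("leaf", dst_leaf) if self.link_up(dst_leaf, idx) else None
        if dst_leaf == idx:
            return ("node", dest)  # destination attached to this leaf
        # Usable spines: live uplink from this leaf and live downlink to dst_leaf.
        spines = [s for s in range(self.n_spines)
                  if self.link_up(idx, s) and self.link_up(dst_leaf, s)]
        if not spines:
            return None
        # D-mod-k style choice spreads destinations over the usable spines.
        return ("spine", spines[dest % len(spines)])

    def compute_all(self):
        """Full computation: every entry of every switch."""
        n_nodes = self.n_leaves * self.nodes_per_leaf
        switches = [("leaf", i) for i in range(self.n_leaves)]
        switches += [("spine", j) for j in range(self.n_spines)]
        for sw in switches:
            for d in range(n_nodes):
                self.tables[sw][d] = self.route_entry(sw, d)

    def _affected(self, switch, dest, hop, leaf, spine):
        """True if this entry's path crosses the failed (leaf, spine) link."""
        if switch[0] == "leaf" and hop == ("spine", spine):
            # Either the uplink itself failed, or the spine can no longer
            # go down to the destination's leaf.
            return switch[1] == leaf or self.leaf_of(dest) == leaf
        return switch == ("spine", spine) and hop == ("leaf", leaf)

    def fail_link(self, leaf, spine):
        """Mark a link failed and recompute only the affected entries."""
        self.failed.add((leaf, spine))
        changed = 0
        for sw, table in self.tables.items():
            for d, hop in table.items():
                if self._affected(sw, d, hop, leaf, spine):
                    table[d] = self.route_entry(sw, d)  # value update only
                    changed += 1
        return changed


if __name__ == "__main__":
    tree = TwoLevelFatTree(n_leaves=4, n_spines=4, nodes_per_leaf=2)
    tree.compute_all()
    print(tree.fail_link(leaf=0, spine=1))  # prints 6 (of 64 entries)
```

In this toy setting only the entries whose path crossed the failed link are recomputed (6 of 64), while every node stays reachable from every leaf. That locality is what makes partial re-computation much cheaper than rebuilding all tables; a real fabric manager would additionally have to deal with deeper trees, load balancing, and distributing the updated tables to the switches, none of which this sketch models.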
Pages: 793-800
Number of pages: 8