Fault-Tolerant Routing for Exascale Supercomputer: The BXI Routing Architecture

Cited by: 2
Authors
Quintin, Jean-Noel [1]
Vigneras, Pierre [1]
Affiliations
[1] Atos, Campus Teratec, 2 Rue Piquetterie, F-91680 Bruyeres Le Chatel, France
Source
2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015 | 2015
Keywords
Fabric Management; Routing; Fault-Tolerant Routing; BXI; Interconnect Management; High Performance Computing; TOPOLOGY;
DOI
10.1109/CLUSTER.2015.135
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
BXI (Bull eXascale Interconnect) is the new interconnection network developed by Atos for High Performance Computing. It has been designed to meet the requirements of exascale supercomputers. At such a scale, faults have to be expected and dealt with transparently so that applications remain unaffected by them. BXI features various mechanisms for this purpose, one of which is the BXI routing component presented in this paper. The BXI routing module computes the full routing tables for a 64k-node fat-tree in a few minutes, but with partial re-computation it can withstand numerous inter-router link failures without any noticeable impact on running applications.
Pages: 793 - 800
Page count: 8
Related papers
50 in total
  • [41] A Fault-tolerant QoS Routing Mechanism Based on PSO and SA
    Zhang, Qing Yi
    Wang, Xing Wei
    Li, Fu Liang
    Huang, Min
    2015 11TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2015, : 255 - 260
  • [42] Design of a Fault-Tolerant Pseudo-3D Routing
    Bhowmik, Biswajit
    Gagan, N.
    2023 IEEE INTERNATIONAL TEST CONFERENCE INDIA, ITC INDIA, 2023,
  • [43] Fault-tolerant wormhole routing in meshes without virtual channels
    Glass, CJ
    Ni, LM
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1996, 7 (06) : 620 - 636
  • [44] A Scalable and Reconfigurable Fault-Tolerant Distributed Routing Algorithm for NoCs
    Shi, Zewen
    Zeng, Xiaoyang
    Yu, Zhiyi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (07) : 1386 - 1397
  • [45] A degradable NoC router for the improvement of fault-tolerant routing performance
    Masaru Fukushi
    Toshihiro Katsuta
    Yota Kurokawa
    Artificial Life and Robotics, 2020, 25 : 301 - 307
  • [46] An Adaptive Learning Approach for Fault-Tolerant Routing in Internet of Things
    Misra, Sudip
    Krishna, P. Venkata
    Agarwal, Harshit
    Gupta, Anshima
    Obaidat, Mohammad S.
    2012 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2012,
  • [47] Conditional fault-tolerant routing of (n,k)-star graphs
    Lv, Yali
    Xiang, Yonghong
    Fan, Jianxi
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2016, 93 (10) : 1695 - 1707
  • [48] Disjoint Paths Construction and Fault-Tolerant Routing in BCube of Data Center Networks
    Fan, Weibei
    Xiao, Fu
    Cai, Hui
    Chen, Xiaobai
    Yu, Shui
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (09) : 2467 - 2481
  • [49] A unified fault-tolerant routing scheme for a class of cluster networks
    Day, Khaled
    Arafeh, Bassel
    Touzene, Abderezak
    JOURNAL OF SYSTEMS ARCHITECTURE, 2008, 54 (08) : 757 - 768
  • [50] Default Gateway Protection Scheme in Fault-Tolerant IP Routing
    Yeremenko, Oleksandra
    Tariki, Nadia
    Vavenko, Tetiana
    2016 THIRD INTERNATIONAL SCIENTIFIC-PRACTICAL CONFERENCE PROBLEMS OF INFOCOMMUNICATIONS SCIENCE AND TECHNOLOGY (PIC S&T), 2016, : 223 - 226