Fault-Tolerant Routing for Exascale Supercomputer: The BXI Routing Architecture

Cited by: 2
Authors
Quintin, Jean-Noel [1]
Vigneras, Pierre [1]
Affiliations
[1] Atos, Campus Teratec, 2 Rue Piquetterie, F-91680 Bruyeres Le Chatel, France
Source
2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015 | 2015
Keywords
Fabric Management; Routing; Fault-Tolerant Routing; BXI; Interconnect Management; High Performance Computing; TOPOLOGY;
DOI
10.1109/CLUSTER.2015.135
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
BXI (Bull eXascale Interconnect) is the new interconnection network developed by Atos for High Performance Computing. It has been designed to meet the requirements of exascale supercomputers. At such a scale, faults have to be expected and dealt with transparently so that applications remain unaffected by them. BXI features various mechanisms for this purpose, one of which is the BXI routing component presented in this paper. The BXI routing module computes the full routing tables for a 64k-node fat-tree in a few minutes. With partial re-computation, however, it can withstand numerous inter-router link failures without any noticeable impact on running applications.
Pages: 793 - 800
Number of pages: 8
Related papers
50 in total
  • [31] Effective Solution for Scalability and Productivity Improvement in Fault-Tolerant Routing
    Lemeshko, Oleksandr
    Arous, Kinan
    Tariki, Nadia
    2015 SECOND INTERNATIONAL SCIENTIFIC-PRACTICAL CONFERENCE PROBLEMS OF INFOCOMMUNICATIONS SCIENCE AND TECHNOLOGY (PIC S&T 2015), 2015, : 76 - 78
  • [32] A Comprehensive Review of Fault-Tolerant Routing Mechanisms for the Internet of Things
    Lan, Zhengxin
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (07) : 1083 - 1093
  • [33] DFTR: Dynamic Fault-Tolerant Routing protocol for Convergecast WSNs
    Chalhoub, Gerard
    Tall, Hamadoun
    Wang, Jinpeng
    Misson, Michel
    2017 IEEE 86TH VEHICULAR TECHNOLOGY CONFERENCE (VTC-FALL), 2017,
  • [34] XY Based Fault-Tolerant Routing with The Passage of Faulty Nodes
    Kurokawa, Yota
    Fukushi, Masaru
    2018 SIXTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING WORKSHOPS (CANDARW 2018), 2018, : 99 - 104
  • [35] Fault-Tolerant IP Routing Flow-Based Model
    Yeremenko, Oleksandra
    Tariki, Nadia
    Hailan, Ahmad M.
    2016 13TH INTERNATIONAL CONFERENCE ON MODERN PROBLEMS OF RADIO ENGINEERING, TELECOMMUNICATIONS AND COMPUTER SCIENCE (TCSET), 2016, : 655 - 657
  • [36] Algorithms for fault-tolerant routing in circuit-switched networks
    Bagchi, Amitabha
    Chaudhary, Amitabh
    Scheideler, Christian
    Kolman, Petr
    SIAM JOURNAL ON DISCRETE MATHEMATICS, 2007, 21 (01) : 141 - 157
  • [37] An improved fault-tolerant routing algorithm in meshes with convex faults
    Chang, HH
    Chiu, GM
    PARALLEL COMPUTING, 2002, 28 (01) : 133 - 149
  • [38] Dynamic Reliability Analysis Model for Fault-tolerant Network Routing
    Wang Bin
    Wu Chunming
    Yang Qiang
    Qian Yaguan
    Wang Xiaonan
    CHINESE JOURNAL OF ELECTRONICS, 2012, 21 (03) : 500 - 504
  • [39] A degradable NoC router for the improvement of fault-tolerant routing performance
    Fukushi, Masaru
    Katsuta, Toshihiro
    Kurokawa, Yota
    ARTIFICIAL LIFE AND ROBOTICS, 2020, 25 (02) : 301 - 307
  • [40] Throughput Considerations of Fault-Tolerant Routing in Network-on-Chip
    Rezazadeh, Arshin
    Fathy, Mahmood
    CONTEMPORARY COMPUTING, PROCEEDINGS, 2009, 40 : 81 - 92