Fault-Tolerant Routing for Exascale Supercomputer: The BXI Routing Architecture

Cited by: 2
Authors
Quintin, Jean-Noel [1]
Vigneras, Pierre [1]
Affiliations
[1] Atos, Campus Teratec, 2 Rue Piquetterie, F-91680 Bruyeres Le Chatel, France
Source
2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015 | 2015
Keywords
Fabric Management; Routing; Fault-Tolerant Routing; BXI; Interconnect Management; High Performance Computing; Topology
DOI
10.1109/CLUSTER.2015.135
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
BXI, the Bull eXascale Interconnect, is the new interconnection network developed by Atos for High Performance Computing. It has been designed to meet the requirements of exascale supercomputers. At such a scale, faults must be expected and handled transparently so that applications remain unaffected by them. BXI features various mechanisms for this purpose, one of which is the BXI routing component presented in this paper. The BXI routing module computes the full routing tables for a 64k-node fat-tree in a few minutes. With partial re-computation, however, it can withstand numerous inter-router link failures without any noticeable impact on running applications.
Pages: 793-800
Page count: 8