High performance RDMA based all-to-all broadcast for InfiniBand clusters

Cited by: 0

Authors:

Sur, S [1]

Bondhugula, UKR [1]

Mamidala, A [1]

Jin, HW [1]

Panda, DK [1]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

HIGH PERFORMANCE COMPUTING - HIPC 2005, PROCEEDINGS | 2005 / Vol. 3769

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

The all-to-all broadcast collective operation is essential for many parallel scientific applications. In the context of MPI, this collective operation is called MPI_Allgather. Contemporary MPI software stacks implement this collective on top of MPI point-to-point calls, which leads to several performance overheads. In this paper, we propose a design of all-to-all broadcast using the Remote Direct Memory Access (RDMA) feature offered by InfiniBand, an emerging high-performance interconnect. Our RDMA-based design eliminates the overheads associated with existing designs. Our results indicate that the latency of the all-to-all broadcast operation can be reduced by 30% for 32 processes and a message size of 32 KB. In addition, our design can improve latency by a factor of 4.75 under no-buffer-reuse conditions for the same process count and message size. Further, our design can improve the performance of a parallel matrix multiplication algorithm by 37% on eight processes while multiplying a 256x256 matrix.
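
As a rough illustration of what an all-to-all broadcast (allgather) computes, the sketch below simulates the operation in plain Python using a generic ring exchange. This is not the RDMA-based design proposed in the paper, and it involves no MPI or InfiniBand calls; the function name `ring_allgather` and the list-of-buffers model of processes are assumptions made purely for illustration.

```python
def ring_allgather(contributions):
    """Simulate an all-to-all broadcast among len(contributions) processes.

    contributions[i] is the block owned by (hypothetical) process i.
    Returns buffers, where buffers[r][i] is block i as seen by process r.
    """
    p = len(contributions)
    buffers = [[None] * p for _ in range(p)]

    # Each process starts with only its own block in place.
    for rank, block in enumerate(contributions):
        buffers[rank][rank] = block

    # In step s, process r forwards the block that originated at
    # (r - s) mod p to its right neighbour (r + 1) mod p.
    for step in range(p - 1):
        # Collect all outgoing messages first so that the sends within
        # one step behave as if they happened simultaneously.
        outgoing = []
        for rank in range(p):
            origin = (rank - step) % p
            outgoing.append((origin, buffers[rank][origin]))
        for rank, (origin, block) in enumerate(outgoing):
            buffers[(rank + 1) % p][origin] = block

    return buffers


if __name__ == "__main__":
    for rank, buf in enumerate(ring_allgather(["a", "b", "c", "d"])):
        print(rank, buf)
```

After the p - 1 steps every simulated process holds the same ordered list of blocks, which is the result an allgather is expected to deliver. A real implementation would differ in how the blocks are moved (point-to-point messages or, as the abstract describes, RDMA operations), not in this end state.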

Pages: 148-157

Page count: 10
