High performance RDMA based all-to-all broadcast for InfiniBand clusters

Authors
Sur, S [1]
Bondhugula, UKR [1]
Mamidala, A [1]
Jin, HW [1]
Panda, DK [1]
Affiliation
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords: none listed
DOI: not available
Chinese Library Classification number: TP301 [Theory, Methods]
Subject classification code: 081202
Abstract
The all-to-all broadcast collective operation is essential to many parallel scientific applications. In MPI, this collective is called MPI_Allgather. Contemporary MPI software stacks implement it on top of MPI point-to-point calls, which leads to several performance overheads. In this paper, we propose a design of all-to-all broadcast that uses the Remote Direct Memory Access (RDMA) feature offered by InfiniBand, an emerging high-performance interconnect. Our RDMA-based design eliminates the overheads associated with existing designs. Our results indicate that the latency of the all-to-all broadcast operation can be reduced by 30% for 32 processes and a message size of 32 KB. In addition, our design can improve latency by a factor of 4.75 under no-buffer-reuse conditions for the same process count and message size. Further, our design can improve the performance of a parallel matrix multiplication algorithm by 37% on eight processes when multiplying a 256x256 matrix.
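The abstract contrasts layering MPI_Allgather on point-to-point calls with writing data directly into remote memory via RDMA. The following is a minimal single-address-space emulation of that idea, not the paper's implementation. It rests on several assumptions: `emulated_rdma_write` is a hypothetical stand-in for an InfiniBand RDMA write (here just a list-slice assignment, not a verbs call); the recursive-doubling exchange pattern and the power-of-two process count are illustrative choices, since the abstract does not state which exchange pattern the paper uses. Each process's contribution is written straight into its final slot in every peer's receive buffer, so the receiver posts no receive and performs no extra copy.

```python
from typing import List


def emulated_rdma_write(buffers: List[List[int]], target: int, offset: int, data: List[int]) -> None:
    """Hypothetical stand-in for an RDMA write: the initiator places `data`
    at a known offset in the target's buffer; the target runs no receive call."""
    buffers[target][offset:offset + len(data)] = data


def allgather_recursive_doubling(contributions: List[List[int]]) -> List[List[int]]:
    """Emulated all-to-all broadcast (allgather) using recursive doubling.

    Assumes a power-of-two process count and equal-sized blocks.
    Returns one receive buffer per emulated process."""
    nprocs = len(contributions)
    if nprocs == 0 or nprocs & (nprocs - 1):
        raise ValueError("this sketch assumes a power-of-two process count")
    block = len(contributions[0])

    # Every process starts with its own block already in its final slot.
    buffers = [[0] * (nprocs * block) for _ in range(nprocs)]
    for r, data in enumerate(contributions):
        buffers[r][r * block:(r + 1) * block] = data

    # log2(nprocs) steps; the amount of data each process holds doubles per step.
    dist = 1
    while dist < nprocs:
        for r in range(nprocs):
            partner = r ^ dist
            # r currently holds `dist` consecutive blocks starting at this aligned base.
            base = (r & ~(dist - 1)) * block
            segment = buffers[r][base:base + dist * block]  # slice copy (snapshot)
            # Write into the same offset in the partner's buffer (final position).
            emulated_rdma_write(buffers, partner, base, segment)
        dist <<= 1
    return buffers


if __name__ == "__main__":
    contribs = [[10 * r, 10 * r + 1] for r in range(4)]
    result = allgather_recursive_doubling(contribs)
    print(result[2])  # [0, 1, 10, 11, 20, 21, 30, 31]
```

Within one step, the region a process writes into its partner's buffer is disjoint from the region the partner sends in that step, so the sequential loop is safe. With 32 processes, as in the reported measurements, this pattern would take five steps; the actual design may differ.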
Pages: 148-157
Number of pages: 10