Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution

被引:8
作者
DeFlumere, Ashley [1 ]
Lastovetsky, Alexey [1 ]
Becker, Brett A. [1 ]
机构
[1] Univ Coll Dublin, Sch Comp Sci & Informat, Dublin 4, Ireland
来源
2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW) | 2012年
关键词
Parallel Matrix Multiplication; Matrix Partitioning; Heterogeneous Computing; High Performance Computing; LINEAR ALGEBRA PROBLEMS; COMPUTATIONS; NETWORKS;
D O I
10.1109/IPDPSW.2012.12
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The problem of matrix partitioning for parallel matrix-matrix multiplication on heterogeneous processors has been extensively studied since the mid 1990s. During this time, previous research focused mainly on the design of efficient partitioning algorithms, optimally or sub-optimally partitioning matrices into rectangles. The optimality of the rectangular partitioning shape itself has never been studied or even seriously questioned. The accepted approach is that consideration of non-rectangular shapes will not significantly improve the optimality of the solution, but can significantly complicate the partitioning problem, which is already NP-complete even for the restricted case of rectangular shapes. There is no published research, however, supporting this approach. The shape of the globally optimal partitioning, and how the best rectangular partitioning compares with this global optimum, are still wide open problems. Solution of these problems will decide if new partitioning algorithms searching for truly optimal, and not necessarily rectangular, solutions are needed. This paper presents the first results of our research on the problem of optimal partitioning shapes for parallel matrix-matrix multiplication on heterogeneous processors. Namely, the case of two interconnected processors is comprehensively studied. We prove that, depending on performance characteristics of the processors and the communication link, the globally optimal partitioning will have one of just two well-specified shapes, one of which is rectangular and the other is non-rectangular. The theoretical analysis is conducted using an original mathematical technique proposed in the paper. It is shown that the technique can also be applied in the case of arbitrary numbers of processors. While comprehensive analysis of the cases of three and more processors is more complicated and the subject for future work, the paper does prove the optimality of some particular non-rectangular partitioning shapes for some combinations of performance characteristics of heterogeneous processors and communication links. The paper also presents experimental results demonstrating that the optimal non-rectangular partitioning can significantly outperform the optimal rectangular one on real-life heterogeneous HPC platforms.
引用
收藏
页码:125 / 139
页数:15
相关论文
共 37 条
  • [31] CA3DMM: A New Algorithm Based on a Unified View of Parallel Matrix Multiplication
    Huang, Hua
    Chow, Edmond
    SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022,
  • [32] FuPerMod: A Framework for Optimal Data Partitioning for Parallel Scientific Applications on Dedicated Heterogeneous HPC Platforms
    Clarke, David
    Zhong, Ziming
    Rychkov, Vladimir
    Lastovetsky, Alexey
    PARALLEL COMPUTING TECHNOLOGIES (PACT 2013), 2013, 7979 : 182 - 196
  • [33] Mondriaan sparse matrix partitioning for attacking cryptosystems by a parallel block Lanczos algorithm - a case study
    Bisseling, Rob H.
    Flesch, Ildiko
    PARALLEL COMPUTING, 2006, 32 (7-8) : 551 - 567
  • [34] Improving Execution Concurrency of Large-Scale Matrix Multiplication on Distributed Data-Parallel Platforms
    Gu, Rong
    Tang, Yun
    Tian, Chen
    Zhou, Hucheng
    Li, Guanru
    Zheng, Xudong
    Huang, Yihua
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (09) : 2539 - 2552
  • [35] High-Level Strategies for Parallel Shared-Memory Sparse Matrix-Vector Multiplication
    Yzelman, Albert-Jan Nicholas
    Roose, Dirk
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (01) : 116 - 125
  • [36] Encapsulating multiple communication-cost metrics in partitioning sparse rectangular matrices for parallel matrix-vector multiplies
    Uçar, B
    Aykanat, C
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2004, 25 (06) : 1837 - 1859
  • [37] Parallel optical coherent dot-product architecture for large- scale matrix multiplication with compatibility for diverse phase shifters
    Xu, Shaofu
    Wang, Jing
    Yi, Sicheng
    Zhao, Xinrui
    Liu, Binshuo
    Shao, Jiayi
    Zou, Weiwen
    OPTICS EXPRESS, 2022, 30 (23) : 42057 - 42068