A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication

被引:2
作者
Rasouli, Majid [1 ]
Kirby, Robert M. [1 ]
Sundar, Hari [1 ]
机构
[1] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA
来源
PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION (HPC ASIA 2021) | 2020年
基金
美国国家科学基金会;
关键词
sparse matrix product; dense matrix multiplication; GEMM; algebraic multigrid; AMG; numerical linear algebra; parallel computing; matrix compression; Golomb-Rice encoding; IMPLEMENTATION;
D O I
10.1145/3432261.3432271
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Matrix-matrix multiplication (GEMM) is a widely used linear algebra primitive common in scientific computing and data sciences. While several highly-tuned libraries and implementations exist, these typically target either sparse or dense matrices. The performance of these tuned implementations on unsupported types can be poor, and this is critical in cases where the structure of the computations is associated with varying degrees of sparsity. One such example is Algebraic Multigrid (AMG), a popular solver and preconditioner for large sparse linear systems. In this work, we present a new divide and conquer sparse GEMM, that is also highly performant and scalable when the matrix becomes dense, as in the case of AMG matrix hierarchies. In addition, we implement a lossless data compression method to reduce the communication cost. We combine this with an efficient communication pattern during distributed-memory GEMM to provide 2.24 times (on average) better performance than the state-of-the-art library PETSc. Additionally, we show that the performance and scalability of our method surpass PETSc even more when the density of the matrix increases. We demonstrate the efficacy of our methods by comparing our GEMM with PETSc on a wide range of matrices.
引用
收藏
页码:110 / 119
页数:10
相关论文
共 17 条
  • [1] [Anonymous], 2000, GRAPH CLUSTERING FLO
  • [2] Parallel Triangle Counting and Enumeration using Matrix Algebra
    Azad, Ariful
    Buluc, Aydin
    Gilbert, John
    [J]. 2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 804 - 811
  • [3] Balay S., 2017, PETSc web page
  • [4] PARALLEL SPARSE MATRIX-MATRIX MULTIPLICATION AND INDEXING: IMPLEMENTATION AND EXPERIMENTS
    Buluc, Aydin
    Gilbert, John R.
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04) : C170 - C191
  • [5] The Combinatorial BLAS: design, implementation, and applications
    Buluc, Aydin
    Gilbert, John R.
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2011, 25 (04) : 496 - 509
  • [6] The University of Florida Sparse Matrix Collection
    Davis, Timothy A.
    Hu, Yifan
    [J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2011, 38 (01):
  • [7] BLACK-BOX MULTIGRID
    DENDY, JE
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 1982, 48 (03) : 366 - 386
  • [8] A unified framework for numerical and combinatorial computing
    Gilbert, John R.
    Shah, Viral B.
    ReinhaRdt, Steve
    [J]. COMPUTING IN SCIENCE & ENGINEERING, 2008, 10 (02) : 20 - 25
  • [9] GPU-ACCELERATED SPARSE MATRIX-MATRIX MULTIPLICATION BY ITERATIVE ROW MERGING
    Gremse, Felix
    Hoefter, Andreas
    Schwen, Lars Ole
    Kiessling, Fabian
    Naumann, Uwe
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01) : C54 - C71
  • [10] Kepner J, 2011, SOFTW ENVIRON TOOLS, V22, P1, DOI 10.1137/1.9780898719918