A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication

被引：2

作者：

Rasouli, Majid ^{[1
]}

Kirby, Robert M. ^{[1
]}

Sundar, Hari ^{[1
]}

机构：

[1] Univ Utah, Sch Comp, Salt Lake City, UT 84112 USA

来源：

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION (HPC ASIA 2021) | 2020年

基金：

美国国家科学基金会;

关键词：

sparse matrix product; dense matrix multiplication; GEMM; algebraic multigrid; AMG; numerical linear algebra; parallel computing; matrix compression; Golomb-Rice encoding; IMPLEMENTATION;

D O I：

10.1145/3432261.3432271

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Matrix-matrix multiplication (GEMM) is a widely used linear algebra primitive common in scientific computing and data sciences. While several highly-tuned libraries and implementations exist, these typically target either sparse or dense matrices. The performance of these tuned implementations on unsupported types can be poor, and this is critical in cases where the structure of the computations is associated with varying degrees of sparsity. One such example is Algebraic Multigrid (AMG), a popular solver and preconditioner for large sparse linear systems. In this work, we present a new divide and conquer sparse GEMM, that is also highly performant and scalable when the matrix becomes dense, as in the case of AMG matrix hierarchies. In addition, we implement a lossless data compression method to reduce the communication cost. We combine this with an efficient communication pattern during distributed-memory GEMM to provide 2.24 times (on average) better performance than the state-of-the-art library PETSc. Additionally, we show that the performance and scalability of our method surpass PETSc even more when the density of the matrix increases. We demonstrate the efficacy of our methods by comparing our GEMM with PETSc on a wide range of matrices.

引用

页码：110 / 119

页数：10

共 17 条

[1] [Anonymous], 2000, GRAPH CLUSTERING FLO
[2] Parallel Triangle Counting and Enumeration using Matrix Algebra
Azad, Ariful
Buluc, Aydin
Gilbert, John
[J]. 2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 804 - 811
[3] Balay S., 2017, PETSc web page
[4] PARALLEL SPARSE MATRIX-MATRIX MULTIPLICATION AND INDEXING: IMPLEMENTATION AND EXPERIMENTS
Buluc, Aydin
Gilbert, John R.
[J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04) : C170 - C191
[5] The Combinatorial BLAS: design, implementation, and applications
Buluc, Aydin
Gilbert, John R.
[J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2011, 25 (04) : 496 - 509
[6] The University of Florida Sparse Matrix Collection
Davis, Timothy A.
Hu, Yifan
[J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2011, 38 (01):
[7] BLACK-BOX MULTIGRID
DENDY, JE
[J]. JOURNAL OF COMPUTATIONAL PHYSICS, 1982, 48 (03) : 366 - 386
[8] A unified framework for numerical and combinatorial computing
Gilbert, John R.
Shah, Viral B.
ReinhaRdt, Steve
[J]. COMPUTING IN SCIENCE & ENGINEERING, 2008, 10 (02) : 20 - 25
[9] GPU-ACCELERATED SPARSE MATRIX-MATRIX MULTIPLICATION BY ITERATIVE ROW MERGING
Gremse, Felix
Hoefter, Andreas
Schwen, Lars Ole
Kiessling, Fabian
Naumann, Uwe
[J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01) : C54 - C71
[10] Kepner J, 2011, SOFTW ENVIRON TOOLS, V22, P1, DOI 10.1137/1.9780898719918

← 1 2 →