Reducing inter-process communication overhead in parallel sparse matrix-matrix multiplication

被引：1

作者：

Ahmed M.S. ^{[1
]}

Houser J. ^{[1
]}

Hoque M.A. ^{[1
]}

Raju R. ^{[2
]}

Pfeiffer P. ^{[1
]}

机构：

[1] East Tennessee State University, Johnson City, TN

[2] University of Houston, Houston, TX

来源：

International Journal of Grid and High Performance Computing | 2017年 / 9卷 / 03期

关键词：

Communication Overhead; MPI Communication; Parallel Computing; Performance Analysis; Scalability; Sparse Matrix-Matrix Multiplication;

D O I：

10.4018/IJGHPC.2017070104

中图分类号：

学科分类号：

摘要：

Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on inter-process communication. In the case of distributed matrix-matrix multiplications, much of this time is spent on interchanging the partial results that are needed to calculate the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel sparse matrix-matrix multiplication that uses a novel accumulation pattern based on the logarithmic complexity of the number of processors (i.e., O (log (p)) where p is the number of processors). This algorithm's MPI communication overhead and execution time were evaluated on an HPC cluster, using randomly generated sparse matrices with dimensions up to one million by one million. The results showed a reduction of inter-process communication overhead for matrices with larger dimensions compared to another one dimensional parallel algorithm that takes O(p) run-time complexity for accumulating the results. Copyright © 2017, IGI Global.

引用

页码：46 / 59

页数：13

共 16 条

[11]

Kruskal C.P., Rudolph L., Snir M., Techniques for parallel manipulation of sparse matrices, Theoretical Computer Science, 64, 2, pp. 135-157, (1989)

[12]

Oak Ridge National Laboratory, (1943)

[13]

Utrera G., Gil M., Martorell X., Evaluating the Performance Impact of Communication. Imbalance in Sparse Matrix-Vector Multiplication, Proceedings of the 2015 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 321-328, (2015)

[14]

Van Dongen S., Graph clustering via a discrete uncoupling process, SIAM Journal on Matrix Analysis and Applications, 30, 1, pp. 121-141, (2008)

[15]

VandeVondele J., Borstnik U., Hutter J., Linear scaling self-consistent field calculations with millions of atoms in the condensed phase, Journal of Chemical Theory and Computation, 8, 10, pp. 3565-3573, (2012)

[16]

Williams V.V., Multiplying matrices faster than Coppersmith-Winograd, Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing, pp. 887-898, (2012)

← 1 2 →