Reducing inter-process communication overhead in parallel sparse matrix-matrix multiplication

Cited by: 1
Authors
Ahmed M.S. [1]
Houser J. [1]
Hoque M.A. [1]
Raju R. [2]
Pfeiffer P. [1]
Affiliations
[1] East Tennessee State University, Johnson City, TN
[2] University of Houston, Houston, TX
Keywords
Communication Overhead; MPI Communication; Parallel Computing; Performance Analysis; Scalability; Sparse Matrix-Matrix Multiplication
DOI
10.4018/IJGHPC.2017070104
Abstract
Parallel sparse matrix-matrix multiplication (PSpGEMM) algorithms spend most of their running time on inter-process communication. In distributed matrix-matrix multiplication, much of this time goes to exchanging the partial results needed to compute the final product matrix. This overhead can be reduced with a one-dimensional distributed PSpGEMM algorithm that uses a novel accumulation pattern whose cost is logarithmic in the number of processors, i.e., O(log p) for p processors. The algorithm's MPI communication overhead and execution time were evaluated on an HPC cluster, using randomly generated sparse matrices with dimensions up to one million by one million. The results show reduced inter-process communication overhead for matrices with larger dimensions compared with another one-dimensional parallel algorithm that accumulates results in O(p) steps. Copyright © 2017, IGI Global.
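The record does not spell out the accumulation pattern beyond its O(log p) cost, so the following is a minimal sketch, assuming a standard binomial-tree reduction in MPI, not the authors' actual implementation. The buffer size N and the dense double array are hypothetical stand-ins for the sparse partial-product structures the paper's algorithm would merge.

```c
/*
 * Sketch: binomial-tree (O(log p)) accumulation of partial results with
 * MPI, in contrast to a linear O(p) gather at a single root process.
 * A fixed-size dense buffer stands in for sparse partial products.
 */
#include <mpi.h>
#include <stdio.h>

#define N 8  /* size of the stand-in partial-result buffer (hypothetical) */

int main(int argc, char **argv) {
    int rank, p;
    double part[N], recv[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* Each process starts with its own partial result. */
    for (int i = 0; i < N; i++)
        part[i] = (double)rank;

    /* Binomial-tree accumulation: ceil(log2(p)) rounds instead of p-1. */
    for (int step = 1; step < p; step *= 2) {
        if (rank % (2 * step) == 0) {
            int src = rank + step;
            if (src < p) {
                MPI_Recv(recv, N, MPI_DOUBLE, src, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                /* Merge step; real PSpGEMM merges sparse structures. */
                for (int i = 0; i < N; i++)
                    part[i] += recv[i];
            }
        } else {
            /* Send once to the partner one "step" below, then drop out. */
            MPI_Send(part, N, MPI_DOUBLE, rank - step, 0, MPI_COMM_WORLD);
            break;
        }
    }

    if (rank == 0)
        printf("accumulated[0] = %f\n", part[0]);

    MPI_Finalize();
    return 0;
}
```

With p processes this tree finishes in ceil(log2 p) communication rounds, whereas a linear accumulation at one root needs p-1 sequential receive-and-merge steps, matching the O(log p) versus O(p) contrast drawn in the abstract.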
Pages: 46-59
Number of pages: 13
References
16 in total
[1]  
Ballard G., Buluc A., Demmel J., Grigori L., Lipshitz B., Schwartz O., Toledo S., Communication optimal parallel multiplication of sparse random matrices, Proceedings of the Twenty-fifth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 222-231, (2013)
[2]  
Ballard G., Demmel J., Holtz O., Lipshitz B., Schwartz O., Communication-optimal parallel algorithm for Strassen's matrix multiplication, Proceedings of the Twenty-fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 193-204, (2012)
[3]  
Ballard G., Druinsky A., Knight N., Schwartz O., Brief announcement: Hypergraph partitioning for parallel sparse matrix-matrix multiplication, Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 86-88, (2015)
[4]  
Briggs W.L., Henson V.E., McCormick S.F., A Multigrid Tutorial, (2000)
[5]  
Buluc A., Gilbert J.R., Challenges and advances in parallel sparse matrix-matrix multiplication, Proceedings of the 37th International Conference on Parallel Processing (ICPP'08), pp. 503-510, (2008)
[6]  
Challacombe M., A general parallel sparse-blocked matrix multiply for linear scaling SCF theory, Computer Physics Communications, 128, 1-2, pp. 93-107, (2000)
[7]  
Gilbert J.R., Reinhardt S., Shah V.B., A unified framework for numerical and combinatorial computing, Computing in Science & Engineering, 10, 2, pp. 20-25, (2008)
[8]  
High Performance Computing Center, (2007)
[9]  
Hoque M.A., Raju M.R.K., Tymczak C.J., Vrinceanu D., Chilakamarri K., Parallel sparse matrix-matrix multiplication: A scalable solution with 1D algorithm, International Journal on Computer Science and Engineering, 11, 4, pp. 391-401, (2015)
[10]  
Hsu C.H., Chen T.L., Yang C.T., Chu H.C., Scheduling contention-free broadcasts in heterogeneous networks, International Journal of Communication Systems, 28, 5, pp. 952-971, (2015)