A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

Cited by: 13
Authors
Choi, J
Source
HIGH PERFORMANCE COMPUTING ON THE INFORMATION SUPERHIGHWAY - HPC ASIA '97, PROCEEDINGS | 1997
DOI
10.1109/HPC.1997.592151
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present DIMMA (Distribution-Independent Matrix Multiplication Algorithm), a new parallel matrix multiplication algorithm for distributed-memory concurrent computers. It is fast and scalable, and its performance is independent of how the data are distributed across processors. The algorithm is based on two new ideas: a modified pipelined communication scheme that overlaps computation and communication effectively, and the LCM (least common multiple) block concept, which obtains the maximum performance of the sequential BLAS routine on each processor whether the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
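The LCM idea named in the abstract can be illustrated with a small sketch. This is not code from the paper. It assumes a block-cyclic distribution over a P x Q process grid in which block index k lives on process row k mod P and process column k mod Q, and it only shows the arithmetic fact that blocks whose indices differ by lcm(P, Q) share the same owners, so they could in principle be combined into one larger local BLAS call. The helper names `owner` and `lcm_groups` are hypothetical.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owner(k, P, Q):
    """Owners of block index k under the assumed block-cyclic layout.

    Returns (process row, process column) = (k mod P, k mod Q).
    """
    return (k % P, k % Q)


def lcm_groups(num_blocks, P, Q):
    """Group block indices that are congruent modulo lcm(P, Q).

    Every index in a group maps to the same (row, column) owner pair.
    """
    step = lcm(P, Q)
    groups = {}
    for k in range(num_blocks):
        groups.setdefault(k % step, []).append(k)
    return groups


if __name__ == "__main__":
    P, Q = 2, 3  # assumed 2 x 3 process grid, chosen only for illustration
    for start, blocks in lcm_groups(12, P, Q).items():
        owners = {owner(k, P, Q) for k in blocks}
        assert len(owners) == 1  # all blocks in a group share one owner pair
        print(start, blocks, owners.pop())
```

With P = 2 and Q = 3 the step is 6, so blocks 0 and 6 fall in one group, blocks 1 and 7 in the next, and so on. How DIMMA actually schedules and merges such blocks is described in the paper itself, not in this sketch.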
Pages: 224-229
Number of pages: 6