Distributed SBP Cholesky Factorization Algorithms with Near-Optimal Scheduling

被引:11
作者
Gustavson, Fred G. [1 ]
Karlsson, Lars [2 ]
Kagstrom, Bo [2 ]
机构
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Umea Univ, Dept Comp Sci & HPC2N, SE-90187 Umea, Sweden
来源
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE | 2009年 / 36卷 / 02期
基金
瑞典研究理事会;
关键词
Algorithms; Performance; Real symmetric matrices; positive definite matrices; Cholesky factorization; distributed square block format; packed storage; parallel computing; parallel algorithms;
D O I
10.1145/1499096.1499100
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
The minimal block storage Distributed Square Block Packed (DSBP) format for distributed memory computing on symmetric and triangular matrices is presented. Three algorithm variants ( Basic, Static, and Dynamic) of the blocked right-looking Cholesky factorization are designed for the DSBP format, implemented, and evaluated. On our target machine, all variants outperform standard full-storage implementations while saving almost half the storage. Communication overhead is shown to be virtually eliminated by the Static and Dynamic variants, both of which take advantage of hardware parallelism to hide communication costs. The Basic variant is shown to yield comparable or slightly better performance than the full-storage ScaLAPACK routine PDPOTRF while clearly outperformed by both Static and Dynamic. Models of execution assuming zero communication costs and overhead are developed. For medium-and larger-sized problems, the Static schedule is near optimal on our target machine based on comparisons with these models and measurements of synchronization overhead.
引用
收藏
页数:25
相关论文
共 25 条
[1]   A HIGH-PERFORMANCE MATRIX-MULTIPLICATION ALGORITHM ON A DISTRIBUTED-MEMORY PARALLEL COMPUTER, USING OVERLAPPED COMMUNICATION [J].
AGARWAL, RC ;
GUSTAVSON, FG ;
ZUBAIR, M .
IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 1994, 38 (06) :673-681
[2]  
AGARWAL RC, 1988, ASPECTS COMPUTATION, P217
[3]  
AGARWAL RC, 1989, SUPERCOMPUTING 89, P225
[4]  
[Anonymous], 1995, MPI MESSAGE PASSING
[5]  
BABOULIN M, 2005, TRPA0530 CERFACS
[6]   A parallel distributed solver for large dense symmetric systems: Applications to geodesy and electromagnetism problems [J].
Baboullin, M ;
Giraud, L ;
Gratton, S .
INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2005, 19 (04) :353-363
[7]  
BRENT RP, 1982, 82H521 TR CORN U DEP
[8]  
Brightwell Ron., 2004, Proceedings of the 18th annual international conference on Supercomputing, P298
[9]  
BUTTARI A, 2007, UTCS07600
[10]  
Chan E, 2007, SPAA'07: PROCEEDINGS OF THE NINETEENTH ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, P116