Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers

Cited by: 24
Authors
Rothberg, E [1]
Affiliations
[1] INTEL CORP, SUPERCOMP SYST DIV, BEAVERTON, OR 97006 USA
Keywords
sparse Cholesky factorization; parallel machines; sparse matrices; scalability
DOI
10.1137/S106482759426715X
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Sparse Cholesky factorization has historically achieved extremely low performance on distributed-memory multiprocessors. We believe that three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. This paper demonstrates that all three of these issues have in fact already been addressed. Specifically, (1) single node performance can be improved by moving from a column-oriented approach, where the computational kernel is level 1 BLAS, to either a panel- or block-oriented approach, where the computational kernel is level 3 BLAS; (2) communication hardware has improved dramatically, with new parallel computers (the Intel Paragon system) providing one to two orders of magnitude higher communication bandwidth than previous parallel computers (the Intel iPSC/860 system); and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision Mflops on 32 nodes of the Intel Paragon system, 1 Gflop on 64 nodes, and 1.7 Gflops on 128 nodes. This paper also does a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between panel- and block-oriented approaches to parallel factorization.
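The aggregate rates quoted in the abstract can be turned into per-node figures with a little arithmetic. The following is only an illustrative sketch: it takes the reported rates as 650 Mflops, 1 Gflop (assumed here to mean 1000 Mflops) and 1.7 Gflops on 32, 64 and 128 Paragon nodes, and the helper names are hypothetical. The abstract does not state whether the three figures come from the same matrix, so the "relative efficiency" value is a rough indicator and not a measured parallel efficiency.

```python
# Aggregate rates quoted in the abstract, in double-precision Mflops,
# keyed by Paragon node count. 1 Gflop is assumed to mean 1000 Mflops.
reported_mflops = {32: 650.0, 64: 1000.0, 128: 1700.0}


def per_node_rate(nodes, mflops):
    """Average Mflops delivered by each node (hypothetical helper)."""
    return mflops / nodes


def relative_efficiency(nodes, mflops, base_nodes=32):
    """Per-node rate relative to the per-node rate at base_nodes.

    This is only a rough indicator: the abstract does not say that the
    three figures were measured on the same matrix.
    """
    base = per_node_rate(base_nodes, reported_mflops[base_nodes])
    return per_node_rate(nodes, mflops) / base


# Map node count -> (per-node Mflops, efficiency relative to 32 nodes).
summary = {
    n: (per_node_rate(n, rate), relative_efficiency(n, rate))
    for n, rate in reported_mflops.items()
}

for n in sorted(summary):
    rate, eff = summary[n]
    print(f"{n:4d} nodes: {rate:6.2f} Mflops/node, relative efficiency {eff:.2f}")
```

Under these assumptions the per-node rate falls from about 20 Mflops at 32 nodes to about 13 Mflops at 128 nodes, which is consistent with the abstract's wording of high performance on "moderately parallel" machines.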
Pages: 699-713
Number of pages: 15
Related papers
21 in total
[1]   VECTORIZATION OF A MULTIPROCESSOR MULTIFRONTAL CODE [J].
AMESTOY, PR ;
DUFF, IS .
INTERNATIONAL JOURNAL OF SUPERCOMPUTER APPLICATIONS AND HIGH PERFORMANCE COMPUTING, 1989, 3 (03) :41-59
[2]   THE INFLUENCE OF RELAXED SUPERNODE PARTITIONS ON THE MULTIFRONTAL METHOD [J].
ASHCRAFT, C ;
GRIMES, R .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1989, 15 (04) :291-309
[3]  
ASHCRAFT C, 1990, YALEUDCSRR810 YAL U
[4]  
ASHCRAFT CC, 1987, INT J SUPERCOMPUT AP, V1, P10
[5]  
DONGARRA JJ, 1990, ACM T MATH SOFTWARE, V16, P1, DOI 10.1145/77626.79170
[6]   SPARSE-MATRIX TEST PROBLEMS [J].
DUFF, IS ;
GRIMES, RG ;
LEWIS, JG .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1989, 15 (01) :1-14
[7]   COMMUNICATION RESULTS FOR PARALLEL SPARSE CHOLESKY FACTORIZATION ON A HYPERCUBE [J].
GEORGE, A ;
LIU, JWH ;
NG, E .
PARALLEL COMPUTING, 1989, 10 (03) :287-298
[8]  
GEORGE A, 1988, TM10865 ORNL
[9]  
GEORGE A, 1981, COMPUTER SOLUTION LA
[10]  
GILBERT J, 1991, SIAM J SCI STAT COMP, V12, P1184