QR factorization for shared memory and message passing

Cited by: 1
Authors
Dunn, I. N. [1]
Meyer, G. G. L. [1]
Affiliation
[1] Johns Hopkins University, Department of Electrical and Computer Engineering, Baltimore, MD 21218, USA
Keywords
QR factorization; message passing systems; performance evaluation; shared memory systems; Givens rotations
DOI
10.1016/S0167-8191(02)00162-X
Chinese Library Classification (CLC)
TP301 [Theory, methods]
Subject classification code
081202
Abstract
This paper describes the design, implementation, and performance of three new parallel QR factorization algorithms: one for shared memory, one for synchronous message passing, and one for asynchronous message passing. Unlike existing parallel algorithms, these do not derive their multiprocessor partitioning from an underlying static data distribution. Instead, they use a dynamic distribution strategy to improve scalability on small problems. Experiments on a 128-processor SGI Origin 2000 and a 64-processor HP SPP-2000 show that the new algorithms run faster than the tuned parallel routines available on those machines, including a version of ScaLAPACK's distributed QR factorization routine PDGEQRF. (C) 2002 Elsevier Science B.V. All rights reserved.
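The keywords name Givens rotations as the building block of these factorizations. For reference, the following is a minimal sequential sketch of Givens-rotation QR in Python with NumPy. It is not the authors' shared-memory or message-passing algorithm and does not model their dynamic distribution strategy; the function names `givens` and `givens_qr` are hypothetical. Each rotation mixes two adjacent rows to zero one subdiagonal entry. Parallel Givens schemes generally gain concurrency by reordering these rotations so that those acting on disjoint row pairs can be applied at the same time.

```python
import numpy as np


def givens(a, b):
    """Hypothetical helper: return (c, s) with [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)  # sqrt(a^2 + b^2) without overflow
    return a / r, b / r


def givens_qr(A):
    """Sequential Givens QR sketch (not the paper's parallel algorithm).

    Returns Q (m x m, orthogonal) and R (m x n, upper triangular) with A = Q @ R.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for j in range(min(n, m - 1)):
        # Zero column j from the bottom up, one adjacent row pair at a time.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, s], [-s, c]])
            # Apply the rotation to rows i-1 and i (columns left of j are already zero).
            R[[i - 1, i], j:] = G @ R[[i - 1, i], j:]
            # Accumulate Q so that A = Q @ R keeps holding.
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
            R[i, j] = 0.0  # remove rounding residue in the annihilated entry
    return Q, R


A = np.array([[4.0, 1.0, 2.0], [2.0, 3.0, 0.0], [1.0, 2.0, 5.0], [3.0, 0.0, 1.0]])
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))
```

Applying rotations to a 2 x (n - j) slice rather than forming full m x m rotation matrices keeps each update at O(n) work, which is the usual way Givens QR is implemented.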
Pages: 1507-1530
Page count: 24
References
18 in total
[1] Blackford, L. S., 1997, ScaLAPACK User's Guide.
[2] Carrig, J. J.; Meyer, G. G. L., 1999, A parameterized ordering for cache-, register- and pipeline-efficient Givens QR decomposition. Advances in Computational Mathematics, 10(1): 97-113.
[3] Choi, J., 1994, TM12470, ORNL.
[4] Darte, A.; Vivien, F., 1997, Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. International Journal of Parallel Programming, 25(6): 447-496.
[5] Golub, G. H., 2013, Matrix Computations.
[6] Guo, M. Y.; Nakata, I., 2001, A framework for efficient data redistribution on distributed memory multicomputers. Journal of Supercomputing, 20(3): 243-265.
[7] Hwang, K., 1993, Advanced Computer Architecture: Parallelism, Scalability.
[8] Kruatrachue, B.; Lewis, T., 1988, Grain-size determination for parallel processing. IEEE Software, 5(1): 23-32.
[9] Lim, A. W.; Lam, M. S., 1998, Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing, 24(3-4): 445-475.
[10] Message Passing Interface Forum, 1997, MPI: A Message-Passing Interface Standard.