Parallel tiled QR factorization for multicore architectures

被引：80

作者：

Buttari, Alfredo ^{[1
]}

Langou, Julien ^{[2
]}

Kurzak, Jakub ^{[1
]}

Dongarra, Jack ^{[1
,3
,4
]}

机构：

[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37916 USA

[2] Univ Colorado, Dept Math Sci, Denver, CO 80202 USA

[3] Oak Ridge Natl Lab, Div Math & Comp Sci, Oak Ridge, TN USA

[4] Univ Manchester, Manchester, Lancs, England

来源：

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2008年 / 20卷 / 13期

关键词：

multicore; linear algebra; QR factorization;

D O I：

10.1002/cpe.1301

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to he reformulated or new algorithms have to he developed in order to take advantage of the architectural features on these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can he represented as a sequence of small tasks that operate on square blocks of data (referred to as 'tiles'). These tasks can he dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can be exploited only at the level of the BLAS operations and with vendor implementations. Copyright (E) 2008 John Wiley & Sons, Ltd.

引用

页码：1573 / 1590

页数：18

共 50 条

[41] TACO: A scheduling scheme for parallel applications on multicore architectures
Schoenherr, Jan H.
Juurlink, Ben
Richling, Jan
SCIENTIFIC PROGRAMMING, 2014, 22 (03) : 223 - 237
[42] New Parallel Sparse Direct Solvers for Multicore Architectures
Hogg, Jonathan
Scott, Jennifer
ALGORITHMS, 2013, 6 (04) : 702 - 725
[43] Fully Flexible Parallel Merge Sort for Multicore Architectures
Marszalek, Zbigniew
Wozniak, Marcin
Polap, Dawid
COMPLEXITY, 2018,
[44] ADAPTIVE EXECUTION OF SOFTWARE SYSTEMS ON PARALLEL MULTICORE ARCHITECTURES
Rauber, Thomas
Ruenger, Gudula
ICEIS 2010: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 3: INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION, 2010, : 191 - 198
[45] EFFICIENT PARALLEL NONNEGATIVE LEAST SQUARES ON MULTICORE ARCHITECTURES
Luo, Yuancheng
Duraiswami, Ramani
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2011, 33 (05): : 2848 - 2863
[46] Parallel QR Factorization of Block Low-rank Matrices
Apriansyah, M. Ridwan
Yokota, Rio
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2022, 48 (03):
[47] Parallel compact WY QR factorization for asynchronous message passing
Dunn, IN
Meyer, GGL
CONFERENCE PROCEEDINGS OF THE 2002 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE, 2002, : 17 - 24
[48] PARALLEL QR FACTORIZATION USING THE TORUS-WRAP MAPPING
HENDRICKSON, B
PARALLEL COMPUTING, 1993, 19 (11) : 1259 - 1271
[49] Parallel Out-of-Core computation and updating of the QR factorization
Gunter, BC
Van De Geijn, RA
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2005, 31 (01): : 60 - 78
[50] New parallel (rank-revealing) QR factorization algorithms
da Cunha, RD
Becker, D
Patterson, JC
EURO-PAR 2002 PARALLEL PROCESSING, PROCEEDINGS, 2002, 2400 : 677 - 686

← 1 2 3 4 5 →