Parallel tiled QR factorization for multicore architectures

Cited by: 80

Authors:

Buttari, Alfredo ^{[1]}

Langou, Julien ^{[2]}

Kurzak, Jakub ^{[1]}

Dongarra, Jack ^{[1,3,4]}

Affiliations:

[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37916 USA

[2] Univ Colorado, Dept Math Sci, Denver, CO 80202 USA

[3] Oak Ridge Natl Lab, Div Math & Comp Sci, Oak Ridge, TN USA

[4] Univ Manchester, Manchester, Lancs, England

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2008 / Vol. 20 / Issue 13

Keywords:

multicore; linear algebra; QR factorization;

DOI:

10.1002/cpe.1301

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data (referred to as 'tiles'). These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can be exploited only at the level of the BLAS operations and with vendor implementations. Copyright © 2008 John Wiley & Sons, Ltd.
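The decomposition into tile-level tasks described in the abstract can be illustrated with a minimal sequential sketch. It is not the authors' implementation: it assumes a square matrix whose order is a multiple of the tile size, uses `numpy.linalg.qr` in place of the specialized kernels a real code would use, and the function name `tiled_qr` and the task labels are hypothetical. It only records the tasks in program order; a dynamic scheduler would instead run each task as soon as its dependencies are satisfied.

```python
import numpy as np

def tiled_qr(A, b):
    """Compute the R factor of a square matrix A tile by tile (tile size b).

    Returns (R, tasks), where tasks lists the tile-level operations in the
    order this sequential sketch executed them.
    """
    n = A.shape[0]
    assert A.shape == (n, n) and n % b == 0, "assumes n x n with n divisible by b"
    nt = n // b                      # number of tiles per row/column
    R = A.astype(float).copy()
    tasks = []

    def tile(i, j):
        # Index of tile (i, j) as a pair of slices.
        return (slice(i * b, (i + 1) * b), slice(j * b, (j + 1) * b))

    for k in range(nt):
        # Factor the diagonal tile.
        Q, Rkk = np.linalg.qr(R[tile(k, k)], mode="complete")
        R[tile(k, k)] = Rkk
        tasks.append(("factor_diag", k, k))

        # Apply Q^T to the remaining tiles of block row k.
        for j in range(k + 1, nt):
            R[tile(k, j)] = Q.T @ R[tile(k, j)]
            tasks.append(("update_row", k, j))

        for i in range(k + 1, nt):
            # Factor the triangular tile stacked on top of a sub-diagonal tile.
            stacked = np.vstack([R[tile(k, k)], R[tile(i, k)]])
            Q2, R2 = np.linalg.qr(stacked, mode="complete")
            R[tile(k, k)] = R2[:b]
            R[tile(i, k)] = 0.0
            tasks.append(("factor_pair", i, k))

            # Apply Q2^T to the matching pairs of tiles to the right.
            for j in range(k + 1, nt):
                pair = Q2.T @ np.vstack([R[tile(k, j)], R[tile(i, j)]])
                R[tile(k, j)] = pair[:b]
                R[tile(i, j)] = pair[b:]
                tasks.append(("update_pair", i, j))
    return R, tasks

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
R, tasks = tiled_qr(A, 2)
# R is upper triangular and satisfies R^T R = A^T A, as any QR factor must.
print(np.allclose(np.tril(R, -1), 0.0), np.allclose(R.T @ R, A.T @ A))
```

In this sketch, only the two factorization steps within a given column form a sequential chain; the update tasks for different tile columns `j` touch disjoint tiles, which is the kind of independence a dependency-driven scheduler could exploit.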

Citation

Pages: 1573 - 1590

Page count: 18

50 items in total

[31] Parallel construction of wavelet trees on multicore architectures
José Fuentes-Sepúlveda
Erick Elejalde
Leo Ferres
Diego Seco
Knowledge and Information Systems, 2017, 51 : 1043 - 1066
[32] Parallel construction of wavelet trees on multicore architectures
Fuentes-Sepulveda, Jose
Elejalde, Erick
Ferres, Leo
Seco, Diego
KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 51 (03) : 1043 - 1066
[33] A Multithreaded Algorithm for Sparse Cholesky Factorization on Hybrid Multicore Architectures
Tang, Meng
Gadou, Mohamed
Ranka, Sanjay
INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS 2017), 2017, 108 : 616 - 625
[34] PARALLEL QR FACTORIZATION OF BLOCK-TRIDIAGONAL MATRICES
Buttari, Alfredo
Hauberg, Soren
Kodsi, Costy
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2020, 42 (06) : C313 - C334
[35] PARALLEL COMPLEXITIES AND COMPUTATIONS OF CHOLESKY DECOMPOSITION AND QR FACTORIZATION
DATTA, K
INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 1985, 18 (01) : 67 - 82
[36] A PARALLEL QR FACTORIZATION ALGORITHM WITH CONTROLLED LOCAL PIVOTING
BISCHOF, CH
SIAM JOURNAL ON SCIENTIFIC AND STATISTICAL COMPUTING, 1991, 12 (01) : 36 - 57
[37] A PARALLEL QR-FACTORIZATION/SOLVER OF QUASISEPARABLE MATRICES
Vandebril, Raf
Van Barel, Marc
Mastronardi, Nicola
ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2008, 30 : 144 - 167
[38] Succinct parallel Lempel–Ziv factorization on a multicore computer
Ling Bo Han
Bin Lao
Ge Nong
The Journal of Supercomputing, 2022, 78 : 7278 - 7303
[39] Extending SRT for Parallel Applications in Tiled-CMP Architectures
Sanchez, Daniel
Aragon, Juan L.
Garcia, Jose M.
2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, : 1352 - 1359
[40] Scheduling of QR factorization algorithms on SMP and multi-core architectures
Quintana-Orti, Gregorio
Quintana-Orti, Enrique S.
Chan, Ernie
van de Geijn, Robert A.
Van Zee, Field G.
PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 301 - +
