Parallel tiled QR factorization for multicore architectures

Cited by: 80
Authors
Buttari, Alfredo [1]
Langou, Julien [2]
Kurzak, Jakub [1]
Dongarra, Jack [1, 3, 4]
Affiliations
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37916 USA
[2] Univ Colorado, Dept Math Sci, Denver, CO 80202 USA
[3] Oak Ridge Natl Lab, Div Math & Comp Sci, Oak Ridge, TN USA
[4] Univ Manchester, Manchester, Lancs, England
Source
Keywords
multicore; linear algebra; QR factorization;
DOI
10.1002/cpe.1301
Chinese Library Classification number
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data (referred to as 'tiles'). These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization, where parallelism can be exploited only at the level of the BLAS operations, and with vendor implementations. Copyright © 2008 John Wiley & Sons, Ltd.
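The task-and-dependency idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the task labels (`FACTOR_DIAG`, `APPLY_ROW`, `FACTOR_COL`, `UPDATE`), the function names, and the unit-cost-per-task assumption are all hypothetical and chosen only for illustration. The sketch enumerates one plausible set of tile tasks for a p x q grid of tiles, records which earlier tasks each one needs, and computes the earliest "level" at which each task could run if enough cores were available.

```python
def tiled_qr_tasks(p, q):
    """Return {task: [dependencies]} for a p x q grid of tiles.

    Hypothetical task labels (assumptions, not the paper's kernel names):
      FACTOR_DIAG(k)     factor the diagonal tile (k, k)
      APPLY_ROW(k, j)    apply that factorization to tile (k, j)
      FACTOR_COL(k, i)   factor tile (i, k) coupled with the diagonal tile
      UPDATE(k, i, j)    update tiles (k, j) and (i, j) accordingly
    Only flow dependencies are modelled.
    """
    deps = {}
    for k in range(min(p, q)):
        def prev(i, j):
            # Task from step k-1 that last wrote tile (i, j), if any.
            return [("UPDATE", k - 1, i, j)] if k > 0 else []

        deps[("FACTOR_DIAG", k)] = prev(k, k)
        for j in range(k + 1, q):
            deps[("APPLY_ROW", k, j)] = [("FACTOR_DIAG", k)] + prev(k, j)
        for i in range(k + 1, p):
            above = ("FACTOR_DIAG", k) if i == k + 1 else ("FACTOR_COL", k, i - 1)
            deps[("FACTOR_COL", k, i)] = [above] + prev(i, k)
            for j in range(k + 1, q):
                left = ("APPLY_ROW", k, j) if i == k + 1 else ("UPDATE", k, i - 1, j)
                deps[("UPDATE", k, i, j)] = [("FACTOR_COL", k, i), left] + prev(i, j)
    return deps


def earliest_levels(deps):
    """Earliest start level of each task, assuming unit cost and unlimited cores.

    Relies on the dict being in insertion order, which is already a valid
    topological order for the tasks generated above.
    """
    level = {}
    for task, pre in deps.items():
        level[task] = 1 + max((level[d] for d in pre), default=0)
    return level


deps = tiled_qr_tasks(3, 3)
level = earliest_levels(deps)
print(len(deps), "tasks, critical path of", max(level.values()), "levels")
```

Under this simplified model, the diagonal factorization of step 1 becomes ready while updates from step 0 are still pending, which is the kind of out-of-order overlap the abstract refers to. A real scheduler would also account for differing task costs and a finite number of cores.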
Pages: 1573-1590
Number of pages: 18
Related papers
50 in total
  • [31] Parallel construction of wavelet trees on multicore architectures
    José Fuentes-Sepúlveda
    Erick Elejalde
    Leo Ferres
    Diego Seco
    Knowledge and Information Systems, 2017, 51 : 1043 - 1066
  • [32] Parallel construction of wavelet trees on multicore architectures
    Fuentes-Sepulveda, Jose
    Elejalde, Erick
    Ferres, Leo
    Seco, Diego
    KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 51 (03) : 1043 - 1066
  • [33] A Multithreaded Algorithm for Sparse Cholesky Factorization on Hybrid Multicore Architectures
    Tang, Meng
    Gadou, Mohamed
    Ranka, Sanjay
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS 2017), 2017, 108 : 616 - 625
  • [34] PARALLEL QR FACTORIZATION OF BLOCK-TRIDIAGONAL MATRICES
    Buttari, Alfredo
    Hauberg, Soren
    Kodsi, Costy
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2020, 42 (06) : C313 - C334
  • [36] A PARALLEL QR FACTORIZATION ALGORITHM WITH CONTROLLED LOCAL PIVOTING
    BISCHOF, CH
    SIAM JOURNAL ON SCIENTIFIC AND STATISTICAL COMPUTING, 1991, 12 (01) : 36 - 57
  • [37] A PARALLEL QR-FACTORIZATION/SOLVER OF QUASISEPARABLE MATRICES
    Vandebril, Raf
    Van Barel, Marc
    Mastronardi, Nicola
    ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2008, 30 : 144 - 167
  • [38] Succinct parallel Lempel–Ziv factorization on a multicore computer
    Ling Bo Han
    Bin Lao
    Ge Nong
    The Journal of Supercomputing, 2022, 78 : 7278 - 7303
  • [39] Extending SRT for Parallel Applications in Tiled-CMP Architectures
    Sanchez, Daniel
    Aragon, Juan L.
    Garcia, Jose M.
    2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, : 1352 - 1359
  • [40] Scheduling of QR factorization algorithms on SMP and multi-core architectures
    Quintana-Orti, Gregorio
    Quintana-Orti, Enrique S.
    Chan, Ernie
    van de Geijn, Robert A.
    Van Zee, Field G.
    PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 301 - +