Parallel tiled QR factorization for multicore architectures

Cited by: 80
Authors
Buttari, Alfredo [1]
Langou, Julien [2]
Kurzak, Jakub [1]
Dongarra, Jack [1, 3, 4]
Affiliations
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37916 USA
[2] Univ Colorado, Dept Math Sci, Denver, CO 80202 USA
[3] Oak Ridge Natl Lab, Div Math & Comp Sci, Oak Ridge, TN USA
[4] Univ Manchester, Manchester, Lancs, England
Source
Keywords
multicore; linear algebra; QR factorization
DOI
10.1002/cpe.1301
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features of these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data (referred to as 'tiles'). These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization, where parallelism can be exploited only at the level of the BLAS operations, and with vendor implementations. Copyright (C) 2008 John Wiley & Sons, Ltd.
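The decomposition into tile-level tasks described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `tiled_qr_r`, the tile size parameter `nb`, and the use of `np.linalg.qr` as the per-tile kernel are assumptions made for illustration. The tasks are also run here in a fixed sequential order, whereas the paper schedules them dynamically according to their dependencies.

```python
import numpy as np


def tiled_qr_r(A, nb):
    """Return the R factor of a square matrix A, computed tile by tile.

    Hypothetical illustration: nb is the tile size and must divide n.
    Each commented step below is one kind of task acting on nb-by-nb tiles.
    """
    n = A.shape[0]
    assert A.shape == (n, n) and n % nb == 0
    R = np.array(A, dtype=float)  # working copy, overwritten by R
    nt = n // nb                  # number of tile rows and tile columns

    def tile(i, j):
        # Basic slicing returns a view, so writes go straight into R.
        return R[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb]

    for k in range(nt):
        # Task kind 1: QR-factor the diagonal tile.
        Q, Rkk = np.linalg.qr(tile(k, k), mode="complete")
        tile(k, k)[:] = Rkk
        # Task kind 2: apply Q^T to the tiles to the right of the diagonal.
        for j in range(k + 1, nt):
            tile(k, j)[:] = Q.T @ tile(k, j)
        for i in range(k + 1, nt):
            # Task kind 3: QR-factor the triangular tile stacked on a
            # sub-diagonal tile, which annihilates the sub-diagonal tile.
            Q2, R2 = np.linalg.qr(np.vstack([tile(k, k), tile(i, k)]),
                                  mode="complete")
            tile(k, k)[:] = R2[:nb]
            tile(i, k)[:] = 0.0
            # Task kind 4: apply the same transformation to the two
            # affected tile rows in every trailing tile column.
            for j in range(k + 1, nt):
                upd = Q2.T @ np.vstack([tile(k, j), tile(i, j)])
                tile(k, j)[:] = upd[:nb]
                tile(i, j)[:] = upd[nb:]
    return R


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
R = tiled_qr_r(A, nb=2)
print(np.allclose(R.T @ R, A.T @ A))  # R^T R should reproduce A^T A
```

Each task touches at most two tiles per tile column, so tasks that work on disjoint tiles have no dependency between them; this is the property a dynamic scheduler could exploit to run them out of order.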
Pages: 1573 - 1590
Number of pages: 18
Related papers
50 in total
  • [1] Parallel tiled QR factorization for multicore architectures
    Buttari, Alfredo
    Langou, Julien
    Kurzak, Jakub
    Dongarra, Jack
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2008, 4967 : 639 - +
  • [2] THE PARALLEL TILED WZ FACTORIZATION ALGORITHM FOR MULTICORE ARCHITECTURES
    Bylina, Beata
    Bylina, Jaroslaw
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2019, 29 (02) : 407 - 419
  • [3] A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures
    Agullo, Emmanuel
    Dongarra, Jack
    Nath, Rajib
    Tomov, Stanimire
    EURO-PAR 2011 PARALLEL PROCESSING, PT 2, 2011, 6853 : 194 - 205
  • [4] Multifrontal QR Factorization for Multicore Architectures over Runtime Systems
    Agullo, Emmanuel
    Buttari, Alfredo
    Guermouche, Abdou
    Lopez, Florent
    EURO-PAR 2013 PARALLEL PROCESSING, 2013, 8097 : 521 - 532
  • [5] A class of parallel tiled linear algebra algorithms for multicore architectures
    Buttari, Alfredo
    Langou, Julien
    Kurzak, Jakub
    Dongarra, Jack
    PARALLEL COMPUTING, 2009, 35 (01) : 38 - 53
  • [6] A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures
    Baboulin, Marc
    Becker, Dulceneia
    Dongarra, Jack
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 14 - 24
  • [7] PARALLEL SPARSE QR FACTORIZATION ON SHARED-MEMORY ARCHITECTURES
    MATSTOMS, P
    PARALLEL COMPUTING, 1995, 21 (03) : 473 - 486
  • [8] Scalable Task-Parallel SGD on Matrix Factorization in Multicore Architectures
    Nishioka, Yusuke
    Taura, Kenjiro
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 1178 - 1184
  • [9] COMPLEXITY OF PARALLEL QR FACTORIZATION
    COSNARD, M
    ROBERT, Y
    JOURNAL OF THE ACM, 1986, 33 (04) : 712 - 723
  • [10] COMPLEXITY OF PARALLEL QR FACTORIZATION
    COSNARD, M
    ROBERT, Y
    COMPTES RENDUS DE L ACADEMIE DES SCIENCES SERIE I-MATHEMATIQUE, 1983, 297 (02) : 137 - 139