Parallel tiled QR factorization for multicore architectures

被引:80
|
作者
Buttari, Alfredo [1 ]
Langou, Julien [2 ]
Kurzak, Jakub [1 ]
Dongarra, Jack [1 ,3 ,4 ]
机构
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37916 USA
[2] Univ Colorado, Dept Math Sci, Denver, CO 80202 USA
[3] Oak Ridge Natl Lab, Div Math & Comp Sci, Oak Ridge, TN USA
[4] Univ Manchester, Manchester, Lancs, England
来源
关键词
multicore; linear algebra; QR factorization;
D O I
10.1002/cpe.1301
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to he reformulated or new algorithms have to he developed in order to take advantage of the architectural features on these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can he represented as a sequence of small tasks that operate on square blocks of data (referred to as 'tiles'). These tasks can he dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can be exploited only at the level of the BLAS operations and with vendor implementations. Copyright (E) 2008 John Wiley & Sons, Ltd.
引用
收藏
页码:1573 / 1590
页数:18
相关论文
共 50 条
  • [41] TACO: A scheduling scheme for parallel applications on multicore architectures
    Schoenherr, Jan H.
    Juurlink, Ben
    Richling, Jan
    SCIENTIFIC PROGRAMMING, 2014, 22 (03) : 223 - 237
  • [42] New Parallel Sparse Direct Solvers for Multicore Architectures
    Hogg, Jonathan
    Scott, Jennifer
    ALGORITHMS, 2013, 6 (04) : 702 - 725
  • [43] Fully Flexible Parallel Merge Sort for Multicore Architectures
    Marszalek, Zbigniew
    Wozniak, Marcin
    Polap, Dawid
    COMPLEXITY, 2018,
  • [44] ADAPTIVE EXECUTION OF SOFTWARE SYSTEMS ON PARALLEL MULTICORE ARCHITECTURES
    Rauber, Thomas
    Ruenger, Gudula
    ICEIS 2010: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 3: INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION, 2010, : 191 - 198
  • [45] EFFICIENT PARALLEL NONNEGATIVE LEAST SQUARES ON MULTICORE ARCHITECTURES
    Luo, Yuancheng
    Duraiswami, Ramani
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2011, 33 (05): : 2848 - 2863
  • [46] Parallel QR Factorization of Block Low-rank Matrices
    Apriansyah, M. Ridwan
    Yokota, Rio
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2022, 48 (03):
  • [47] Parallel compact WY QR factorization for asynchronous message passing
    Dunn, IN
    Meyer, GGL
    CONFERENCE PROCEEDINGS OF THE 2002 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE, 2002, : 17 - 24
  • [48] PARALLEL QR FACTORIZATION USING THE TORUS-WRAP MAPPING
    HENDRICKSON, B
    PARALLEL COMPUTING, 1993, 19 (11) : 1259 - 1271
  • [49] Parallel Out-of-Core computation and updating of the QR factorization
    Gunter, BC
    Van De Geijn, RA
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2005, 31 (01): : 60 - 78
  • [50] New parallel (rank-revealing) QR factorization algorithms
    da Cunha, RD
    Becker, D
    Patterson, JC
    EURO-PAR 2002 PARALLEL PROCESSING, PROCEEDINGS, 2002, 2400 : 677 - 686