Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers

Cited by: 8
Authors
Faverge, Mathieu [1]
Herrmann, Julien [2]
Langou, Julien [3]
Lowery, Bradley [3]
Robert, Yves [1, 4]
Dongarra, Jack [4]
Affiliations
[1] Univ Bordeaux, CNRS, Inria, Bordeaux INP, UMR 5800, Talence, France
[2] Ecole Normale Super Lyon, Lab LIP, Lyon, France
[3] Univ Colorado Denver, Denver, CO USA
[4] Univ Tennessee Knoxville, Knoxville, TN USA
Funding
Russian Science Foundation; U.S. National Science Foundation
Keywords
Numerical algorithms; LU factorization; QR factorization; Stability; Performance; Parallel
DOI
10.1016/j.jpdc.2015.06.007
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper introduces hybrid LU-QR algorithms for solving dense linear systems of the form Ax = b. Throughout a matrix factorization, these algorithms dynamically alternate between LU elimination steps with local pivoting and QR elimination steps, based on a robustness criterion. LU elimination steps can be parallelized very efficiently and require half as many floating-point operations as QR steps. However, LU steps are not necessarily stable, whereas QR steps are always stable. The hybrid algorithms execute a QR step when a robustness criterion detects a risk of instability, and an LU step otherwise. The choice between LU and QR steps must incur a small computational overhead and must provide a satisfactory level of stability with as few QR steps as possible. In this paper, we introduce several robustness criteria and establish upper bounds on the growth factor of the norm of the updated matrix incurred by each of these criteria. In addition, we describe the implementation of the hybrid algorithms through an extension of the PaRSEC software that allows for dynamic choices during execution. Finally, we analyze both stability and performance results in comparison with state-of-the-art linear solvers on parallel distributed multicore platforms. A comprehensive set of experiments shows that hybrid LU-QR algorithms provide a continuous range of trade-offs between stability and performance. © 2015 Published by Elsevier Inc.
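The per-step choice between LU and QR described in the abstract can be illustrated with a minimal, unblocked NumPy sketch. It is not the paper's tiled PaRSEC implementation, and the test used here (accept an LU step only when `alpha * |a_kk|` is at least the largest entry below the diagonal) is a simplified, hypothetical stand-in for the robustness criteria the paper introduces. The function name `hybrid_lu_qr_solve` and the parameter `alpha` are assumptions made for illustration.

```python
import numpy as np

def hybrid_lu_qr_solve(A, b, alpha=2.0):
    """Solve Ax = b, picking an LU or a QR elimination step for each column.

    Illustrative only: the criterion below is a simplified stand-in,
    not one of the criteria defined in the paper.
    """
    U = np.array(A, dtype=float)   # working copy, reduced to upper triangular
    y = np.array(b, dtype=float)   # right-hand side, transformed alongside U
    n = U.shape[0]
    steps = []                     # records which kind of step was taken
    for k in range(n - 1):
        below = np.max(np.abs(U[k + 1:, k]))
        if below == 0.0:
            steps.append("skip")   # column already eliminated
            continue
        if alpha * abs(U[k, k]) >= below:
            # LU step without row exchanges: multipliers are bounded by alpha.
            m = U[k + 1:, k] / U[k, k]
            U[k + 1:, k:] -= np.outer(m, U[k, k:])
            y[k + 1:] -= m * y[k]
            steps.append("LU")
        else:
            # QR step: one Householder reflection applied to the trailing
            # rows of U and y (orthogonal, hence norm-preserving).
            x = U[k:, k]
            v = x.copy()
            v[0] += np.copysign(np.linalg.norm(x), x[0])
            v /= np.linalg.norm(v)
            U[k:, k:] -= 2.0 * np.outer(v, v @ U[k:, k:])
            y[k:] -= 2.0 * v * (v @ y[k:])
            steps.append("QR")
        U[k + 1:, k] = 0.0         # clear rounding residue below the diagonal
    # Back substitution on the resulting upper triangular system.
    sol = np.zeros(n)
    for i in range(n - 1, -1, -1):
        sol[i] = (y[i] - U[i, i + 1:] @ sol[i + 1:]) / U[i, i]
    return sol, steps

if __name__ == "__main__":
    A = np.array([[1e-10, 1.0], [1.0, 1.0]])
    b = np.array([1.0, 2.0])
    x, steps = hybrid_lu_qr_solve(A, b)
    print(steps, np.allclose(A @ x, b))
```

In this sketch, a smaller `alpha` makes the test stricter and triggers more QR steps, while a larger `alpha` accepts more LU steps; this mirrors, in a very reduced form, the stability/performance trade-off the abstract describes.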
Pages: 32-46
Number of pages: 15