Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers

Cited by: 8

Authors:

Faverge, Mathieu [1]

Herrmann, Julien [2]

Langou, Julien [3]

Lowery, Bradley [3]

Robert, Yves [1,4]

Dongarra, Jack [4]

Affiliations:

[1] Univ Bordeaux, CNRS, Inria, Bordeaux INP, UMR 5800, Talence, France

[2] Ecole Normale Super Lyon, Lab LIP, Lyon, France

[3] Univ Colorado Denver, Denver, CO USA

[4] Univ Tennessee Knoxville, Knoxville, TN USA

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2015 / Vol. 85

Funding:

Russian Science Foundation; US National Science Foundation

Keywords:

Numerical algorithms; LU factorization; QR factorization; Stability; Performance; PARALLEL;

DOI:

10.1016/j.jpdc.2015.06.007

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

This paper introduces hybrid LU-QR algorithms for solving dense linear systems of the form Ax = b. Throughout a matrix factorization, these algorithms dynamically alternate between LU elimination steps with local pivoting and QR elimination steps, based on a robustness criterion. LU elimination steps can be parallelized very efficiently and require half as many floating-point operations as QR steps. However, LU steps are not necessarily stable, while QR steps are always stable. The hybrid algorithms execute a QR step when a robustness criterion detects a risk of instability, and they execute an LU step otherwise. The choice between LU and QR steps must have a small computational overhead and must provide a satisfactory level of stability with as few QR steps as possible. In this paper, we introduce several robustness criteria and we establish upper bounds on the growth factor of the norm of the updated matrix incurred by each of these criteria. In addition, we describe the implementation of the hybrid algorithms through an extension of the PaRSEC software to allow for dynamic choices during execution. Finally, we analyze both stability and performance results compared to state-of-the-art linear solvers on parallel distributed multicore platforms. A comprehensive set of experiments shows that hybrid LU-QR algorithms provide a continuous range of trade-offs between stability and performance. (C) 2015 Published by Elsevier Inc.
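The idea of alternating LU and QR steps can be illustrated with a minimal, unblocked sketch. It is not the paper's tiled PaRSEC implementation, and it does not reproduce any of the paper's robustness criteria. It assumes a nonsingular square matrix and uses a hypothetical scalar test: column `k` gets an unpivoted LU step when `alpha * |a_kk|` is at least the largest entry below the diagonal (so all multipliers are bounded by `alpha`), and a Householder QR step otherwise. The function name `hybrid_lu_qr_solve` and the parameter `alpha` are illustrative assumptions.

```python
import math

def hybrid_lu_qr_solve(A, b, alpha=2.0):
    """Solve A x = b, picking an LU or a QR elimination step per column.

    Assumptions (not from the paper): A is square and nonsingular, and the
    robustness test is the simple scalar check alpha * |a_kk| >= max |a_ik|.
    """
    n = len(A)
    U = [[float(v) for v in row] for row in A]   # working copy of A
    y = [float(v) for v in b]                    # transformed right-hand side
    steps = []                                   # record of the step chosen per column
    for k in range(n - 1):
        below_max = max(abs(U[i][k]) for i in range(k + 1, n))
        if below_max == 0.0:
            steps.append("none")                 # column already eliminated
            continue
        if alpha * abs(U[k][k]) >= below_max:
            # LU step without pivoting: every multiplier has magnitude <= alpha.
            for i in range(k + 1, n):
                m = U[i][k] / U[k][k]
                for j in range(k, n):
                    U[i][j] -= m * U[k][j]
                y[i] -= m * y[k]
            steps.append("LU")
        else:
            # QR step: apply the Householder reflector H = I - 2 v v^T / (v^T v).
            v = [U[i][k] for i in range(k, n)]
            norm = math.sqrt(sum(t * t for t in v))
            v[0] += math.copysign(norm, v[0])
            vv = sum(t * t for t in v)
            for j in range(k, n):
                s = 2.0 * sum(v[i - k] * U[i][j] for i in range(k, n)) / vv
                for i in range(k, n):
                    U[i][j] -= s * v[i - k]
            s = 2.0 * sum(v[i - k] * y[i] for i in range(k, n)) / vv
            for i in range(k, n):
                y[i] -= s * v[i - k]
            steps.append("QR")
    # Back substitution on the resulting upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - acc) / U[i][i]
    return x, steps

# A tiny pivot forces a QR step; a dominant pivot allows the cheaper LU step.
x_qr, steps_qr = hybrid_lu_qr_solve([[1e-10, 1.0], [1.0, 1.0]], [1.0, 2.0])
x_lu, steps_lu = hybrid_lu_qr_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In this sketch, a larger `alpha` accepts more LU steps (cheaper, weaker bound on the multipliers) and a smaller `alpha` triggers more QR steps, which loosely mirrors the stability versus performance trade-off described in the abstract.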


Pages: 32-46

Number of pages: 15

