New Level-3 BLAS Kernels for Cholesky Factorization

Authors
Gustavson, Fred G.
Wasniewski, Jerzy
Herrero, Jose R.
Source
PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I | 2012 / Vol. 7203
Keywords
ALGORITHMS; BLOCKING; MATRIX;
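DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract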
Some linear algebra libraries use Level-2 routines during the factorization part of any Level-3 block factorization algorithm. We discuss four Level-3 routines called DPOTF3, a new type of BLAS, for the factorization part of a block Cholesky factorization algorithm, for use by the LAPACK routine DPOTRF or for BPF (Blocked Packed Format) Cholesky factorization. The four DPOTF3 routines are Fortran routines. Our main result is that the performance of the DPOTF3 routines is still increasing when the performance of the Level-2 LAPACK routine DPOTF2 starts to decrease. This means that the performance of DGEMM, DSYRK, and DTRSM will increase, because they can use larger block sizes and make fewer passes over the matrix elements. We present corroborating performance results for DPOTF3 versus DPOTF2 on a variety of common platforms. The four DPOTF3 routines are based on simple register blocking; different platforms have different numbers of registers, so our four routines use different register blockings. Blocked Packed Format (BPF) is discussed. LAPACK routines for _POTRF and _PPTRF using BPF instead of full and packed format are shown to be trivial modifications of LAPACK _POTRF source codes. Upper BPF is shown to be identical to square block packed format. Performance results for DBPTRF and DPOTRF for large n show that the DPOTF3 routines do increase performance.
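To make the structure described in the abstract concrete, the following is a minimal sketch of a right-looking blocked Cholesky factorization on the lower triangle. It is written in plain Python for readability and is not the authors' Fortran code: the names `potf_kernel` and `blocked_cholesky` are hypothetical, and `potf_kernel` is an ordinary unblocked loop with no register blocking. It only marks the place where a kernel such as DPOTF2 or DPOTF3 would be called, with comments noting which steps play the roles of DTRSM and DSYRK/DGEMM.

```python
import math


def potf_kernel(a, k0, k1):
    """Factor the diagonal block a[k0:k1][k0:k1] in place (lower Cholesky).

    Hypothetical stand-in for the factorization kernel; this is the slot
    that DPOTF2 or DPOTF3 would fill in a LAPACK-style routine.
    """
    for j in range(k0, k1):
        d = a[j][j] - sum(a[j][p] * a[j][p] for p in range(k0, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, k1):
            s = sum(a[i][p] * a[j][p] for p in range(k0, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def blocked_cholesky(a, nb):
    """Right-looking blocked Cholesky with block size nb.

    Only the lower triangle of `a` is read and overwritten with L,
    where A = L * L^T. The strict upper triangle is left untouched.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Factorization part: factor the nb-by-nb diagonal block.
        potf_kernel(a, k0, k1)
        # DTRSM role: solve L21 * L11^T = A21 for the panel below the block.
        for i in range(k1, n):
            for j in range(k0, k1):
                s = sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # DSYRK/DGEMM role: trailing update A22 -= L21 * L21^T (lower part).
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k0, k1))
    return a


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    L = blocked_cholesky([row[:] for row in A], nb=2)
    for i, row in enumerate(L):
        print(row[: i + 1])  # lower-triangular entries of L
```

In this sketch a larger `nb` shifts more of the work into `potf_kernel` per call and reduces the number of trailing-update passes, which is the trade-off the abstract refers to when it says a faster factorization kernel allows larger block sizes.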
Pages: 60-69
Page count: 10