New Level-3 BLAS Kernels for Cholesky Factorization

Authors
Gustavson, Fred G.
Wasniewski, Jerzy
Herrero, Jose R.
Source
PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I | 2012 / Vol. 7203
Keywords
ALGORITHMS; BLOCKING; MATRIX;
Chinese Library Classification
TP3 [computing technology, computer technology]
Discipline code
0812
Abstract
Some linear algebra libraries use Level-2 routines during the factorization part of Level-3 block factorization algorithms. We discuss four Level-3 routines, collectively called DPOTF3, that form a new type of BLAS for the factorization part of a block Cholesky factorization algorithm, for use either by the LAPACK routine DPOTRF or by BPF (Blocked Packed Format) Cholesky factorization. The four DPOTF3 routines are written in Fortran. Our main result is that the performance of the DPOTF3 routines is still increasing at block sizes where the performance of LAPACK's Level-2 routine DPOTF2 starts to decrease. This means that the performance of DGEMM, DSYRK, and DTRSM will also increase, because they can use larger block sizes and make fewer passes over the matrix elements. We present corroborating performance results for DPOTF3 versus DPOTF2 on a variety of common platforms. The four DPOTF3 routines are based on simple register blocking; because different platforms have different numbers of registers, the four routines use different register blockings. Blocked Packed Format (BPF) is also discussed. LAPACK routines _POTRF and _PPTRF that use BPF instead of full and packed format are shown to be trivial modifications of the LAPACK _POTRF source code. Upper BPF is shown to be identical to square block packed format. Performance results for DBPTRF and DPOTRF for large n show that the DPOTF3 routines do increase performance for large n.
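The abstract does not reproduce the DPOTF3 code, so the following is only a hypothetical sketch of what "simple register blocking" can mean for a Cholesky factorization kernel. It assumes lower-triangular storage (A = L L^T) and a 2x2 register block; the function names `cholesky_unblocked` and `cholesky_2x2` are illustrative and are not LAPACK or DPOTF3 routines, and the actual Fortran DPOTF3 routines use platform-dependent blockings. The unblocked version follows a column-at-a-time, Level-2-style loop structure similar in spirit to DPOTF2. The blocked version computes two columns at a time, keeping a 2x2 tile of partial sums in scalar accumulators, so that each group of four loaded elements feeds four multiply-adds instead of two. In pure Python the accumulators are ordinary local variables, so this shows only the data-reuse pattern, not a speedup.

```python
import math


def _check_pivot(d):
    # A non-positive pivot means A is not (numerically) positive definite.
    if d <= 0.0:
        raise ValueError("matrix is not positive definite")
    return math.sqrt(d)


def cholesky_unblocked(a):
    """Column-at-a-time lower Cholesky A = L L^T (Level-2 style).

    Each multiply-add in the inner sums loads two matrix elements.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = _check_pivot(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L


def _finish_row(L, i, j, t0, t1):
    # t0, t1 hold a[i][j], a[i][j+1] minus the contributions of columns k < j;
    # finish the two entries using the already factored 2x2 diagonal tile.
    L[i][j] = t0 / L[j][j]
    L[i][j + 1] = (t1 - L[i][j] * L[j + 1][j]) / L[j + 1][j + 1]


def cholesky_2x2(a):
    """Same factorization, two columns at a time, with 2x2 tiles of scalar
    accumulators standing in for machine registers (hypothetical sketch)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    j = 0
    while j + 1 < n:
        # Diagonal 2x2 tile: symmetric, so three accumulators suffice.
        t00, t10, t11 = a[j][j], a[j + 1][j], a[j + 1][j + 1]
        for k in range(j):
            lj, lj1 = L[j][k], L[j + 1][k]
            t00 -= lj * lj
            t10 -= lj1 * lj
            t11 -= lj1 * lj1
        L[j][j] = _check_pivot(t00)
        L[j + 1][j] = t10 / L[j][j]
        L[j + 1][j + 1] = _check_pivot(t11 - L[j + 1][j] ** 2)
        # Off-diagonal rows, two at a time: per k, 4 loads feed 4 multiply-adds.
        i = j + 2
        while i + 1 < n:
            t00, t01 = a[i][j], a[i][j + 1]
            t10, t11 = a[i + 1][j], a[i + 1][j + 1]
            for k in range(j):
                li, li1, lj, lj1 = L[i][k], L[i + 1][k], L[j][k], L[j + 1][k]
                t00 -= li * lj
                t01 -= li * lj1
                t10 -= li1 * lj
                t11 -= li1 * lj1
            _finish_row(L, i, j, t00, t01)
            _finish_row(L, i + 1, j, t10, t11)
            i += 2
        if i < n:  # one leftover row: a 1x2 tile
            t0, t1 = a[i][j], a[i][j + 1]
            for k in range(j):
                t0 -= L[i][k] * L[j][k]
                t1 -= L[i][k] * L[j + 1][k]
            _finish_row(L, i, j, t0, t1)
        j += 2
    if j < n:  # odd n: the last column is a single diagonal element
        L[j][j] = _check_pivot(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
    return L


if __name__ == "__main__":
    print(cholesky_2x2([[4.0, 2.0], [2.0, 5.0]]))  # [[2.0, 0.0], [1.0, 2.0]]
```

Both functions compute the same factor; the blocked one only changes the order in which partial sums are accumulated. A real kernel would pick the tile shape (2x2 here, by assumption) to fit the number of floating-point registers on the target machine, which is the kind of platform-dependent choice the abstract describes.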
Pages: 60-69
Number of pages: 10