Level-3 BLAS on a GPU: Picking the Low Hanging Fruit

Cited by: 4
Authors
Igual, Francisco D. [1 ]
Quintana-Orti, Gregorio [1 ]
van de Geijn, Robert A. [2 ]
Affiliations
[1] Univ Jaume 1, Depto Ingn & Ciencia Comp, Castellon de La Plana 12071, Spain
[2] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Source
INTERNATIONAL CONFERENCE OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING 2009 (ICCMSE 2009) | 2012, Vol. 1504
DOI
10.1063/1.4772121
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The arrival of hardware accelerators has created a new gold rush to be the first to deliver their promise of high performance for numerical applications. Despite the recent advances in programmability, it is still hard to develop tuned programs that extract all the potential performance promised by the manufacturers. In this paper we remind the community that while this development effort is a noble endeavor, there is a lot of low hanging fruit that can be harvested easily. Picking this low hanging fruit benefits the scientific computing community immediately and prototypes the approach that further optimizations may follow. We demonstrate this by focusing on a widely used set of operations, the level-3 BLAS, targeting the NVIDIA GPUs.
Pages: 1109-1112
Page count: 4
Related papers
50 in total
  • [1] STABILITY OF BLOCK ALGORITHMS WITH FAST LEVEL-3 BLAS
    DEMMEL, JW
    HIGHAM, NJ
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1992, 18 (03): 274 - 291
  • [2] High-performance implementation of the level-3 BLAS
    Goto, Kazushige
    Van De Geijn, Robert
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2008, 35 (01): 1 - 14
  • [3] New Level-3 BLAS Kernels for Cholesky Factorization
    Gustavson, Fred G.
    Wasniewski, Jerzy
    Herrero, Jose R.
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I, 2012, 7203 : 60 - 69
  • [4] Evaluation of two topology-aware heuristics on level-3 BLAS library for multi-GPU platforms
    Gautier, Thierry
    Lima, Joao V. F.
    SCWS 2021: 2021 SC WORKSHOPS SUPPLEMENTARY PROCEEDINGS, 2021, : 12 - 22
  • [5] THE GREENING OF MEDICINE Start by picking low hanging fruit
    Bickley, Philip
    BRITISH MEDICAL JOURNAL, 2012, 344
  • [6] A Linear Algebra Core Design For Efficient Level-3 BLAS
    Pedram, Ardavan
    Gilani, Syed Zohaib
    Kim, Nam Sung
    van de Geijn, Robert
    Schulte, Michael
    Gerstlauer, Andreas
    2012 IEEE 23RD INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP), 2012, : 149 - 152
  • [7] Increasing data locality and introducing Level-3 BLAS in the Neville elimination
    Alonso, Pedro
    Cortina, Raquel
    Quintana-Orti, Enrique S.
    Ranilla, Jose
    APPLIED MATHEMATICS AND COMPUTATION, 2011, 218 (07) : 3348 - 3358
  • [8] Robust level-3 BLAS Inverse Iteration from the Hessenberg Matrix
    Schwarz, Angelika
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2022, 48 (03):
  • [9] A PARALLEL BLOCK IMPLEMENTATION OF LEVEL-3 BLAS FOR MIMD VECTOR PROCESSORS
    DAYDE, MJ
    DUFF, IS
    PETITET, A
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1994, 20 (02): 178 - 193
  • [10] IMPLEMENTATION OF THE BLAS LEVEL-3 AND LINPACK BENCHMARK ON THE AP1000
    BRENT, RP
    STRAZDINS, PE
    FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 1993, 29 (01): 61 - 70