XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server

Cited by: 3
Authors
Gautier, Thierry [1]
Lima, Joao V. F. [2]
Affiliations
[1] UCL, CNRS, INRIA, LIP Lab, Lyon, France
[2] Univ Fed Santa Maria, Grad Program Comp Sci, Santa Maria, RS, Brazil
Source
2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020) | 2020
Keywords
Multi-GPU; BLAS; Task Parallelism; Dense Linear Algebra
DOI
10.1109/PDP50117.2020.00008
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the last ten years, GPUs have dominated the market in terms of the computing/power metric, and numerous research works have provided Basic Linear Algebra Subprograms (BLAS) implementations accelerated on GPUs. Several software libraries have been developed to exploit the performance of systems with accelerators, but the achieved performance may be far from the platform's peak performance. This paper presents XKBlas, which aims to improve the performance of BLAS-3 kernels on multi-GPU systems. At the low level, we model computation as a set of tasks accessing data on different resources. At the high level, the API design favors non-blocking calls as a uniform concept to overlap latency, even for fine-grained computation. Unit benchmarks of BLAS-3 kernels showed that XKBlas outperformed most implementations, even including the overhead of dynamic task creation and scheduling. XKBlas outperformed BLAS implementations such as cuBLAS-XT, PaRSEC, BLASX and Chameleon/StarPU.
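The two ideas in the abstract, a computation modelled as tasks that access data and a non-blocking call that returns before the work completes, can be illustrated with a minimal CPU-only sketch. This is not the XKBlas API: the names `split`, `gemm_tile`, `gemm_async` and `demo` are hypothetical, a thread pool stands in for the GPUs, and the task-to-tile mapping is an assumption chosen for clarity.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n, tile):
    """Cut the index range [0, n) into consecutive (start, stop) tiles."""
    return [(s, min(s + tile, n)) for s in range(0, n, tile)]


def gemm_tile(A, B, C, rows, cols, inner):
    """Task body: C[rows, cols] += A[rows, inner] * B[inner, cols].

    The task reads one tile of A and one tile of B and updates one tile of C.
    """
    (r0, r1), (c0, c1), (k0, k1) = rows, cols, inner
    for i in range(r0, r1):
        for j in range(c0, c1):
            acc = 0.0
            for k in range(k0, k1):
                acc += A[i][k] * B[k][j]
            C[i][j] += acc


def gemm_async(pool, A, B, C, tile):
    """Hypothetical non-blocking GEMM: submits tile tasks, returns futures."""
    n, m, p = len(A), len(B[0]), len(B)

    def c_tile_chain(rows, cols):
        # Tasks that update the same C tile run in order (write dependency);
        # chains for different C tiles are independent and may run in parallel.
        for inner in split(p, tile):
            gemm_tile(A, B, C, rows, cols, inner)

    return [pool.submit(c_tile_chain, rows, cols)
            for rows in split(n, tile)
            for cols in split(m, tile)]


def demo():
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    B = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
    C = [[0.0] * 3 for _ in range(3)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = gemm_async(pool, A, B, C, tile=2)  # returns immediately
        # The caller could issue further calls here before synchronizing.
        for future in pending:
            future.result()  # explicit synchronization point
    return C
```

Calling `demo()` multiplies two 3x3 matrices with 2x2 tiles. The caller only blocks at the explicit `result()` calls, which is where a non-blocking design leaves room for overlapping other work with the pending tasks.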
Pages: 1-8
Page count: 8