XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server

Cited by: 3
Authors
Gautier, Thierry [1 ]
Lima, Joao V. F. [2 ]
Affiliations
[1] UCL, CNRS, INRIA, LIP Lab, Lyon, France
[2] Univ Fed Santa Maria, Grad Program Comp Sci, Santa Maria, RS, Brazil
Source
2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2020), 2020
Keywords
Multi-GPU; BLAS; Task Parallelism; Dense Linear Algebra
DOI
10.1109/PDP50117.2020.00008
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Over the last ten years, GPUs have dominated the market in terms of computing performance per watt, and numerous research works have provided GPU-accelerated implementations of the Basic Linear Algebra Subprograms (BLAS). Several software libraries have been developed to exploit the performance of accelerator-based systems, but the performance they achieve may be far from the platform's peak. This paper presents XKBlas, which aims to improve the performance of BLAS-3 kernels on multi-GPU systems. At the low level, we model computation as a set of tasks that access data on different resources. At the high level, the API design favors non-blocking calls as a uniform concept for overlapping latency, even with fine-grain computation. Unit benchmarks of BLAS-3 kernels showed that, even including the overhead of dynamic task creation and scheduling, XKBlas outperformed most implementations, including cuBLAS-XT, PaRSEC, BLASX, and Chameleon/StarPU.
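The two ideas in the abstract (computation as a set of tasks over data, and non-blocking calls that let the caller overlap work until an explicit synchronization) can be illustrated with a minimal sketch. It is not XKBlas's API or implementation: the names `gemm_async`, `gemm_tile`, and `sync` are hypothetical, a Python thread pool stands in for the GPUs, and data transfers, task dependencies, and multi-GPU scheduling are not modeled. The sketch splits C = A·B into square output tiles, submits one task per tile, and returns immediately with the pending handles.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]


def gemm_tile(A: Matrix, B: Matrix, C: Matrix, i0: int, j0: int, ts: int, n: int) -> None:
    """Compute one ts x ts tile of C = A @ B for n x n row-major lists.

    Each task writes only its own tile of C, so tasks never conflict.
    """
    for i in range(i0, min(i0 + ts, n)):
        for j in range(j0, min(j0 + ts, n)):
            acc = 0.0
            for k in range(n):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc


def gemm_async(pool: ThreadPoolExecutor, A: Matrix, B: Matrix, C: Matrix,
               n: int, ts: int) -> List[Future]:
    """Hypothetical non-blocking GEMM: submit one task per output tile and
    return the futures at once; the caller chooses when to wait."""
    return [pool.submit(gemm_tile, A, B, C, i0, j0, ts, n)
            for i0 in range(0, n, ts)
            for j0 in range(0, n, ts)]


def sync(futures: List[Future]) -> None:
    """Explicit synchronization point: block until every task has finished."""
    for f in futures:
        f.result()  # re-raises any exception raised inside a task


if __name__ == "__main__":
    n, ts = 6, 4
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # 2 * identity
    C = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = gemm_async(pool, A, B, C, n, ts)  # returns without waiting
        # Other independent work could be issued here while tiles compute.
        sync(pending)
    print(C[1][2])  # 2 * (1 + 2)
```

Because of Python's global interpreter lock the threads here do not run the arithmetic in parallel; the point is only the shape of the interface, where submission is decoupled from completion so that latency can be overlapped.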
Pages: 1-8
Number of pages: 8