XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server

Cited by: 3

Authors:

Gautier, Thierry [1]

Lima, Joao V. F. [2]

Affiliations:

[1] UCL, CNRS, INRIA, LIP Lab, Lyon, France

[2] Univ Fed Santa Maria, Grad Program Comp Sci, Santa Maria, RS, Brazil

Source:

2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020) | 2020

Keywords:

Multi-GPU; BLAS; Task Parallelism; DENSE LINEAR ALGEBRA;

DOI:

10.1109/PDP50117.2020.00008

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In the last ten years, GPUs have dominated the market in terms of the computing/power metric, and numerous research works have provided Basic Linear Algebra Subprograms implementations accelerated on GPUs. Several software libraries have been developed to exploit the performance of systems with accelerators, but the real performance may be far from the platform's peak performance. This paper presents XKBlas, which aims to improve the performance of BLAS-3 kernels on multi-GPU systems. At the low level, we model computation as a set of tasks accessing data on different resources. At the high level, the API design favors non-blocking calls as a uniform concept to overlap latency, even for fine-grained computation. Unit benchmarks of BLAS-3 kernels showed that XKBlas outperformed most implementations, even when the overhead of dynamic task creation and scheduling is included. XKBlas outperformed BLAS implementations such as cuBLAS-XT, PaRSEC, BLASX and Chameleon/StarPU.
