Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations

Cited by: 16
Authors
Anzt, Hartwig [1, 2]
Tsai, Yuhsiang M. [1]
Abdelfattah, Ahmad [2]
Cojean, Terry [1]
Dongarra, Jack [2, 3, 4]
Affiliations
[1] Karlsruhe Inst Technol, Karlsruhe, Germany
[2] Univ Tennessee, Knoxville, TN 37996 USA
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
[4] Univ Manchester, Manchester, Lancs, England
Source
PROCEEDINGS OF 2020 IEEE/ACM PERFORMANCE MODELING, BENCHMARKING AND SIMULATION OF HIGH PERFORMANCE COMPUTER SYSTEMS (PMBS 2020) | 2020
Keywords
Sparse Linear Algebra; Sparse Matrix Vector Product; Batched Linear Algebra; NVIDIA A100 GPU; CHALLENGES;
DOI
10.1109/PMBS51919.2020.00009
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
GPU accelerators have become an important backbone of scientific high-performance computing, and the performance advances obtained from adopting new GPU hardware are significant. In this paper, we take a first look at NVIDIA's newest server-line GPU, the A100 architecture, part of the Ampere generation. Specifically, we assess its performance for sparse and batched computations, as many scientific applications rely on these routines, and compare it to the performance achieved on NVIDIA's previous server-line GPU.
Pages: 26-38
Page count: 13