NVIDIA A100 Tensor Core GPU: Performance and Innovation

Cited by: 216
Authors
Choquette, Jack [1]
Gandhi, Wishwesh [2]
Giroux, Olivier [1]
Stam, Nick [1]
Krashinsky, Ronny [1]
Affiliations
[1] NVIDIA, Singapore 609927, Singapore
[2] NVIDIA, Architecture, Singapore 609927, Singapore
Keywords
A100; C++20; CUDA; Deep Learning; GPU; NVLink; Tensor Core
DOI
10.1109/MM.2021.3061394
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
NVIDIA A100 Tensor Core GPU is NVIDIA's latest flagship GPU. It has been designed with many new innovative features to provide performance and capabilities for HPC, AI, and data analytics workloads. Feature enhancements include a Third-Generation Tensor Core, new asynchronous data movement and programming model, enhanced L2 cache, HBM2 DRAM, and third-generation NVIDIA NVLink I/O. © 1981-2012 IEEE.
Pages: 29-35
Page count: 7
Related papers
5 entries
[1] Volta: Performance and Programmability [J]. Choquette, Jack; Giroux, Olivier; Foley, Denis. IEEE Micro, 2018, 38 (02): 42-52
[2] Ishii A., 2018, Hot Chips
[3] ISO, 2020, ISO/IEC 14882:2020 Information technology-Programming languages-C++
[4] Mattson P., 2019, MLPerf training benchmark
[5] Shirako J, 2008, ICS'08: Proceedings of the 2008 ACM International Conference on Supercomputing, P277