A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference

Cited by: 30
Authors
Oh, Jinwook [1 ]
Lee, Sae Kyu [1 ]
Kang, Mingu [1 ]
Ziegler, Matthew [1 ]
Silberman, Joel [1 ]
Agrawal, Ankur [1 ]
Venkataramani, Swagath [1 ]
Fleischer, Bruce [1 ]
Guillorn, Michael [1 ]
Choi, Jungwook [1 ]
Wang, Wei [2 ]
Mueller, Silvia [3 ]
Ben-Yehuda, Shimon [4 ]
Bonanno, James [5 ]
Cao, Nianzheng [1 ]
Casatuta, Robert [6 ]
Chen, Chia-Yu [1 ]
Cohen, Matt [1 ]
Erez, Ophir [4 ]
Fox, Thomas [1 ]
Gristede, George [1 ]
Haynie, Howard [5 ]
Ivanov, Vicktoria [4 ]
Koswatta, Siyu [1 ]
Lo, Shih-Hsien [1 ]
Lutz, Martin [1 ]
Maier, Gary [6 ]
Mesh, Alex [4 ]
Nustov, Yevgeny [4 ]
Rider, Scot [5 ]
Schaal, Marcel [1 ]
Scheuermann, Michael [1 ]
Sun, Xiao [1 ]
Wang, Naigang [1 ]
Yee, Fanchieh [1 ]
Zhou, Ching [1 ]
Shah, Vinay [7 ]
Curran, Brian [5 ]
Srinivasan, Vijayalakshmi [1 ]
Lu, Pong-Fei [1 ]
Shukla, Sunil [1 ]
Gopalakrishnan, Kailash [1 ]
Chang, Leland [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] IBM TJ Watson Res Ctr, Albany, NY USA
[3] IBM Syst Grp, Boblingen, Germany
[4] IBM Syst Grp, Haifa, Israel
[5] IBM Syst Grp, Poughkeepsie, NY USA
[6] IBM Syst Grp, Hopewell Jct, NY USA
[7] IBM Syst Grp, Hursley, England
Source
2020 IEEE Symposium on VLSI Circuits, 2020
DOI
10.1109/vlsicircuits18222.2020.9162917
CLC number
TP3 [Computing technology; Computer technology]
Subject classification code
0812
Abstract
A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-SIMD compute engines leveraging compact DLFloat16 FPUs. Architectural flexibility is maintained for very high compute utilization across neural network topologies. A modular dual-corelet architecture with a shared scratchpad and a software-controlled network/memory interface enables scalability to many-core SoCs and large-scale systems. The 14nm AI core achieves fp16 peak performance of 3.0 TFLOPS at 0.62V and 1.4 TFLOPS/W at 0.54V.
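The compact DLFloat16 FPUs mentioned in the abstract use IBM's DLFloat 16-bit format (described in Agrawal et al., ARITH 2019), which trades IEEE half precision's 5-bit exponent / 10-bit fraction for a 1-6-9 split (1 sign bit, 6 exponent bits with bias 31, 9 fraction bits) to widen dynamic range for training. A minimal sketch of that bit layout, assuming the 1-6-9 split with bias 31 and simplifying the format's rounding and special-case handling (subnormals are flushed to zero, overflow saturates):

```python
import struct

# Hedged sketch of a DLFloat16-style conversion: 1 sign bit, 6 exponent bits
# (bias 31), 9 fraction bits. Rounding and special-case handling are
# simplified relative to the actual hardware format.
def float_to_dlfloat16(x: float) -> int:
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # IEEE-754 binary32 bits
    sign = (bits >> 31) & 0x1
    exp32 = (bits >> 23) & 0xFF           # binary32 exponent, biased by 127
    frac32 = bits & 0x7FFFFF              # 23 fraction bits
    if exp32 == 0:                        # zero/subnormal input -> signed zero
        return sign << 15
    e = exp32 - 127 + 31                  # re-bias for the 6-bit exponent
    if e >= 63:                           # overflow -> saturate to max magnitude
        return (sign << 15) | (0x3E << 9) | 0x1FF
    if e <= 0:                            # underflow -> flush to signed zero
        return sign << 15
    frac9 = frac32 >> 14                  # keep the top 9 fraction bits
    if frac32 & 0x2000:                   # round to nearest (simplified ties)
        frac9 += 1
        if frac9 == 0x200:                # rounding carried into the exponent
            frac9 = 0
            e += 1
            if e >= 63:
                return (sign << 15) | (0x3E << 9) | 0x1FF
    return (sign << 15) | (e << 9) | frac9

def dlfloat16_to_float(h: int) -> float:
    sign = -1.0 if (h >> 15) & 1 else 1.0
    e = (h >> 9) & 0x3F
    frac = h & 0x1FF
    if e == 0 and frac == 0:
        return sign * 0.0
    return sign * (1.0 + frac / 512.0) * 2.0 ** (e - 31)
```

With 6 exponent bits the representable magnitudes span roughly 2^-30 to 2^31, which is the design point that lets fp16 training remain robust without the loss scaling that IEEE half precision's narrower exponent typically requires.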
Pages: 2