A Dynamic Execution Neural Network Processor for Fine-Grained Mixed-Precision Model Training Based on Online Quantization Sensitivity Analysis

Times cited: 1
Authors
Liu, Ruoyang [1]
Wei, Chenhan [1]
Yang, Yixiong [2]
Wang, Wenxun [1]
Yuan, Binbin [3]
Yang, Huazhong [1]
Liu, Yongpan [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Nvidia, Shanghai 201210, Peoples R China
[3] Traff Control Technol Co Ltd, Beijing 100160, Peoples R China
Keywords
Training; Artificial neural networks; Quantization (signal); Process control; Tensors; System-on-chip; Memory management; Dynamic precision (DP); fully quantized network training; low-bit training; mixed-precision quantization; neural network (NN) training accelerator;
DOI
10.1109/JSSC.2024.3377292
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
As neural network (NN) training cost has been growing exponentially over the past decade, developing high-speed and energy-efficient training methods has become an urgent task. Fine-grained mixed-precision low-bit training is the most promising way to achieve high-efficiency training, but it needs dedicated processor designs to overcome the overhead in control, storage, and I/O and to remove the power bottleneck in floating-point (FP) units. This article presents a dynamic execution NN processor supporting fine-grained mixed-precision training through an online quantization sensitivity analysis. Three key features are proposed: a quantization-sensitivity-aware dynamic execution controller, a dynamic bit-width adaptive datapath design, and a low-power multi-level-aligned block-FP unit (BFPU). The chip achieves 13.2-TFLOPS/W energy efficiency and 1.07-TFLOPS/mm² area efficiency.
Pages: 3082-3093
Page count: 12