A Dynamic Execution Neural Network Processor for Fine-Grained Mixed-Precision Model Training Based on Online Quantization Sensitivity Analysis

Cited by: 1
Authors
Liu, Ruoyang [1 ]
Wei, Chenhan [1 ]
Yang, Yixiong [2 ]
Wang, Wenxun [1 ]
Yuan, Binbin [3 ]
Yang, Huazhong [1 ]
Liu, Yongpan [1 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Nvidia, Shanghai 201210, Peoples R China
[3] Traff Control Technol Co Ltd, Beijing 100160, Peoples R China
Keywords
Training; Artificial neural networks; Quantization (signal); Process control; Tensors; System-on-chip; Memory management; Dynamic precision (DP); fully quantized network training; low-bit training; mixed-precision quantization; neural network (NN) training accelerator
DOI
10.1109/JSSC.2024.3377292
CLC Classification Numbers
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
As neural network (NN) training costs have been growing exponentially over the past decade, developing high-speed and energy-efficient training methods has become an urgent task. Fine-grained mixed-precision low-bit training is the most promising route to high-efficiency training, but it requires dedicated processor designs to overcome control, storage, and I/O overheads and to remove the power bottleneck of floating-point (FP) units. This article presents a dynamic execution NN processor that supports fine-grained mixed-precision training through online quantization sensitivity analysis. Three key features are proposed: a quantization-sensitivity-aware dynamic execution controller, a dynamic bit-width-adaptive datapath design, and a low-power multi-level-aligned block-FP unit (BFPU). The chip achieves 13.2-TFLOPS/W energy efficiency and 1.07-TFLOPS/mm² area efficiency.
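To make the two techniques named in the abstract concrete, here is a minimal NumPy sketch of block floating-point quantization together with a relative-error quantization-sensitivity proxy. It is an illustration under stated assumptions, not the paper's implementation: the function names, the 16-element block size, and the mantissa widths are all hypothetical choices.

```python
import numpy as np

def block_fp_quantize(x, block_size=16, mantissa_bits=4):
    # Block floating point (BFP): each block of `block_size` values
    # shares one exponent, and mantissas are rounded to
    # `mantissa_bits` signed bits against that shared scale.
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent: smallest power of two covering the block's
    # largest magnitude (all-zero blocks are guarded with 1.0).
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    exp = np.ceil(np.log2(np.where(max_abs > 0, max_abs, 1.0)))
    scale = 2.0 ** exp / 2 ** (mantissa_bits - 1)

    # Round mantissas to the shared scale; clip to the two's-complement
    # range of the mantissa width.
    qmax = 2 ** (mantissa_bits - 1) - 1
    q = np.clip(np.round(blocks / scale), -qmax - 1, qmax) * scale
    return q.reshape(-1)[: x.size]

def quantization_sensitivity(x, block_size=16, mantissa_bits=4):
    # Sensitivity proxy: relative mean-squared error introduced by
    # BFP quantization at the given precision.
    q = block_fp_quantize(x, block_size, mantissa_bits)
    return float(np.mean((x - q) ** 2) / (np.mean(x ** 2) + 1e-12))

# Sensitivity falls as the mantissa widens; a controller of the kind
# the abstract describes could pick, per tensor, the narrowest width
# whose sensitivity stays under a threshold.
x = np.random.randn(4096)
for bits in (2, 4, 6, 8):
    print(bits, quantization_sensitivity(x, mantissa_bits=bits))
```

In this sketch the per-block shared exponent adapts to each block's dynamic range, which is what lets a BFP datapath trade mantissa width for efficiency tensor by tensor; how the actual chip measures sensitivity and drives its controller is detailed in the article itself.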
Pages: 3082-3093
Page count: 12