A Dynamic Execution Neural Network Processor for Fine-Grained Mixed-Precision Model Training Based on Online Quantization Sensitivity Analysis

Cited by: 1
Authors
Liu, Ruoyang [1 ]
Wei, Chenhan [1 ]
Yang, Yixiong [2 ]
Wang, Wenxun [1 ]
Yuan, Binbin [3 ]
Yang, Huazhong [1 ]
Liu, Yongpan [1 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Nvidia, Shanghai 201210, Peoples R China
[3] Traff Control Technol Co Ltd, Beijing 100160, Peoples R China
Keywords
Training; Artificial neural networks; Quantization (signal); Process control; Tensors; System-on-chip; Memory management; Dynamic precision (DP); fully quantized network training; low-bit training; mixed-precision quantization; neural network (NN) training accelerator
DOI
10.1109/JSSC.2024.3377292
CLC Classification Numbers
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
As neural network (NN) training costs have been growing exponentially over the past decade, developing high-speed and energy-efficient training methods has become an urgent task. Fine-grained mixed-precision low-bit training is the most promising route to high-efficiency training, but it requires dedicated processor designs to overcome control, storage, and I/O overheads and to remove the power bottleneck of floating-point (FP) units. This article presents a dynamic execution NN processor that supports fine-grained mixed-precision training through online quantization sensitivity analysis. Three key features are proposed: a quantization-sensitivity-aware dynamic execution controller, a dynamic bit-width-adaptive datapath design, and a low-power multi-level-aligned block-FP unit (BFPU). The chip achieves 13.2-TFLOPS/W energy efficiency and 1.07-TFLOPS/mm² area efficiency.
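To make the two techniques named in the abstract concrete, here is a minimal NumPy sketch of block floating-point quantization together with a relative-error quantization-sensitivity proxy. It is an illustration under stated assumptions, not the paper's implementation: the function names, the 16-element block size, and the mantissa widths are all hypothetical choices.

```python
import numpy as np

def block_fp_quantize(x, block_size=16, mantissa_bits=4):
    # Block floating point (BFP): each block of `block_size` values
    # shares one exponent, and mantissas are rounded to
    # `mantissa_bits` signed bits against that shared scale.
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent: smallest power of two covering the block's
    # largest magnitude (all-zero blocks are guarded with 1.0).
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    exp = np.ceil(np.log2(np.where(max_abs > 0, max_abs, 1.0)))
    scale = 2.0 ** exp / 2 ** (mantissa_bits - 1)

    # Round mantissas to the shared scale; clip to the two's-complement
    # range of the mantissa width.
    qmax = 2 ** (mantissa_bits - 1) - 1
    q = np.clip(np.round(blocks / scale), -qmax - 1, qmax) * scale
    return q.reshape(-1)[: x.size]

def quantization_sensitivity(x, block_size=16, mantissa_bits=4):
    # Sensitivity proxy: relative mean-squared error introduced by
    # BFP quantization at the given precision.
    q = block_fp_quantize(x, block_size, mantissa_bits)
    return float(np.mean((x - q) ** 2) / (np.mean(x ** 2) + 1e-12))

# Sensitivity falls as the mantissa widens; a controller of the kind
# the abstract describes could pick, per tensor, the narrowest width
# whose sensitivity stays under a threshold.
x = np.random.randn(4096)
for bits in (2, 4, 6, 8):
    print(bits, quantization_sensitivity(x, mantissa_bits=bits))
```

In this sketch the per-block shared exponent adapts to each block's dynamic range, which is what lets a BFP datapath trade mantissa width for efficiency tensor by tensor; how the actual chip measures sensitivity and drives its controller is detailed in the article itself.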
Pages: 3082-3093
Page count: 12