HLQ: Hardware-Friendly Logarithmic Quantization Aware Training for Power-Efficient Low-Precision CNN Models

Cited by: 0
Authors
Choi, Dahun [1 ]
Park, Juntae [1 ]
Kim, Hyun [1 ]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Res Ctr Elect & Informat Technol, Dept Elect & Informat Engn, Seoul 01811, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Quantization (signal); Accuracy; Training; Propagation losses; Convolutional neural networks; Computational modeling; Power demand; Integrated circuit modeling; Hardware; Indexes; Low power electronics; Compression algorithms; Logarithmic quantization; convolutional neural network; low-power; network compression;
DOI
10.1109/ACCESS.2024.3488093
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
With the development of deep learning and graphics processing units (GPUs), various convolutional neural network (CNN)-based computer vision studies have been conducted. Because the inference and training of CNNs involve numerous computations, research on network compression, including quantization, is being actively conducted alongside the use of CNNs. Unlike conventional linear quantization, logarithmic quantization has the advantage that the multiply-accumulate (MAC) operations in the convolution (CONV) layers, which account for most of the computation in CNNs, can be replaced with addition operations, making it well suited to low-precision quantization. In this paper, we propose a logarithmic quantization-aware training technique that effectively reduces quantization loss while maximizing the reduction in hardware resources and power consumption in the forward and backward propagation of the CNN. The proposed method minimizes the accuracy drop by allocating, at each training step in the forward pass, the rounding point with the least quantization loss, and propagates an optimized gradient in the backward pass by scaling the gradients of parameters with high quantization loss. When ResNet-18, -34, and -50 are trained from scratch on the Tiny-ImageNet dataset with both weights and activations quantized to 4 bits by the proposed method, accuracy improvements of 0.88%, 0.48%, and 1.72%, respectively, are achieved compared with the full-precision baseline. In addition, when the CONV acceleration unit of ResNet-18 is synthesized through RTL implementation, the proposed 4-bit quantization achieves a power saving of 82.3% compared with the full-precision baseline when computing ResNet-18.
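The core idea the abstract relies on — that logarithmic quantization lets a MAC's multiply be replaced by an addition (a bit-shift of the exponent) — can be illustrated with a minimal sketch. This is not the paper's method (the paper's rounding-point allocation and gradient scaling are not reproduced here); it only shows generic power-of-two quantization, with the exponent range `exp_min`/`exp_max` chosen arbitrarily for illustration:

```python
import math

def log2_quantize(w, exp_min=-8, exp_max=0):
    """Quantize a weight to the nearest power of two (sign preserved).

    Returns (sign, exponent) so the dequantized value is sign * 2**exponent.
    Multiplying an activation by such a weight reduces to a bit-shift
    (emulated here with math.ldexp) plus sign handling, instead of a
    full multiplier. Exponent range is an illustrative assumption.
    """
    if w == 0:
        return 0, exp_min
    sign = 1 if w > 0 else -1
    # Round in the log domain, then clamp to the representable range.
    exp = int(round(math.log2(abs(w))))
    exp = max(exp_min, min(exp_max, exp))
    return sign, exp

def shift_multiply(x, sign, exp):
    """Emulate the MAC's multiply as a shift: x * sign * 2**exp."""
    return sign * math.ldexp(x, exp)

# Example: w = 0.26 quantizes to +2**-2 = 0.25, so x * w becomes x >> 2.
sign, exp = log2_quantize(0.26)
print(sign, exp)                        # 1 -2
print(shift_multiply(3.0, sign, exp))   # 0.75
```

In hardware, `shift_multiply` corresponds to a barrel shifter on the activation, which is why the abstract reports large power savings over full-precision multipliers.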
Pages: 159611-159621
Number of pages: 11