Post Training Mixed Precision Quantization of Neural Networks using First-Order Information

Cited by: 2
Authors:
Chauhan, Arun [1]
Tiwari, Utsav [1]
Vikram, N. R. [1]
Affiliation:
[1] Samsung Res Inst, Bangalore, India
DOI: 10.1109/ICCVW60793.2023.00144
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Quantization is an efficient way to reduce both the memory footprint and the inference time of large Deep Neural Networks (DNNs), making their deployment feasible on resource-constrained devices. However, quantizing all layers uniformly to ultra-low bit precision degrades performance significantly. A promising way to address this problem is mixed-precision quantization, which assigns higher bit precision to the more sensitive layers. In this study, we introduce a method that uses only first-order information (i.e., gradients) to determine the sensitivity of each neural network layer for mixed-precision quantization, and we show that it matches the performance of counterpart methods that rely on second-order information (i.e., the Hessian) while having lower computational complexity. We then formulate mixed-precision bit allocation as an integer linear programming (ILP) problem that uses the proposed sensitivity metric to assign a bit-width to each layer efficiently for a given model-size budget. Furthermore, we use only post-training quantization techniques, yet achieve state-of-the-art results compared to popular mixed-precision methods that fine-tune the model on large training sets. Extensive experiments on benchmark vision architectures with the ImageNet dataset demonstrate the superiority of our approach over existing mixed-precision methods. At 8x weight compression, our method achieves better or comparable accuracy drops for ResNet18 (0.65%), ResNet50 (0.69%), MobileNet-V2 (0.49%), and Inception-V3 (1.30%) relative to state-of-the-art methods that require retraining or use the Hessian as the sensitivity metric for mixed-precision quantization.
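To make the two steps described in the abstract concrete, below is a minimal sketch, not the authors' code, of (a) a first-order layer-sensitivity proxy computed from one backward pass on a calibration batch, and (b) an ILP that picks one bit-width per layer under a model-size budget. The helper names (quantize_weight, first_order_sensitivity, allocate_bits), the symmetric uniform quantizer, the exact sensitivity formula (|gradient x quantization error| summed per layer), and the use of the PuLP solver are all illustrative assumptions; the paper's precise metric and ILP objective may differ.

import torch
import pulp  # open-source ILP modeling library (assumed available)

def quantize_weight(w, bits):
    """Symmetric uniform quantization of a weight tensor (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp_min(1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

def first_order_sensitivity(model, loss_fn, calib_batch, bit_options):
    """Hypothetical first-order proxy: |gradient * quantization error|,
    summed per layer, from a single backward pass on a calibration batch."""
    inputs, targets = calib_batch
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    sens = {}
    for name, p in model.named_parameters():
        if p.dim() < 2 or p.grad is None:  # skip biases / norm parameters
            continue
        sens[name] = {
            b: (p.grad * (quantize_weight(p.data, b) - p.data)).abs().sum().item()
            for b in bit_options
        }
    return sens

def allocate_bits(sens, num_params, bit_options, budget_bits):
    """ILP: choose exactly one bit-width per layer, minimizing total
    sensitivity subject to a total model-size budget (in bits)."""
    prob = pulp.LpProblem("mixed_precision_alloc", pulp.LpMinimize)
    x = {(l, b): pulp.LpVariable(f"x_{i}_{b}", cat="Binary")
         for i, l in enumerate(sens) for b in bit_options}
    # Objective: total sensitivity of the chosen configuration.
    prob += pulp.lpSum(sens[l][b] * x[l, b] for l in sens for b in bit_options)
    for l in sens:  # exactly one bit-width per layer
        prob += pulp.lpSum(x[l, b] for b in bit_options) == 1
    # Model-size constraint: sum of (parameter count * bit-width) <= budget.
    prob += pulp.lpSum(num_params[l] * b * x[l, b]
                       for l in sens for b in bit_options) <= budget_bits
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {l: b for l in sens for b in bit_options if x[l, b].value() > 0.5}

Here num_params maps each layer name to its parameter count (e.g., built from model.named_parameters()). For the 8x weight-compression setting reported above, FP32 weights compress to an average of 4 bits per weight, so budget_bits would be roughly 4 * sum(num_params.values()).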
Pages: 1335 - 1344 (10 pages)
Related papers (50 total):
  • [1] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): 21361 - 21379
  • [2] First-order logical neural networks
    Lerdlamnaochai, T
    Kijsirikul, B
    HIS'04: Fourth International Conference on Hybrid Intelligent Systems, Proceedings, 2005: 192 - 197
  • [3] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [4] Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision
    Liu, Xingchao
    Ye, Mao
    Zhou, Dengyong
    Liu, Qiang
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8697 - 8705
  • [5] Hierarchical Mixed-Precision Post-Training Quantization for SAR Ship Detection Networks
    Wei, Hang
    Wang, Zulin
    Ni, Yuanhan
    REMOTE SENSING, 2024, 16 (21)
  • [6] Hessian-based mixed-precision quantization with transition aware training for neural networks
    Huang, Zhiyong
    Han, Xiao
    Yu, Zhi
    Zhao, Yunlan
    Hou, Mingyang
    Hu, Shengdong
    NEURAL NETWORKS, 2025, 182
  • [7] Augmenting Neural Networks with First-order Logic
    Li, Tao
    Srikumar, Vivek
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 292 - 302
  • [8] Mixed Precision Weight Networks: Training Neural Networks with Varied Precision Weights
    Fuengfusin, Ninnart
    Tamukoh, Hakaru
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 614 - 623
  • [9] The BRST quantization of first-order systems
    Bizdadea, C
    Saliu, SO
    HELVETICA PHYSICA ACTA, 1997, 70 (04): 590 - 597