Post Training Mixed Precision Quantization of Neural Networks using First-Order Information

Cited by: 2
Authors:
Chauhan, Arun [1]
Tiwari, Utsav [1]
Vikram, N. R. [1]
Affiliation:
[1] Samsung Res Inst, Bangalore, India
DOI: 10.1109/ICCVW60793.2023.00144
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Quantization is an efficient way to reduce both the memory footprint and the inference time of large Deep Neural Networks (DNNs), making their deployment feasible on resource-constrained devices. However, quantizing all layers uniformly to ultra-low bit precision degrades performance significantly. A promising way to address this problem is mixed-precision quantization, which assigns higher bit precision to the more sensitive layers. In this study, we introduce a method that uses only first-order information (i.e., gradients) to determine the sensitivity of each neural network layer for mixed-precision quantization, and we show that it matches the performance of counterpart methods that rely on second-order information (i.e., the Hessian) while having lower computational complexity. We then formulate mixed-precision bit allocation as an integer linear programming (ILP) problem that uses the proposed sensitivity metric to assign a bit-width to each layer efficiently for a given model-size budget. Furthermore, we use only post-training quantization techniques, yet achieve state-of-the-art results compared to popular mixed-precision methods that fine-tune the model on large training sets. Extensive experiments on benchmark vision architectures with the ImageNet dataset demonstrate the superiority of our approach over existing mixed-precision methods. At 8x weight compression, our method achieves better or comparable accuracy drops for ResNet18 (0.65%), ResNet50 (0.69%), MobileNet-V2 (0.49%), and Inception-V3 (1.30%) relative to state-of-the-art methods that require retraining or use the Hessian as the sensitivity metric for mixed-precision quantization.
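To make the two steps described in the abstract concrete, below is a minimal sketch, not the authors' code, of (a) a first-order layer-sensitivity proxy computed from one backward pass on a calibration batch, and (b) an ILP that picks one bit-width per layer under a model-size budget. The helper names (quantize_weight, first_order_sensitivity, allocate_bits), the symmetric uniform quantizer, the exact sensitivity formula (|gradient x quantization error| summed per layer), and the use of the PuLP solver are all illustrative assumptions; the paper's precise metric and ILP objective may differ.

import torch
import pulp  # open-source ILP modeling library (assumed available)

def quantize_weight(w, bits):
    """Symmetric uniform quantization of a weight tensor (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp_min(1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

def first_order_sensitivity(model, loss_fn, calib_batch, bit_options):
    """Hypothetical first-order proxy: |gradient * quantization error|,
    summed per layer, from a single backward pass on a calibration batch."""
    inputs, targets = calib_batch
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    sens = {}
    for name, p in model.named_parameters():
        if p.dim() < 2 or p.grad is None:  # skip biases / norm parameters
            continue
        sens[name] = {
            b: (p.grad * (quantize_weight(p.data, b) - p.data)).abs().sum().item()
            for b in bit_options
        }
    return sens

def allocate_bits(sens, num_params, bit_options, budget_bits):
    """ILP: choose exactly one bit-width per layer, minimizing total
    sensitivity subject to a total model-size budget (in bits)."""
    prob = pulp.LpProblem("mixed_precision_alloc", pulp.LpMinimize)
    x = {(l, b): pulp.LpVariable(f"x_{i}_{b}", cat="Binary")
         for i, l in enumerate(sens) for b in bit_options}
    # Objective: total sensitivity of the chosen configuration.
    prob += pulp.lpSum(sens[l][b] * x[l, b] for l in sens for b in bit_options)
    for l in sens:  # exactly one bit-width per layer
        prob += pulp.lpSum(x[l, b] for b in bit_options) == 1
    # Model-size constraint: sum of (parameter count * bit-width) <= budget.
    prob += pulp.lpSum(num_params[l] * b * x[l, b]
                       for l in sens for b in bit_options) <= budget_bits
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {l: b for l in sens for b in bit_options if x[l, b].value() > 0.5}

Here num_params maps each layer name to its parameter count (e.g., built from model.named_parameters()). For the 8x weight-compression setting reported above, FP32 weights compress to an average of 4 bits per weight, so budget_bits would be roughly 4 * sum(num_params.values()).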
Pages: 1335 - 1344 (10 pages)
Related papers (50 total):
  • [1] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): 21361 - 21379
  • [2] First-order logical neural networks
    Lerdlamnaochai, T
    Kijsirikul, B
    HIS'04: Fourth International Conference on Hybrid Intelligent Systems, Proceedings, 2005: 192 - 197
  • [3] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [4] Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision
    Liu, Xingchao
    Ye, Mao
    Zhou, Dengyong
    Liu, Qiang
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8697 - 8705
  • [5] Hierarchical Mixed-Precision Post-Training Quantization for SAR Ship Detection Networks
    Wei, Hang
    Wang, Zulin
    Ni, Yuanhan
    REMOTE SENSING, 2024, 16 (21)
  • [6] Hessian-based mixed-precision quantization with transition aware training for neural networks
    Huang, Zhiyong
    Han, Xiao
    Yu, Zhi
    Zhao, Yunlan
    Hou, Mingyang
    Hu, Shengdong
    NEURAL NETWORKS, 2025, 182
  • [7] Augmenting Neural Networks with First-order Logic
    Li, Tao
    Srikumar, Vivek
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 292 - 302
  • [8] Mixed Precision Weight Networks: Training Neural Networks with Varied Precision Weights
    Fuengfusin, Ninnart
    Tamukoh, Hakaru
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT II, 2018, 11302 : 614 - 623
  • [9] The BRST quantization of first-order systems
    Bizdadea, C
    Saliu, SO
    HELVETICA PHYSICA ACTA, 1997, 70 (04): 590 - 597