Post Training Mixed Precision Quantization of Neural Networks using First-Order Information

Cited by: 2
Authors:
Chauhan, Arun [1]
Tiwari, Utsav [1]
Vikram, N. R. [1]
Affiliations:
[1] Samsung Res Inst, Bangalore, India
DOI:
10.1109/ICCVW60793.2023.00144
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Quantization is an efficient way to reduce both the memory footprint and the inference time of large Deep Neural Networks (DNNs), making their deployment feasible on resource-constrained devices. However, quantizing all layers uniformly with ultra-low-precision bits causes significant performance degradation. A promising approach to this problem is mixed-precision quantization, in which higher bit precisions are assigned to more sensitive layers. In this study, we introduce a method that uses only first-order information (i.e., gradients) to determine the sensitivity of neural network layers for mixed-precision quantization, and we show that the proposed method matches the performance of counterpart methods that use second-order information (i.e., the Hessian) while having lower computational complexity. We then formulate mixed-precision bit allocation as an integer linear programming (ILP) problem that uses the proposed sensitivity metric to assign a bit width to each layer efficiently for a given model size. Furthermore, we use only post-training quantization techniques, yet achieve state-of-the-art results compared to popular mixed-precision methods that fine-tune the model on large training datasets. Extensive experiments on benchmark vision architectures with the ImageNet dataset demonstrate superiority over existing mixed-precision approaches. At 8x weight compression, our method achieves better or comparable results for ResNet18 (0.65% accuracy drop), ResNet50 (0.69% accuracy drop), MobileNet-V2 (0.49% accuracy drop), and Inception-V3 (1.30% accuracy drop), compared to other state-of-the-art methods that require retraining or use the Hessian as a sensitivity metric.
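The pipeline the abstract describes — score each layer's quantization sensitivity with first-order (gradient) information, then choose per-layer bit widths under a model-size budget — can be sketched as follows. This is a minimal illustration under stated assumptions: the sensitivity proxy |g · ΔW| and the exhaustive search (standing in for the paper's ILP solver) are plausible simplifications, not the authors' exact formulation.

```python
# Hedged sketch: gradient-based layer sensitivity + budgeted bit allocation.
# The metric and the search are illustrative assumptions, not the paper's method.
from itertools import product

import numpy as np

def layer_sensitivity(grad, weights, bits):
    """First-order sensitivity proxy: |gradient . quantization perturbation|."""
    # Symmetric uniform quantization of the flattened weights to `bits` bits.
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    quantized = np.round(weights / scale) * scale
    return abs(np.dot(grad, quantized - weights))

def allocate_bits(grads, weights, size_budget, candidate_bits=(2, 4, 8)):
    """Exhaustive-search stand-in for the ILP: minimize total first-order
    sensitivity subject to a total model-size budget (in bits)."""
    best_cost, best_assign = None, None
    for assign in product(candidate_bits, repeat=len(weights)):
        size = sum(b * w.size for b, w in zip(assign, weights))
        if size > size_budget:
            continue  # infeasible under the model-size constraint
        cost = sum(layer_sensitivity(g, w, b)
                   for g, w, b in zip(grads, weights, assign))
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign
```

For real networks the search space grows as |candidate_bits|^layers, so an off-the-shelf ILP solver would replace the exhaustive loop; the objective and constraint stay the same.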
Pages: 1335-1344 (10 pages)