Post Training Mixed Precision Quantization of Neural Networks using First-Order Information

Cited by: 2
Authors: Chauhan, Arun [1]; Tiwari, Utsav [1]; Vikram, N. R. [1]
Affiliation: [1] Samsung Res Inst, Bangalore, India
DOI: 10.1109/ICCVW60793.2023.00144
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Quantization is an efficient way of reducing both the memory footprint and inference time of large Deep Neural Networks (DNNs), making their deployment feasible on resource-constrained devices. However, quantizing all layers uniformly with ultra-low-precision bits results in significant performance degradation. A promising approach to this problem is mixed-precision quantization, where higher bit precisions are assigned to more sensitive layers. In this study, we introduce a method that uses only first-order information (i.e., gradients) to determine the sensitivity of neural network layers for mixed-precision quantization, and we show that the proposed method matches the performance of counterpart methods that use second-order information (i.e., the Hessian) while requiring less computation. We then formulate mixed-precision bit allocation as an integer linear programming (ILP) problem that uses the proposed sensitivity metric to efficiently assign a bit-width to each layer under a given model-size constraint. Furthermore, we use only post-training quantization techniques, yet achieve state-of-the-art results compared to popular mixed-precision quantization methods that fine-tune the model on large training data. Extensive experiments on benchmark vision neural network architectures with the ImageNet dataset demonstrate the superiority of our approach over existing mixed-precision methods. The proposed method achieves better or comparable results for ResNet18 (0.65% accuracy drop at 8x weight compression), ResNet50 (0.69% accuracy drop at 8x weight compression), MobileNet-V2 (0.49% accuracy drop at 8x weight compression), and Inception-V3 (1.30% accuracy drop at 8x weight compression), compared to other state-of-the-art methods that require retraining or use the Hessian as the sensitivity metric for mixed-precision quantization.
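The abstract describes two steps: a gradient-only sensitivity score per layer and an ILP that picks one bit-width per layer under a weight-memory budget. The sketch below is a minimal, hedged interpretation of that pipeline, not the paper's exact formulation: it assumes a first-order Taylor proxy |g_l · (Q_b(w_l) − w_l)| for sensitivity, a simple symmetric uniform quantizer, and the PuLP solver for the ILP; all function names are illustrative.

```python
# Hedged sketch of gradient-based sensitivity + ILP bit allocation.
# Assumptions (not from the paper): symmetric uniform quantizer,
# first-order Taylor proxy for sensitivity, PuLP/CBC for the ILP.
import torch
import pulp


def uniform_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantizer (assumed baseline quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale


def layer_sensitivities(model, loss_fn, calib_loader, bit_choices):
    """Per-layer, per-bit-width sensitivity from gradients only.

    Assumed metric: |sum(grad * quantization_error)|, i.e. a first-order
    Taylor estimate of the loss change, accumulated on calibration data.
    """
    model.zero_grad()
    for inputs, targets in calib_loader:           # small calibration set
        loss_fn(model(inputs), targets).backward()

    sens = {}
    for name, p in model.named_parameters():
        if p.grad is None or p.dim() < 2:          # skip biases / norms
            continue
        sens[name] = {}
        for b in bit_choices:
            dw = uniform_quantize(p.detach(), b) - p.detach()
            sens[name][b] = (p.grad * dw).sum().abs().item()
    return sens


def allocate_bits(sens, param_counts, bit_choices, budget_bits):
    """ILP: choose one bit-width per layer, minimizing total sensitivity
    subject to a total weight-memory budget (in bits)."""
    layers = list(sens.keys())
    prob = pulp.LpProblem("mixed_precision", pulp.LpMinimize)
    x = pulp.LpVariable.dicts(
        "x", [(l, b) for l in layers for b in bit_choices], cat="Binary")

    # Objective: total first-order sensitivity of the chosen assignment.
    prob += pulp.lpSum(sens[l][b] * x[(l, b)]
                       for l in layers for b in bit_choices)
    for l in layers:                               # exactly one bit-width
        prob += pulp.lpSum(x[(l, b)] for b in bit_choices) == 1
    # Memory constraint: weighted sum of parameter bits within budget.
    prob += pulp.lpSum(param_counts[l] * b * x[(l, b)]
                       for l in layers for b in bit_choices) <= budget_bits

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {l: next(b for b in bit_choices if pulp.value(x[(l, b)]) > 0.5)
            for l in layers}
```

Under these assumptions, the 8x weight compression reported in the abstract would correspond to a budget of roughly 4 bits per weight on average for an FP32 model, e.g. `budget_bits = 4 * sum(param_counts.values())` with `bit_choices = [2, 4, 8]`.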
Pages: 1335-1344
Number of pages: 10
Related papers (50 in total)
  • [41] Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks
    Tai, Yu-Shan
    Chang, Cheng-Yang
    Teng, Chieh-Fang
    Chen, Yi-Ta
    Wu, An-Yeu
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42 (11) : 4025 - 4037
  • [42] Training of Mixed-Signal Optical Convolutional Neural Networks With Reduced Quantization Levels
    Zhu, Zheyuan
    Ulseth, Joseph
    Li, Guifang
    Pang, Shuo
    IEEE ACCESS, 2021, 9 : 56645 - 56652
  • [43] First-order recurrent neural networks and deterministic finite-state automata
    Manolios, P.
    NEURAL COMPUTATION, 1994, 6 (06) : 1155 - 1173
  • [44] Expressive power of first-order recurrent neural networks determined by their attractor dynamics
    Cabessa, Jeremie
    Villa, Alessandro E. P.
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2016, 82 (08) : 1232 - 1250
  • [45] Quantune: Post-training quantization of convolutional neural networks using extreme gradient boosting for fast deployment
    Lee, Jemin
    Yu, Misun
    Kwon, Yongin
    Kim, Taeho
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 132 : 124 - 135
  • [46] First-order 2D cellular neural networks investigation and learning
    Pudov, S
    PARALLEL COMPUTING TECHNOLOGIES, 2001, 2127 : 94 - 97
  • [47] Impact of Mixed Precision Techniques on Training and Inference Efficiency of Deep Neural Networks
    Doerrich, Marion
    Fan, Mingcheng
    Kist, Andreas M.
    IEEE ACCESS, 2023, 11 : 57627 - 57634
  • [48] PositNN: Training deep neural networks with mixed low-precision posit
    Raposo, Goncalo
    Tomas, Pedro
    Roma, Nuno
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7908 - 7912
  • [49] Structured Dynamic Precision for Deep Neural Networks Quantization
    Huang, Kai
    Li, Bowen
    Xiong, Dongliang
    Jiang, Haitian
    Jiang, Xiaowen
    Yan, Xiaolang
    Claesen, Luc
    Liu, Dehong
    Chen, Junjian
    Liu, Zhili
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2023, 28 (01)
  • [50] A novel evolutionary clustering via the first-order varying information for dynamic networks
    Yu, Wei
    Jiao, Pengfei
    Wang, Wenjun
    Yu, Yang
    Chen, Xue
    Pan, Lin
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2019, 520 : 507 - 520