A Generalized Zero-Shot Quantization of Deep Convolutional Neural Networks Via Learned Weights Statistics

Cited by: 5
Authors
Sharma, Prasen Kumar [1,2,3]
Abraham, Arun [1]
Rajendiran, Vikram Nelvoy [1]
Affiliations
[1] Samsung R&D Inst India Bangalore, Bangalore 560037, India
[2] Indian Inst Technol Guwahati, Gauhati 781039, India
[3] TensorTour Inc, Newark, CA 94560 USA
Keywords
Quantization (signal); Training; Data models; Tensors; Calibration; Computational modeling; Convolutional neural networks; Data distillation; deep convolutional neural networks (CNNs); model compression; post-training quantization
DOI
10.1109/TMM.2021.3134158
CLC classification number
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
Quantizing the floating-point weights and activations of deep convolutional neural networks to fixed-point representation yields reduced memory footprints and inference time. Recently, efforts have been directed towards zero-shot quantization, which does not require the original unlabelled training samples of a given task. The best published works rely heavily on the learned batch normalization (BN) parameters to infer the range of the activations for quantization. In particular, these methods are built upon either an empirical estimation framework or a data distillation approach for computing the range of the activations. However, the performance of such schemes degrades severely when presented with a network that does not accommodate BN layers. In this line of thought, we propose a generalized zero-shot quantization (GZSQ) framework that neither requires original data nor relies on BN layer statistics. We have utilized the data distillation approach and leveraged only the pre-trained weights of the model to estimate enriched data for range calibration of the activations. To the best of our knowledge, this is the first work that utilizes the distribution of the pre-trained weights to assist the process of zero-shot quantization. The proposed scheme significantly outperforms the existing zero-shot works, e.g., an improvement of ~33% in classification accuracy for MobileNetV2 and several other models, with and without BN layers, across a variety of tasks. We have also demonstrated the efficacy of the proposed work across multiple open-source quantization frameworks. Importantly, our work is the first attempt towards the post-training zero-shot quantization of futuristic unnormalized deep neural networks.
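The abstract only outlines the approach; as a rough illustration (not the authors' published GZSQ algorithm), the following PyTorch-style sketch shows one way pre-trained weight statistics alone could drive data distillation and activation-range calibration for post-training quantization of a network without BN layers. The matching objective (pulling activation mean/std towards each layer's weight mean/std), the function names distill_calibration_batch and calibrate_ranges, and all hyperparameters are assumptions made for illustration.

```python
# Illustrative sketch only: distill a synthetic calibration batch by matching
# each conv layer's activation statistics to targets derived from that layer's
# pre-trained weight statistics, then record per-layer activation ranges.
import torch
import torch.nn as nn

def distill_calibration_batch(model, input_shape=(8, 3, 224, 224),
                              steps=500, lr=0.05):
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)          # only the synthetic batch is optimized
    x = torch.randn(input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)

    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    acts = {}

    def hook(module, inp, out):
        acts[module] = out               # cache activations of the current pass

    handles = [m.register_forward_hook(hook) for m in convs]
    for _ in range(steps):
        opt.zero_grad()
        model(x)
        loss = x.new_zeros(())
        for m in convs:
            w = m.weight.detach()
            # assumed objective: activation mean/std should track weight mean/std
            loss = loss + (acts[m].mean() - w.mean()).pow(2) \
                        + (acts[m].std() - w.std()).pow(2)
        loss.backward()
        opt.step()
    for h in handles:
        h.remove()
    return x.detach()

def calibrate_ranges(model, calib_batch):
    """Record per-layer activation min/max for post-training quantization."""
    ranges = {}
    def hook(module, inp, out):
        ranges[module] = (out.min().item(), out.max().item())
    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.Conv2d)]
    with torch.no_grad():
        model(calib_batch)
    for h in handles:
        h.remove()
    return ranges

# Hypothetical usage with a pre-trained model and no access to training data:
# model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1")
# batch = distill_calibration_batch(model)
# act_ranges = calibrate_ranges(model, batch)   # feed into a quantizer's observers
```

The recorded min/max pairs would then parameterize the activation quantizers of whichever post-training quantization framework is used; the specific loss and optimizer settings above are placeholders, not the values reported in the paper.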
Pages: 953-965
Number of pages: 13