Low-Bit Quantization of Neural Network Based on Exponential Moving Average Knowledge Distillation

Cited by: 0
Authors
Lü J. [1,2]
Xu K. [1,2]
Wang D. [1,2]
Affiliations
[1] Institute of Information Science, Beijing Jiaotong University, Beijing
[2] Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Beijing
Source
Corresponding author: Wang, Dong (wangdong@bjtu.edu.cn) | Science Press, Vol. 34 (2021)
Funding
Beijing Natural Science Foundation
Keywords
Deep Learning; Knowledge Distillation; Model Compression; Network Quantization;
DOI
10.16451/j.cnki.issn1003-6059.202112007
Abstract
The memory footprint and computational cost of deep neural networks restrict their deployment in practical applications, and network quantization is an effective compression method. However, in low-bit quantization the classification accuracy of the network degrades as the number of quantization bits decreases. To address this problem, a low-bit quantization method for neural networks based on exponential moving average knowledge distillation is proposed. Firstly, a small number of images are used for adaptive initialization to train the quantization step sizes of activations and weights, which speeds up the convergence of the quantized network. Then, exponential moving average knowledge distillation is introduced to normalize the distillation loss and the task loss and to guide the training of the quantized network. Experiments on the ImageNet and CIFAR-10 datasets show that the performance of the proposed method is close to or better than that of the full-precision network. © 2021, Science Press. All rights reserved.
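
To make the second step of the abstract concrete, the following PyTorch sketch shows one way an exponential moving average of each loss can be used to normalize the distillation loss and the task loss before they jointly guide the quantized (student) network. This is a minimal illustration under assumptions, not the authors' released code: the class and function names, the decay of 0.99, the temperature, and the exact normalization are illustrative choices.

# Hypothetical sketch: EMA-normalized combination of task loss and
# distillation loss when training a low-bit student against a
# full-precision teacher. Names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F


class EMALossBalancer:
    """Tracks exponential moving averages of the task and distillation
    losses and rescales both terms to a comparable magnitude."""

    def __init__(self, decay=0.99, eps=1e-8):
        self.decay = decay
        self.eps = eps
        self.ema_task = None      # running magnitude of the task loss
        self.ema_distill = None   # running magnitude of the distillation loss

    def _update(self, ema, value):
        value = value.detach()  # EMAs act only as scale factors, not gradients
        return value if ema is None else self.decay * ema + (1.0 - self.decay) * value

    def combine(self, task_loss, distill_loss):
        self.ema_task = self._update(self.ema_task, task_loss)
        self.ema_distill = self._update(self.ema_distill, distill_loss)
        # Divide each loss by its own running magnitude so that neither
        # term dominates the gradient of the quantized network.
        return (task_loss / (self.ema_task + self.eps)
                + distill_loss / (self.ema_distill + self.eps))


def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Standard soft-target distillation loss at temperature T.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


balancer = EMALossBalancer()


def training_step(student, teacher, images, labels, optimizer):
    # Teacher is the full-precision network; student is the low-bit network.
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)

    task = F.cross_entropy(student_logits, labels)
    distill = distillation_loss(student_logits, teacher_logits)
    loss = balancer.combine(task, distill)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Because the moving averages are detached from the computation graph, they serve purely as adaptive scale factors, so the relative weighting of the two losses tracks their changing magnitudes over the course of training.
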
Pages: 1143-1151
Number of pages: 8