Low-Bit Quantization of Neural Network Based on Exponential Moving Average Knowledge Distillation

Cited by: 0
Authors
Lü J. [1,2]
Xu K. [1,2]
Wang D. [1,2]
Affiliations
[1] Institute of Information Science, Beijing Jiaotong University, Beijing
[2] Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Beijing
Source
Corresponding author: Wang, Dong (wangdong@bjtu.edu.cn) | Science Press, Vol. 34
Funding
Beijing Natural Science Foundation;
Keywords
Deep Learning; Knowledge Distillation; Model Compression; Network Quantization;
DOI
10.16451/j.cnki.issn1003-6059.202112007
Abstract
The memory footprint and computational cost of deep neural networks restrict their widespread deployment, and network quantization is an effective compression method. However, in low-bit quantization, the classification accuracy of a neural network degrades as the number of quantization bits decreases. To address this problem, a low-bit quantization method for neural networks based on knowledge distillation is proposed. First, a small number of images are used for adaptive initialization to train the quantization step sizes of activations and weights, accelerating the convergence of the quantized network. Then, exponential moving average knowledge distillation is introduced to normalize the distillation loss and the task loss and to guide the training of the quantized network. Experiments on the ImageNet and CIFAR-10 datasets show that the performance of the proposed method is close to or better than that of the full-precision network. © 2021, Science Press. All rights reserved.
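For illustration, the following is a minimal PyTorch-style sketch of the two ideas summarized in the abstract: initializing a learnable quantization step size from a few calibration samples, and balancing the task loss against the distillation loss using exponential moving averages of their magnitudes. All names (LearnedStepQuantizer, EMALossBalancer, train_step) and the specific initialization heuristic are illustrative assumptions, not the paper's released code.

```python
# Sketch of adaptive step-size initialization and EMA-normalized distillation.
# Assumptions: a frozen full-precision teacher, a quantized student, and a
# standard temperature-scaled KL distillation loss.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnedStepQuantizer(nn.Module):
    """Uniform quantizer with a learnable step size and a straight-through estimator.

    It would be inserted into the student's layers to quantize weights or
    activations; that wiring is omitted here for brevity.
    """

    def __init__(self, num_bits=4):
        super().__init__()
        self.num_bits = num_bits
        self.step = nn.Parameter(torch.tensor(1.0))
        self.initialized = False

    @torch.no_grad()
    def init_step(self, x):
        # Adaptive initialization from a few samples: map the observed dynamic
        # range onto the available quantization levels (heuristic, an assumption).
        qmax = 2 ** (self.num_bits - 1) - 1
        self.step.copy_(x.abs().mean() * 2.0 / qmax)
        self.initialized = True

    def forward(self, x):
        if not self.initialized:
            self.init_step(x)
        qmax = 2 ** (self.num_bits - 1) - 1
        # Round in the forward pass; pass the gradient straight through to x.
        q = torch.clamp(torch.round(x / self.step), -qmax - 1, qmax)
        return (q - x / self.step).detach() * self.step + x


class EMALossBalancer:
    """Normalizes the task loss and the distillation loss by EMAs of their magnitudes."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.ema_task = None
        self.ema_distill = None

    def __call__(self, task_loss, distill_loss):
        t, d = task_loss.detach(), distill_loss.detach()
        if self.ema_task is None:
            self.ema_task, self.ema_distill = t, d
        else:
            self.ema_task = self.decay * self.ema_task + (1 - self.decay) * t
            self.ema_distill = self.decay * self.ema_distill + (1 - self.decay) * d
        # Scale each term by its running magnitude so neither dominates training.
        return task_loss / (self.ema_task + 1e-8) + distill_loss / (self.ema_distill + 1e-8)


def train_step(student, teacher, balancer, images, labels, optimizer, T=4.0):
    """One training step of the quantized student guided by the full-precision teacher."""
    optimizer.zero_grad()
    s_logits = student(images)
    with torch.no_grad():
        t_logits = teacher(images)
    task_loss = F.cross_entropy(s_logits, labels)
    distill_loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T
    loss = balancer(task_loss, distill_loss)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The EMA-based scaling replaces a hand-tuned weighting coefficient between the two losses: because each term is divided by a running estimate of its own magnitude, their contributions stay comparable as training progresses.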
Pages: 1143-1151
Number of pages: 8