Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data

Cited by: 0
Authors
Li, Zhijian [1 ]
Yang, Biao [1 ]
Yin, Penghang [2 ]
Qi, Yingyong [1 ]
Xin, Jack [1 ]
Affiliations
[1] Univ Calif Irvine, Dept Math, Irvine, CA 92617 USA
[2] SUNY Albany, Dept Math & Stat, Albany, NY USA
Keywords
Model compression; quantization; knowledge distillation; image classification; convolutional neural networks; convex
DOI
10.1109/ACCESS.2023.3297890
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In this paper, we propose a feature affinity (FA) assisted knowledge distillation (KD) method to improve quantization-aware training of deep neural networks (DNNs). The FA loss on intermediate feature maps plays the role of teaching a student the middle steps of a solution, rather than only the final answers as in conventional KD, where the loss acts on the network logits at the output level. Combining the logit loss and the FA loss, we found through convolutional network experiments on the CIFAR-10/100 and Tiny ImageNet data sets that the quantized student network receives stronger supervision than from labeled ground-truth data. The resulting FA quantization-distillation (FAQD) method, trained to convergence with a cosine annealing scheduler for 200 epochs, compresses models on label-free data while matching or exceeding the accuracy of their full-precision counterparts. This brings immediate practical benefits, since pre-trained teacher models are readily available and unlabeled data are abundant, whereas data labeling is often laborious and expensive. Finally, we propose, and prove error estimates for, a fast feature affinity (FFA) loss function that accurately approximates the FA loss at a lower order of computational complexity, which helps speed up training on high-resolution image inputs. Source code is available at: https://github.com/lzj994/FAQD
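As a rough illustration of the training objective described in the abstract, the sketch below combines a temperature-scaled soft-logit KD term with a pairwise feature affinity term on intermediate feature maps. This is a minimal sketch assuming the commonly used cosine-affinity formulation; the names feature_affinity_loss and faqd_loss and the hyperparameters T, alpha, and beta are illustrative placeholders, and the exact FA and FFA loss definitions are given in the paper and the linked repository.

import torch
import torch.nn.functional as F

def feature_affinity_loss(feat_s: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
    # feat_s, feat_t: intermediate feature maps of shape (B, C, H, W).
    # Student and teacher channel counts may differ, since the affinity
    # matrices below only depend on the spatial positions.
    def affinity(feat):
        x = feat.flatten(2).transpose(1, 2)      # (B, HW, C)
        x = F.normalize(x, dim=-1)               # unit-norm feature vector per position
        return torch.bmm(x, x.transpose(1, 2))   # (B, HW, HW) cosine affinity matrix
    # Mean squared difference between student and teacher affinity matrices.
    return F.mse_loss(affinity(feat_s), affinity(feat_t))

def faqd_loss(logits_s, logits_t, feat_s, feat_t, T=4.0, alpha=1.0, beta=1.0):
    # Label-free objective: soft-logit KD term plus the feature affinity term.
    kd = F.kl_div(F.log_softmax(logits_s / T, dim=1),
                  F.softmax(logits_t / T, dim=1),
                  reduction="batchmean") * (T * T)
    return alpha * kd + beta * feature_affinity_loss(feat_s, feat_t)

Note that each affinity matrix is HW x HW, so the cost of the FA term grows quadratically with spatial resolution; this is the motivation for the lower-complexity FFA approximation mentioned in the abstract.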
Pages: 78042-78051
Number of pages: 10