Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data

Cited by: 0
Authors
Li, Zhijian [1 ]
Yang, Biao [1 ]
Yin, Penghang [2 ]
Qi, Yingyong [1 ]
Xin, Jack [1 ]
Affiliations
[1] Univ Calif Irvine, Dept Math, Irvine, CA 92617 USA
[2] SUNY Albany, Dept Math & Stat, Albany, NY USA
Keywords
Model compression; quantization; knowledge distillation; image classification; convolutional neural networks; convex
DOI
10.1109/ACCESS.2023.3297890
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In this paper, we propose a feature affinity (FA) assisted knowledge distillation (KD) method to improve quantization-aware training of deep neural networks (DNNs). The FA loss on intermediate feature maps plays the role of teaching a student the middle steps of a solution, rather than only the final answers as in conventional KD, where the loss acts on the network logits at the output level. Combining the logit loss and the FA loss, we found through convolutional network experiments on the CIFAR-10/100 and Tiny ImageNet data sets that the quantized student network receives stronger supervision than from labeled ground-truth data. The resulting FA quantization-distillation (FAQD) method, trained to convergence with a cosine annealing scheduler for 200 epochs, compresses models on label-free data while matching or exceeding the accuracy of their full-precision counterparts. This brings immediate practical benefits, since pre-trained teacher models are readily available and unlabeled data are abundant, whereas data labeling is often laborious and expensive. Finally, we propose, and prove error estimates for, a fast feature affinity (FFA) loss function that accurately approximates the FA loss at a lower order of computational complexity, which helps speed up training on high-resolution image inputs. Source code is available at: https://github.com/lzj994/FAQD
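As a rough illustration of the training objective described in the abstract, the sketch below combines a temperature-scaled soft-logit KD term with a pairwise feature affinity term on intermediate feature maps. This is a minimal sketch assuming the commonly used cosine-affinity formulation; the names feature_affinity_loss and faqd_loss and the hyperparameters T, alpha, and beta are illustrative placeholders, and the exact FA and FFA loss definitions are given in the paper and the linked repository.

import torch
import torch.nn.functional as F

def feature_affinity_loss(feat_s: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
    # feat_s, feat_t: intermediate feature maps of shape (B, C, H, W).
    # Student and teacher channel counts may differ, since the affinity
    # matrices below only depend on the spatial positions.
    def affinity(feat):
        x = feat.flatten(2).transpose(1, 2)      # (B, HW, C)
        x = F.normalize(x, dim=-1)               # unit-norm feature vector per position
        return torch.bmm(x, x.transpose(1, 2))   # (B, HW, HW) cosine affinity matrix
    # Mean squared difference between student and teacher affinity matrices.
    return F.mse_loss(affinity(feat_s), affinity(feat_t))

def faqd_loss(logits_s, logits_t, feat_s, feat_t, T=4.0, alpha=1.0, beta=1.0):
    # Label-free objective: soft-logit KD term plus the feature affinity term.
    kd = F.kl_div(F.log_softmax(logits_s / T, dim=1),
                  F.softmax(logits_t / T, dim=1),
                  reduction="batchmean") * (T * T)
    return alpha * kd + beta * feature_affinity_loss(feat_s, feat_t)

Note that each affinity matrix is HW x HW, so the cost of the FA term grows quadratically with spatial resolution; this is the motivation for the lower-complexity FFA approximation mentioned in the abstract.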
Pages: 78042-78051
Number of pages: 10