Teacher-free Distillation via Regularizing Intermediate Representation

Cited by: 4
Authors
Li, Lujun [1 ]
Liang, Shiuan-Ni [1 ]
Yang, Ya [1 ]
Jin, Zhe [2 ]
Affiliations
[1] Monash Univ, Sch Engn, Subang Jaya, Malaysia
[2] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Keywords
Knowledge distillation;
DOI
10.1109/IJCNN55064.2022.9892575
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature distillation consistently leads to significant performance improvements, but it requires an extra training budget. To address this problem, we propose TFD, a simple and effective Teacher-Free Distillation framework, which seeks to reuse the privileged features within the student network itself. Specifically, TFD squeezes feature knowledge from the deeper layers into the shallow ones by minimizing a feature loss. Thanks to the narrow gap between these self-features, TFD only needs to adopt a simple l2 loss without complex transformations. Extensive experiments on recognition benchmarks show that our framework can outperform teacher-based feature distillation methods. On the ImageNet dataset, our approach achieves a 0.8% gain for ResNet18, surpassing other state-of-the-art training techniques.
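A minimal sketch of the idea described in the abstract, assuming a PyTorch ResNet-18 student: the output of a deeper block (detached, so it acts as an in-network teacher) supervises the output of the preceding block of the same stage with a plain MSE (l2) loss, added to the usual cross-entropy. Which block pairs are matched, the loss weight, and the class/variable names are illustrative assumptions, not the authors' exact recipe.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import resnet18


    class TFDSketch(nn.Module):
        """Hypothetical self-distillation wrapper: deeper blocks regularize shallower ones."""

        def __init__(self, num_classes=1000, tfd_weight=1.0):
            super().__init__()
            self.net = resnet18(num_classes=num_classes)
            self.tfd_weight = tfd_weight  # assumed balancing weight for the feature loss

        def forward(self, x, targets=None):
            n = self.net
            # Standard ResNet-18 stem and early stages.
            x = n.maxpool(n.relu(n.bn1(n.conv1(x))))
            x = n.layer2(n.layer1(x))
            # Split the last two stages into their two BasicBlocks so that
            # intermediate features with identical shapes are exposed.
            f3_shallow = n.layer3[0](x)
            f3_deep = n.layer3[1](f3_shallow)
            f4_shallow = n.layer4[0](f3_deep)
            f4_deep = n.layer4[1](f4_shallow)
            logits = n.fc(torch.flatten(n.avgpool(f4_deep), 1))

            if targets is None:
                return logits  # plain inference path, no extra cost

            ce = F.cross_entropy(logits, targets)
            # Deeper features are detached: the feature loss only pushes the
            # shallower blocks toward the deeper representations (no teacher model).
            tfd = (F.mse_loss(f3_shallow, f3_deep.detach())
                   + F.mse_loss(f4_shallow, f4_deep.detach()))
            return ce + self.tfd_weight * tfd

Usage: model = TFDSketch(); loss = model(images, labels); loss.backward(). At test time, calling the model without targets returns the ordinary logits, so the regularization adds no inference overhead, consistent with the teacher-free claim above.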
Pages: 6