Teacher-free Distillation via Regularizing Intermediate Representation

Cited by: 4
Authors
Li, Lujun [1 ]
Liang, Shiuan-Ni [1 ]
Yang, Ya [1 ]
Jin, Zhe [2 ]
Affiliations
[1] Monash Univ, Sch Engn, Subang Jaya, Malaysia
[2] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Keywords
Knowledge distillation;
DOI
10.1109/IJCNN55064.2022.9892575
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature distillation consistently leads to significant performance improvements, but it requires an extra training budget. To address this problem, we propose TFD, a simple and effective Teacher-Free Distillation framework, which seeks to reuse the privileged features within the student network itself. Specifically, TFD squeezes feature knowledge from the deeper layers into the shallow ones by minimizing a feature loss. Thanks to the narrow gap between these self-features, TFD only needs to adopt a simple l2 loss without complex transformations. Extensive experiments on recognition benchmarks show that our framework can outperform teacher-based feature distillation methods. On the ImageNet dataset, our approach achieves a 0.8% gain for ResNet18, surpassing other state-of-the-art training techniques.
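A minimal sketch of the idea described in the abstract, assuming a PyTorch ResNet-18 student: the output of a deeper block (detached, so it acts as an in-network teacher) supervises the output of the preceding block of the same stage with a plain MSE (l2) loss, added to the usual cross-entropy. Which block pairs are matched, the loss weight, and the class/variable names are illustrative assumptions, not the authors' exact recipe.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import resnet18


    class TFDSketch(nn.Module):
        """Hypothetical self-distillation wrapper: deeper blocks regularize shallower ones."""

        def __init__(self, num_classes=1000, tfd_weight=1.0):
            super().__init__()
            self.net = resnet18(num_classes=num_classes)
            self.tfd_weight = tfd_weight  # assumed balancing weight for the feature loss

        def forward(self, x, targets=None):
            n = self.net
            # Standard ResNet-18 stem and early stages.
            x = n.maxpool(n.relu(n.bn1(n.conv1(x))))
            x = n.layer2(n.layer1(x))
            # Split the last two stages into their two BasicBlocks so that
            # intermediate features with identical shapes are exposed.
            f3_shallow = n.layer3[0](x)
            f3_deep = n.layer3[1](f3_shallow)
            f4_shallow = n.layer4[0](f3_deep)
            f4_deep = n.layer4[1](f4_shallow)
            logits = n.fc(torch.flatten(n.avgpool(f4_deep), 1))

            if targets is None:
                return logits  # plain inference path, no extra cost

            ce = F.cross_entropy(logits, targets)
            # Deeper features are detached: the feature loss only pushes the
            # shallower blocks toward the deeper representations (no teacher model).
            tfd = (F.mse_loss(f3_shallow, f3_deep.detach())
                   + F.mse_loss(f4_shallow, f4_deep.detach()))
            return ce + self.tfd_weight * tfd

Usage: model = TFDSketch(); loss = model(images, labels); loss.backward(). At test time, calling the model without targets returns the ordinary logits, so the regularization adds no inference overhead, consistent with the teacher-free claim above.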
Pages: 6