Highlight Every Step: Knowledge Distillation via Collaborative Teaching

Cited by: 66
Authors
Zhao, Haoran [1 ]
Sun, Xin [1 ]
Dong, Junyu [1 ]
Chen, Changrui [1 ]
Dong, Zihe [1 ]
Affiliations
[1] Ocean Univ China, Dept Comp Sci & Technol, Qingdao 266100, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; knowledge engineering; neural networks; collaboration; computational modeling; task analysis; computer vision; deep learning; knowledge distillation (KD); neural-network compression; anomaly detection; classification; network
DOI
10.1109/TCYB.2020.3007506
CLC classification
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
High storage and computational costs prevent deep neural networks from being deployed on resource-constrained devices. Knowledge distillation (KD) aims to train a compact student network by transferring knowledge from a larger pretrained teacher model. However, most existing KD methods ignore the valuable information generated during the teacher's training process, using only its final outputs. In this article, we propose a new collaborative teaching KD (CTKD) strategy that employs two special teachers. Specifically, one teacher trained from scratch (i.e., the scratch teacher) assists the student step by step using its temporary outputs, forcing the student to follow a near-optimal path toward the final high-accuracy logits. The other, pretrained teacher (i.e., the expert teacher) guides the student to focus on the critical regions that are most useful for the task. Combining the knowledge from these two teachers significantly improves the performance of the student network in KD. Experimental results on the CIFAR-10, CIFAR-100, SVHN, Tiny ImageNet, and ImageNet datasets verify that the proposed KD method is efficient and achieves state-of-the-art performance.
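The two guidance signals described in the abstract can be sketched as a weighted sum of a soft-logit term (from the scratch teacher's temporary outputs) and an attention-matching term (from the expert teacher). The function names, the KL-plus-attention formulation, and all weights below are illustrative assumptions, not the paper's exact losses.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for probability vectors."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def ctkd_loss(student_logits, scratch_teacher_logits,
              student_attn, expert_attn, T=4.0, alpha=0.5, beta=0.5):
    """Hypothetical combination of the two guidance signals:
    - a softened-KL term matching the student's logits to the scratch
      teacher's temporary (same training step) logits, and
    - a squared-error term matching L2-normalized attention maps of the
      student and the pretrained expert teacher."""
    soft_loss = kl_div(softmax(scratch_teacher_logits, T),
                       softmax(student_logits, T)) * T * T
    s = student_attn / (np.linalg.norm(student_attn) + 1e-12)
    e = expert_attn / (np.linalg.norm(expert_attn) + 1e-12)
    attn_loss = float(np.sum((s - e) ** 2))
    return alpha * soft_loss + beta * attn_loss
```

When the student exactly matches both teachers, this sketch's loss is zero; any mismatch in either the softened logits or the normalized attention maps contributes a nonnegative penalty.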
Pages: 2070-2081 (12 pages)