Improving Knowledge Distillation With a Customized Teacher

Cited by: 12
Authors
Tan, Chao [1 ,2 ]
Liu, Jie [1 ,2 ]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc Lab, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Lab Software Engn Complex Syst, Changsha 410073, Peoples R China
Keywords
Knowledge distillation (KD); knowledge transfer; neural network acceleration; neural network compression
DOI
10.1109/TNNLS.2022.3189680
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) is a widely used approach for transferring knowledge from a cumbersome network (the teacher) to a lightweight network (the student). However, even when different teachers achieve similar accuracy, the accuracy of a fixed student distilled from them can differ significantly. We find that teachers with more dispersed secondary soft probabilities are better qualified for their role. Therefore, an indicator, the standard deviation σ of the secondary soft probabilities, is introduced to choose the teacher. Moreover, to make a teacher's secondary soft probabilities more dispersed, a novel method, dubbed pretraining the teacher under dual supervision (PTDS), is proposed. In addition, we put forward an asymmetrical transformation function (ATF) to further enhance the dispersion of the pretrained teacher's secondary soft probabilities. The combination of PTDS and ATF is termed knowledge distillation with a customized teacher (KDCT). Extensive experiments and analyses on three computer vision tasks, namely image classification, transfer learning, and semantic segmentation, substantiate the effectiveness of KDCT.
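The σ indicator described in the abstract can be sketched directly: soften the teacher's logits with a temperature, discard the target-class probability, and take the standard deviation of the remaining (secondary) probabilities. The snippet below is a minimal illustration, not the authors' implementation; the function name secondary_sigma, the temperature value T, and the sample logits are all hypothetical.

```python
# Minimal sketch (assumption, not from the paper): sigma, the standard
# deviation of a teacher's secondary soft probabilities, i.e., the
# temperature-softened class probabilities excluding the target class.
import numpy as np

def secondary_sigma(logits: np.ndarray, target: int, T: float = 4.0) -> float:
    """Std. dev. of softened probabilities over all non-target classes."""
    z = logits / T
    z = z - z.max()                      # stabilize the softmax
    p = np.exp(z) / np.exp(z).sum()      # softened probabilities
    secondary = np.delete(p, target)     # drop the target-class probability
    return float(secondary.std())

# Example: two hypothetical teachers with the same top-1 prediction but
# differently dispersed secondary probabilities; per the paper's claim,
# the teacher with the larger sigma would be preferred.
t1 = np.array([9.0, 3.0, 1.0, 0.5, 0.2])
t2 = np.array([9.0, 2.0, 1.9, 1.8, 1.7])
print(secondary_sigma(t1, target=0), secondary_sigma(t2, target=0))
```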
Pages: 2290-2299
Page count: 10