Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation

Cited by: 137
Authors
Ji, Mingi [1]
Shin, Seungjae [1]
Hwang, Seunghyun [2]
Park, Gibeom [1]
Moon, Il-Chul [1,3]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Daejeon, South Korea
[2] Looko Inc, Los Angeles, CA USA
[3] Summary AI, Seoul, South Korea
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Research Foundation of Singapore;
Keywords
DOI
10.1109/CVPR46437.2021.01052
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is a method of transferring knowledge from a pretrained, complex teacher model to a student model, so that a smaller network can replace the large teacher network at deployment. To remove the need to train a large teacher model, recent literature has introduced self-knowledge distillation, which progressively trains a student network to distill its own knowledge without a pretrained teacher. Self-knowledge distillation methods largely divide into data-augmentation-based and auxiliary-network-based approaches; the data-augmentation approach loses local information during augmentation, which hinders its applicability to vision tasks such as semantic segmentation. Moreover, these approaches do not exploit refined feature maps, which are prevalent in the object detection and semantic segmentation communities. This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD), which utilizes an auxiliary self-teacher network to transfer refined knowledge to the classifier network. FRSKD can use both soft-label and feature-map distillation for self-knowledge distillation, so it can be applied to classification as well as to semantic segmentation, which emphasizes preserving local information. We demonstrate the effectiveness of FRSKD through performance improvements on diverse tasks and benchmark datasets. The implementation is available at https://github.com/MingiJi/FRSKD.
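The abstract describes combining soft-label and feature-map distillation from an auxiliary self-teacher branch. As an illustration only, the minimal PyTorch-style sketch below shows how such a combined objective is commonly formed; the function and argument names (frskd_style_loss, student_feats, teacher_feats, alpha, beta) are hypothetical and are not taken from the authors' released code, which is linked above.

import torch.nn.functional as F

def frskd_style_loss(student_logits, teacher_logits,
                     student_feats, teacher_feats,
                     labels, temperature=3.0, alpha=1.0, beta=1.0):
    """Hypothetical combined objective in the spirit of FRSKD:
    supervised cross-entropy + soft-label KD from the self-teacher
    + feature-map matching between the two branches."""
    # Standard supervised loss on the classifier (student) branch.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label distillation: the student mimics the self-teacher's
    # temperature-softened output distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Feature-map distillation: match intermediate feature maps, assuming
    # the self-teacher returns refined maps with matching shapes.
    feat = sum(F.mse_loss(s, t.detach())
               for s, t in zip(student_feats, teacher_feats))

    return ce + alpha * kd + beta * feat

In practice the loss weights and temperature would be tuned per task, and the actual refinement architecture of the self-teacher is defined in the authors' repository rather than in this sketch.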
Pages: 10659 - 10668
Number of pages: 10