Channel-wise Knowledge Distillation for Dense Prediction

Cited by: 256
Authors
Shu, Changyong [1, 4]
Liu, Yifan [2]
Gao, Jianfei [1]
Yan, Zheng [1]
Shen, Chunhua [3]
Affiliations
[1] Shanghai Data Technol Co, Shanghai, Peoples R China
[2] Univ Adelaide, Adelaide, SA, Australia
[3] Monash Univ, Clayton, Vic, Australia
[4] Baidu Inc, Beijing, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Keywords
DOI
10.1109/ICCV48922.2021.00526
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Knowledge distillation (KD) has proven to be a simple and effective tool for training compact dense prediction models. Lightweight student networks are trained with extra supervision transferred from large teacher networks. Most previous KD variants for dense prediction tasks align the activation maps of the student and teacher networks in the spatial domain, typically by normalizing the activation values at each spatial location and minimizing point-wise and/or pair-wise discrepancy. Unlike these methods, we propose to normalize the activation map of each channel to obtain a soft probability map. By simply minimizing the Kullback-Leibler (KL) divergence between the channel-wise probability maps of the two networks, the distillation process pays more attention to the most salient regions of each channel, which are valuable for dense prediction tasks. We conduct experiments on several dense prediction tasks, including semantic segmentation and object detection. Experiments demonstrate that our proposed method considerably outperforms state-of-the-art distillation methods and can require less computational cost during training. In particular, we improve the RetinaNet detector (ResNet50 backbone) by 3.4% in mAP on the COCO dataset, and PSPNet (ResNet18 backbone) by 5.81% in mIoU on the Cityscapes dataset. Code is available at: https://git.io/Distiller
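The channel-wise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the code link above for that). It assumes each channel is given as a flat list of activations over all spatial locations, that the teacher and student already have the same number of channels, and that the divergence is taken from the teacher map to the student map. The function names and the `temperature` parameter are hypothetical and used only for illustration.

```python
import math


def spatial_log_softmax(channel, temperature=1.0):
    """Log of the softmax taken over all spatial locations of one channel."""
    scaled = [v / temperature for v in channel]
    peak = max(scaled)  # subtract the max for numerical stability
    log_total = math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - peak - log_total for v in scaled]


def channel_wise_kl(teacher, student, temperature=1.0):
    """Mean over channels of KL(teacher map || student map).

    teacher, student: lists of channels; each channel is a flat list of
    activations over the H*W spatial locations (an assumed layout).
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student need the same number of channels")
    loss = 0.0
    for t_channel, s_channel in zip(teacher, student):
        log_p = spatial_log_softmax(t_channel, temperature)
        log_q = spatial_log_softmax(s_channel, temperature)
        # KL(p || q) = sum_i p_i * (log p_i - log q_i)
        loss += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return loss / len(teacher)


# Toy input: 2 channels, 4 spatial locations each.
teacher = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
student = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
print(channel_wise_kl(teacher, student))
```

Because the softmax runs over spatial locations within a channel, locations with large teacher activations receive most of the probability mass and therefore dominate the loss, which is consistent with the abstract's remark about salient regions.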
Pages: 5291-5300
Page count: 10