Multi-scale Feature Extraction and Fusion for Online Knowledge Distillation

Cited by: 9
Authors
Zou, Panpan [1 ]
Teng, Yinglei [1 ,2 ]
Niu, Tao [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Beijing Key Lab Space Ground Interconnect & Conve, Beijing, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV | 2022 / Vol. 13532
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation; Multi-scale; Feature fusion;
DOI
10.1007/978-3-031-15937-4_11
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Online knowledge distillation transfers knowledge among all student models to alleviate the reliance on pre-trained teacher models. However, existing online methods rely heavily on prediction distributions and neglect further exploration of representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distillation, which comprises three key components: Multi-scale Feature Extraction, Dual-attention, and Feature Fusion, to generate more informative feature maps for distillation. Multi-scale feature extraction, which applies divide-and-concatenate operations in the channel dimension, is proposed to improve the multi-scale representation ability of feature maps. To obtain more accurate information, we design a dual-attention module that adaptively strengthens important channel and spatial regions. Moreover, we aggregate and fuse the processed feature maps via feature fusion to assist the training of the student models. Extensive experiments on CIFAR-10, CIFAR-100, and CINIC-10 show that MFEF transfers more beneficial representational knowledge for distillation and outperforms alternative methods across various network architectures.
Pages: 126-138
Number of pages: 13
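
The abstract describes the MFEF pipeline only at a high level. The PyTorch sketch below illustrates one plausible reading of its three components: divide-and-concatenate multi-scale extraction in the channel dimension, dual channel/spatial attention, and fusion of the students' feature maps. It is a minimal sketch under assumed layer choices (grouped 3x3 convolutions, squeeze-and-excitation-style channel gating, a 7x7 spatial gate, 1x1 fusion); the module names MultiScaleExtraction, DualAttention, and FeatureFusion are hypothetical and do not reproduce the authors' released MFEF implementation.

# Minimal PyTorch sketch of the three MFEF components described in the
# abstract (assumed design choices, not the authors' code).
import torch
import torch.nn as nn


class MultiScaleExtraction(nn.Module):
    """Divide channels into groups, convolve them hierarchically, and
    concatenate the results (a divide-and-concatenate block in the
    channel dimension)."""

    def __init__(self, channels, scales=4):
        super().__init__()
        assert channels % scales == 0, "channels must be divisible by scales"
        self.scales = scales
        width = channels // scales
        # One 3x3 conv per group except the first, which is passed through.
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, padding=1)
            for _ in range(scales - 1)
        )

    def forward(self, x):
        splits = torch.chunk(x, self.scales, dim=1)
        outs, prev = [splits[0]], None
        for conv, split in zip(self.convs, splits[1:]):
            prev = conv(split if prev is None else split + prev)
            outs.append(prev)
        return torch.cat(outs, dim=1)


class DualAttention(nn.Module):
    """Channel attention (squeeze-and-excitation style) followed by a
    spatial attention map built from channel-wise mean/max pooling."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)  # re-weight channels
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * self.spatial(pooled)  # re-weight spatial positions


class FeatureFusion(nn.Module):
    """Fuse feature maps from several student branches by concatenation
    followed by a 1x1 convolution."""

    def __init__(self, channels, num_students):
        super().__init__()
        self.fuse = nn.Conv2d(channels * num_students, channels, kernel_size=1)

    def forward(self, feats):
        return self.fuse(torch.cat(feats, dim=1))


if __name__ == "__main__":
    extract = nn.Sequential(MultiScaleExtraction(64), DualAttention(64))
    fusion = FeatureFusion(64, num_students=2)
    student_feats = [torch.randn(2, 64, 32, 32) for _ in range(2)]
    fused = fusion([extract(f) for f in student_feats])
    print(fused.shape)  # torch.Size([2, 64, 32, 32])

In such a setup the fused map would typically supervise each student's feature map (e.g., with an L2 loss) alongside the usual logit-level mutual distillation; the exact losses and fusion weights are not specified in this record.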