Multi-scale Feature Extraction and Fusion for Online Knowledge Distillation

Cited by: 7
Authors
Zou, Panpan [1 ]
Teng, Yinglei [1 ,2 ]
Niu, Tao [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Beijing Key Lab Space Ground Interconnect & Conve, Beijing, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV | 2022 / Vol. 13532
Funding
National Natural Science Foundation of China
Keywords
Knowledge distillation; Multi-scale; Feature fusion
DOI
10.1007/978-3-031-15937-4_11
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Online knowledge distillation conducts knowledge transfer among all student models to alleviate the reliance on pre-trained teacher models. However, existing online methods rely heavily on prediction distributions and neglect further exploration of representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distillation, which comprises three key components: multi-scale feature extraction, dual-attention, and feature fusion, to generate more informative feature maps for distillation. The multi-scale feature extraction exploits a divide-and-concatenate operation in the channel dimension to improve the multi-scale representation ability of feature maps. To obtain more accurate information, we design a dual-attention module that adaptively strengthens important channel and spatial regions. Moreover, we aggregate and fuse the processed feature maps via feature fusion to assist the training of the student models. Extensive experiments on CIFAR-10, CIFAR-100, and CINIC-10 show that MFEF transfers more beneficial representational knowledge for distillation and outperforms alternative methods across various network architectures.
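The two feature-processing steps named in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: it assumes a Res2Net-style channel divide-and-concatenate for the multi-scale extraction (with an identity map standing in for the per-group convolutions) and a squeeze-and-excitation-style gate standing in for the channel half of the dual-attention; the function names and shapes are hypothetical.

```python
import numpy as np

def multiscale_split_concat(x, groups=4):
    """Multi-scale extraction sketch: split the channel dimension into
    groups, let each group accumulate the previous group's output
    (hierarchical receptive fields), then concatenate back."""
    # x: feature map of shape (C, H, W); C must be divisible by groups
    splits = np.split(x, groups, axis=0)
    outs = [splits[0]]
    for s in splits[1:]:
        # a real model would apply a 3x3 conv here; identity + carry-over
        outs.append(s + outs[-1])
    return np.concatenate(outs, axis=0)

def channel_attention(x):
    """Channel-attention sketch: global average pool per channel,
    sigmoid gate, rescale the channels adaptively."""
    pooled = x.mean(axis=(1, 2))              # (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))      # sigmoid
    return x * gate[:, None, None]

x = np.random.rand(8, 4, 4)
y = channel_attention(multiscale_split_concat(x, groups=4))
assert y.shape == x.shape
```

In the actual method these outputs from several student branches would then be aggregated by the feature-fusion module to form the distillation target; that aggregation is omitted here.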
Pages: 126-138
Page count: 13