Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Cited: 59
Authors
Gou, Jianping [1 ,2 ]
Sun, Liyuan [2 ]
Yu, Baosheng [3 ]
Wan, Shaohua [4 ]
Ou, Weihua [5 ]
Yi, Zhang [6 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Coll Software, Chongqing 400715, Peoples R China
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2008, Australia
[4] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[5] Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang 550025, Guizhou, Peoples R China
[6] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge engineering; Correlation; Computational modeling; Training; Neural networks; Deep learning; Informatics; Knowledge distillation (KD); model compression; relational knowledge; visual recognition;
DOI
10.1109/TII.2022.3209672
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps at a single intermediate layer of a deep neural network, which tends to overfit the teacher's feature maps and fails to capture the most important sample regions. Motivated by this, we argue that the most informative sample regions matter most, and we therefore introduce attention maps to construct sample correlations for knowledge distillation. Specifically, with attention maps from multiple intermediate layers, attention-based sample correlations are built upon the most informative sample regions and serve as a novel and effective form of relational knowledge for distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). We perform extensive experiments on popular KD datasets for image classification, image retrieval, and person re-identification, and the results demonstrate the effectiveness of the proposed method for relational KD.
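For intuition, the gist of the approach can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the authors' reference implementation: it assumes attention maps are obtained by channel-aggregating squared activations (a common attention-transfer formulation), builds a batch-by-batch sample-correlation matrix per level from normalized attention vectors, and penalizes the teacher-student mismatch with an MSE loss. All function names below are invented for this sketch.

import torch
import torch.nn.functional as F

def attention_map(feature: torch.Tensor) -> torch.Tensor:
    # (B, C, H, W) -> (B, H*W): channel-wise mean of squared activations,
    # L2-normalized per sample (assumed attention formulation).
    att = feature.pow(2).mean(dim=1).flatten(1)
    return F.normalize(att, dim=1)

def sample_correlation(att: torch.Tensor) -> torch.Tensor:
    # (B, H*W) -> (B, B): pairwise correlations between samples in the
    # batch, computed on their attention vectors.
    return att @ att.t()

def masckd_loss(teacher_feats, student_feats):
    # Sum the correlation mismatch over multiple intermediate levels;
    # teacher features are detached so only the student gets gradients.
    loss = 0.0
    for f_t, f_s in zip(teacher_feats, student_feats):
        c_t = sample_correlation(attention_map(f_t.detach()))
        c_s = sample_correlation(attention_map(f_s))
        loss = loss + F.mse_loss(c_s, c_t)
    return loss

Because each correlation matrix is B x B, teacher and student layers need not share channel or spatial dimensions, and in training such a relational term would typically be added to the usual cross-entropy (and possibly logit-distillation) loss with a weighting coefficient.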
Pages: 7099-7109
Page count: 11
Related Papers
(50 items in total)
  • [21] Attention-Based Real Image Restoration
    Anwar, Saeed
    Barnes, Nick
    Petersson, Lars
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021
  • [22] Band Selection of Hyperspectral Images Using Attention-Based Autoencoders
    Dou, Zeyang
    Gao, Kun
    Zhang, Xiaodian
    Wang, Hong
    Han, Lu
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (01) : 147 - 151
  • [23] AESGRU: An Attention-Based Temporal Correlation Approach for End-to-End Machine Health Perception
    Zhang, Weiting
    Yang, Dong
    Wang, Hongchao
    Zhang, Jun
    Gidlund, Mikael
    IEEE ACCESS, 2019, 7 : 141487 - 141497
  • [24] Knowledge Distillation With Feature Self Attention
    Park, Sin-Gu
    Kang, Dong-Joong
    IEEE ACCESS, 2023, 11 : 34554 - 34562
  • [25] Gigapixel Histopathological Image Analysis Using Attention-Based Neural Networks
    Brancati, Nadia
    De Pietro, Giuseppe
    Riccio, Daniel
    Frucci, Maria
    IEEE ACCESS, 2021, 9 : 87552 - 87562
  • [26] Ensemble Learning With Attention-Based Multiple Instance Pooling for Classification of SPT
    Zhou, Qinghua
    Zhang, Xin
    Zhang, Yu-Dong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (03) : 1927 - 1931
  • [27] AntiDoteX: Attention-Based Dynamic Optimization for Neural Network Runtime Efficiency
    Yu, Fuxun
    Xu, Zirui
    Liu, Chenchen
    Stamoulis, Dimitrios
    Wang, Di
    Wang, Yanzhi
    Chen, Xiang
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (11) : 4694 - 4707
  • [28] An Attention-Based Architecture for Hierarchical Classification With CNNs
    Pizarro, Ivan
    Nanculef, Ricardo
    Valle, Carlos
    IEEE ACCESS, 2023, 11 : 32972 - 32995
  • [29] A Neural Autoregressive Approach to Attention-based Recognition
    Zheng, Yin
    Zemel, Richard S.
    Zhang, Yu-Jin
    Larochelle, Hugo
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 113 (01) : 67 - 79