Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Cited by: 59
Authors
Gou, Jianping [1 ,2 ]
Sun, Liyuan [2 ]
Yu, Baosheng [3 ]
Wan, Shaohua [4 ]
Ou, Weihua [5 ]
Yi, Zhang [6 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Coll Software, Chongqing 400715, Peoples R China
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2008, Australia
[4] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[5] Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang 550025, Guizhou, Peoples R China
[6] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge engineering; Correlation; Computational modeling; Training; Neural networks; Deep learning; Informatics; Knowledge distillation (KD); model compression; relational knowledge; visual recognition;
DOI
10.1109/TII.2022.3209672
CLC Classification Number
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps at a certain middle layer of a deep neural network, which tends to overfit the feature maps of the teacher model and fails to focus on the most important sample regions. Motivated by this, we argue that the characteristics of important regions are of great importance, and thus introduce attention maps to construct sample correlations for knowledge distillation. Specifically, with attention maps from multiple middle layers, attention-based sample correlations are built upon the most informative sample regions and serve as novel and effective relational knowledge for distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). We perform extensive experiments on popular KD datasets for image classification, image retrieval, and person reidentification, where the results demonstrate the effectiveness of the proposed method for relational KD.
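The core idea in the abstract, building sample correlations from attention maps rather than raw feature maps, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the squared-activation spatial attention, the inner-product correlation matrix, and the function names are illustrative assumptions; the paper's exact attention and correlation definitions may differ.

```python
import numpy as np

def attention_map(features):
    """Spatial attention from a feature map batch (N, C, H, W):
    sum of squared activations over channels, flattened and
    L2-normalized per sample -> (N, H*W)."""
    a = (features ** 2).sum(axis=1).reshape(features.shape[0], -1)
    return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)

def sample_correlations(att):
    """Pairwise similarity between samples' attention maps -> (N, N).
    Because attention maps drop the channel dimension, teacher and
    student layers may have different channel counts."""
    return att @ att.T

def masckd_loss(teacher_feats, student_feats):
    """Mean squared error between teacher and student sample-correlation
    matrices, averaged over the given middle layers (multilevel)."""
    losses = []
    for ft, fs in zip(teacher_feats, student_feats):
        ct = sample_correlations(attention_map(ft))
        cs = sample_correlations(attention_map(fs))
        losses.append(((ct - cs) ** 2).mean())
    return float(np.mean(losses))
```

A design point the sketch preserves: correlations are computed *across samples in the batch* (an N x N matrix per layer), so the distilled signal is relational rather than a per-sample feature match, and the attention step makes it insensitive to the teacher/student channel mismatch.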
Pages: 7099 - 7109
Page count: 11
Related Papers
50 records in total
  • [41] Attention-based Sentiment Reasoner for aspect-based sentiment analysis
    Liu, Ning
    Shen, Bo
    Zhang, Zhenjiang
    Zhang, Zhiyuan
    Mi, Kun
    HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES, 2019, 9 (01)
  • [42] On the Instability of Softmax Attention-Based Deep Learning Models in Side-Channel Analysis
    Hajra, Suvadeep
    Alam, Manaar
    Saha, Sayandeep
    Picek, Stjepan
    Mukhopadhyay, Debdeep
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 514 - 528
  • [43] Attention-based Clinical Note Summarization
    Kanwal, Neel
    Rizzo, Giuseppe
    37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2022, : 813 - 820
  • [44] Attention-Based Deep Neural Network Behavioral Model for Wideband Wireless Power Amplifiers
    Liu, Zhijun
    Hu, Xin
    Liu, Ting
    Li, Xiuhua
    Wang, Weidong
    Ghannouchi, Fadhel M.
    IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS, 2020, 30 (01) : 82 - 85
  • [45] Attention-based exploitation and exploration strategy for multi-hop knowledge graph reasoning
    Shang, Bin
    Zhao, Yinliang
    Liu, Yifan
    Wang, Chenxin
    INFORMATION SCIENCES, 2024, 653
  • [46] LKD-STNN: A Lightweight Malicious Traffic Detection Method for Internet of Things Based on Knowledge Distillation
    Zhu, Shizhou
    Xu, Xiaolong
    Zhao, Juan
    Xiao, Fu
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (04) : 6438 - 6453
  • [47] Generalized attention-based deep multi-instance learning
    Zhao, Lu
    Yuan, Liming
    Hao, Kun
    Wen, Xianbin
    MULTIMEDIA SYSTEMS, 2023, 29 (01) : 275 - 287
  • [48] Attention-Based Multiscale Sequential Network for PolSAR Image Classification
    Hua, Wenqiang
    Wang, Xinlei
    Zhang, Cong
    Jin, Xiaomin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [49] Attention-based Text Recognition in the Wild
    Yan, Zhi-Chen
    Yu, Stephanie A.
    PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS (DELTA), 2020, : 42 - 49
  • [50] Attention-Based Neural Tag Recommendation
    Yuan, Jiahao
    Jin, Yuanyuan
    Liu, Wenyan
    Wang, Xiaoling
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT II, 2019, 11447 : 350 - 365