Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Cited by: 59
Authors
Gou, Jianping [1 ,2 ]
Sun, Liyuan [2 ]
Yu, Baosheng [3 ]
Wan, Shaohua [4 ]
Ou, Weihua [5 ]
Yi, Zhang [6 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Coll Software, Chongqing 400715, Peoples R China
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2008, Australia
[4] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[5] Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang 550025, Guizhou, Peoples R China
[6] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge engineering; Correlation; Computational modeling; Training; Neural networks; Deep learning; Informatics; Knowledge distillation (KD); model compression; relational knowledge; visual recognition
DOI
10.1109/TII.2022.3209672
Chinese Library Classification (CLC)
TP [Automation & Computer Technology]
Discipline Classification Code
0812
Abstract
Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding Industrial Internet of Things (IIoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps at a certain intermediate layer of a deep neural network, which tends to overfit the teacher's feature maps and fails to focus on the most informative sample regions. Motivated by this, we argue that the characteristics of the most informative regions are crucial, and therefore introduce attention maps to construct sample correlations for knowledge distillation. Specifically, using attention maps from multiple intermediate layers, attention-based sample correlations are built over the most informative sample regions and serve as a novel and effective form of relational knowledge for distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). Extensive experiments on popular KD benchmarks for image classification, image retrieval, and person reidentification demonstrate the effectiveness of the proposed method for relational KD.
Pages: 7099-7109
Number of pages: 11
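
For readers who want the gist of the method in code, below is a minimal PyTorch sketch of the idea described in the abstract: build spatial attention maps from intermediate feature maps, form pairwise sample-correlation (Gram) matrices from those maps at several layers, and train the student to match the teacher's correlation matrices. The attention definition (channel-wise sum of squared activations), the cosine normalization, the layer-averaged MSE objective, and all function names are illustrative assumptions, not the authors' official implementation.

```python
# Minimal sketch of multilevel attention-based sample correlations for KD.
# Assumptions (not from the paper's released code): attention = channel-wise
# sum of squared activations; correlations = cosine-normalized Gram matrices;
# per-layer losses are averaged. Names like `masckd_loss` are illustrative.
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Spatial attention from a feature map of shape (B, C, H, W):
    sum squared activations over channels, flatten, L2-normalize rows."""
    att = feat.pow(2).sum(dim=1).flatten(start_dim=1)  # (B, H*W)
    return F.normalize(att, dim=1)

def sample_correlation(att: torch.Tensor) -> torch.Tensor:
    """Pairwise sample-correlation (Gram) matrix of shape (B, B)."""
    return att @ att.t()

def masckd_loss(teacher_feats, student_feats) -> torch.Tensor:
    """Match teacher/student correlation matrices across multiple layers.

    teacher_feats, student_feats: lists of intermediate feature maps
    (e.g., collected via forward hooks), one pair per chosen layer.
    """
    losses = []
    for ft, fs in zip(teacher_feats, student_feats):
        ct = sample_correlation(attention_map(ft.detach()))  # no teacher grads
        cs = sample_correlation(attention_map(fs))
        losses.append(F.mse_loss(cs, ct))
    return torch.stack(losses).mean()

# Usage sketch: combine with the student's task loss, where `beta` is a
# hypothetical weighting hyperparameter.
#   loss = task_loss + beta * masckd_loss([t1, t2, t3], [s1, s2, s3])
```

In this sketch the correlation matrices have shape (batch, batch), so the transferred knowledge is relational: it constrains how samples relate to one another under the attention maps, rather than matching the teacher's feature maps directly.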