Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Cited: 59
Authors
Gou, Jianping [1 ,2 ]
Sun, Liyuan [2 ]
Yu, Baosheng [3 ]
Wan, Shaohua [4 ]
Ou, Weihua [5 ]
Yi, Zhang [6 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Coll Software, Chongqing 400715, Peoples R China
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW 2008, Australia
[4] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[5] Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang 550025, Guizhou, Peoples R China
[6] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge engineering; Correlation; Computational modeling; Training; Neural networks; Deep learning; Informatics; Knowledge distillation (KD); model compression; relational knowledge; visual recognition;
DOI
10.1109/TII.2022.3209672
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps at a certain middle layer of a deep neural network, which tends to overfit the teacher's feature maps and fails to capture the most informative sample regions. Motivated by this, we argue that the characteristics of important regions are of great importance and therefore introduce attention maps to construct sample correlations for knowledge distillation. Specifically, with attention maps from multiple middle layers, attention-based sample correlations are built upon the most informative sample regions and serve as a novel and effective form of relational knowledge for distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). We perform extensive experiments on popular KD datasets for image classification, image retrieval, and person reidentification, and the results demonstrate the effectiveness of the proposed method for relational KD.
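The mechanism described in the abstract can be illustrated with a short, hypothetical PyTorch sketch: activation-based attention maps are pooled from several middle layers, each map is flattened and L2-normalized, and a batch-level Gram matrix of these attention vectors serves as the sample-correlation knowledge matched between teacher and student. The function names, the p=2 channel pooling, and the MSE matching loss below are illustrative assumptions, not the authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def attention_map(feat, p=2):
        # Channel-pool a feature map (B, C, H, W) into a spatial attention
        # map, then flatten and L2-normalize per sample -> (B, H*W).
        a = feat.abs().pow(p).sum(dim=1)
        return F.normalize(a.flatten(start_dim=1), dim=1)

    def sample_correlation(feat):
        # Pairwise cosine similarities between the samples' attention
        # vectors in a batch -> (B, B) correlation matrix.
        a = attention_map(feat)
        return a @ a.t()

    def mascd_loss(teacher_feats, student_feats):
        # Match correlation matrices layer by layer. Both matrices are
        # (B, B), so teacher and student may differ in channel and
        # spatial dimensions at each layer.
        loss = 0.0
        for ft, fs in zip(teacher_feats, student_feats):
            loss = loss + F.mse_loss(sample_correlation(fs),
                                     sample_correlation(ft).detach())
        return loss

In such a setup, teacher_feats and student_feats would be lists of middle-layer activations (e.g., collected via forward hooks), and this relational loss would be added to the usual cross-entropy and logit-distillation terms with a tuning weight.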
Pages: 7099-7109
Page count: 11
Related Papers
50 records in total
  • [1] Effective Online Knowledge Distillation via Attention-Based Model Ensembling
    Borza, Diana-Laura
    Darabant, Adrian Sergiu
    Ileni, Tudor Alexandru
    Marinescu, Alexandru-Ion
    MATHEMATICS, 2022, 10 (22)
  • [2] Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition
    Inaguma, Hirofumi
    Kawahara, Tatsuya
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1371 - 1385
  • [3] Attention-Based Knowledge Distillation in Scene Recognition: The Impact of a DCT-Driven Loss
    Lopez-Cifuentes, Alejandro
    Escudero-Vinolo, Marcos
    Bescos, Jesus
    San Miguel, Juan C.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4769 - 4783
  • [4] Enhancing Recommendation Capabilities Using Multi-Head Attention-Based Federated Knowledge Distillation
    Wu, Aming
    Kwon, Young-Woo
    IEEE ACCESS, 2023, 11 : 45850 - 45861
  • [5] Dissolved oxygen prediction in the Taiwan Strait with the attention-based multi-teacher knowledge distillation model
    Chen, Lei
    Lin, Ye
    Guo, Minquan
    Lu, Wenfang
    Li, Xueding
    Zhang, Zhenchang
    OCEAN & COASTAL MANAGEMENT, 2025, 265
  • [6] FusionDTA: attention-based feature polymerizer and knowledge distillation for drug-target binding affinity prediction
    Yuan, Weining
    Chen, Guanxing
    Chen, Calvin Yu-Chian
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [7] Self-knowledge distillation based on dynamic mixed attention
    Tang, Yuan
    Chen, Ying
Kongzhi yu Juece/Control and Decision, 2024, 39 (12): 4099-4108
  • [8] An Attention-Based Neural Network Using Human Semantic Knowledge and Its Application to Clickbait Detection
    Wei, Feng
Nguyen, Uyen Trang
    IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2022, 3 : 217 - 232
  • [9] A Multifunctional Network with Uncertainty Estimation and Attention-Based Knowledge Distillation to Address Practical Challenges in Respiration Rate Estimation
    Rathore, Kapil Singh
    Vijayarangan, Sricharan
    Sp, Preejith
    Sivaprakasam, Mohanasankar
    SENSORS, 2023, 23 (03)
  • [10] Deep Attention-Based Imbalanced Image Classification
    Wang, Lituan
    Zhang, Lei
    Qi, Xiaofeng
    Yi, Zhang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3320 - 3330