SAKD: Sparse attention knowledge distillation

Cited by: 3
Authors
Guo, Zhen [1 ,2 ]
Zhang, Pengzhou [1 ]
Liang, Peng [2 ]
Affiliations
[1] Commun Univ China, State Key Lab Media Convergence & Commun, Dingfuzhuang East St 1, Beijing 100024, Peoples R China
[2] China Unicom Smart City Res Inst, Shoutinanlu 9, Beijing 100024, Peoples R China
Keywords
Knowledge distillation; Attention mechanisms; Sparse attention mechanisms
DOI
10.1016/j.imavis.2024.105020
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Deep learning techniques have attracted significant interest due to their success with large models. However, large models often require massive computational resources, which is challenging for end devices with limited storage and compute. How to transfer knowledge from large models to small ones and achieve comparable results with limited resources still requires further research. Knowledge distillation, which uses a teacher-student framework to migrate the capabilities of a large model into a small one, has been widely applied to model compression and knowledge transfer. In this paper, a novel knowledge distillation approach based on a sparse attention mechanism (SAKD) is proposed. SAKD computes attention using student features as queries and teacher features as keys and values, and sparsifies the attention values by random deactivation. The sparse attention values are then used to reweight the feature distance of each teacher-student feature pair, which mitigates negative transfer. Comprehensive experiments demonstrate the effectiveness and generality of the approach, and SAKD outperforms previous state-of-the-art methods on image classification tasks.
Pages: 8
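
The abstract describes the mechanism but gives no implementation details. The following is a minimal sketch (PyTorch) of what a sparse-attention-weighted distillation loss along these lines could look like. The module name SparseAttentionDistillLoss, the linear projections, the use of dropout as the "random deactivation", the squared Euclidean feature distance, and all feature shapes are illustrative assumptions, not the authors' implementation.

    # Hedged sketch: sparse-attention-reweighted feature distillation.
    # All names, shapes, and the dropout-based sparsification are assumptions
    # made for illustration; they are not taken from the paper.
    import torch
    import torch.nn as nn


    class SparseAttentionDistillLoss(nn.Module):
        def __init__(self, student_dim, teacher_dim, attn_dim=128, drop_prob=0.5):
            super().__init__()
            # Project student/teacher features into a shared attention space.
            self.query_proj = nn.Linear(student_dim, attn_dim)   # student -> query
            self.key_proj = nn.Linear(teacher_dim, attn_dim)     # teacher -> key/value
            # Align student features with the teacher dimension for the distance term.
            self.align = nn.Linear(student_dim, teacher_dim)
            # Dropout plays the role of "random deactivation", zeroing attention entries.
            self.dropout = nn.Dropout(p=drop_prob)
            self.scale = attn_dim ** -0.5

        def forward(self, student_feats, teacher_feats):
            # student_feats: (B, Ns, Ds), teacher_feats: (B, Nt, Dt)
            q = self.query_proj(student_feats)                              # (B, Ns, A)
            k = self.key_proj(teacher_feats)                                # (B, Nt, A)
            attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, -1)    # (B, Ns, Nt)
            attn = self.dropout(attn)                                       # sparsify attention values

            # Pairwise squared distance between every student/teacher feature pair.
            s = self.align(student_feats)                                   # (B, Ns, Dt)
            dist = torch.cdist(s, teacher_feats, p=2) ** 2                  # (B, Ns, Nt)

            # Reweight the distances with the sparse attention so that randomly
            # deactivated (zeroed) pairs contribute nothing, which is the
            # mechanism the abstract credits with limiting negative transfer.
            return (attn * dist).mean()


    # Usage: add the distillation term to the ordinary task loss while training the student.
    loss_fn = SparseAttentionDistillLoss(student_dim=256, teacher_dim=512)
    s_feat = torch.randn(4, 49, 256)   # e.g. a flattened 7x7 student feature map
    t_feat = torch.randn(4, 49, 512)   # corresponding teacher feature map
    kd_loss = loss_fn(s_feat, t_feat)

In this sketch the total training objective would be the task loss plus a weighted kd_loss; the weighting coefficient and the choice of which layers' features to distill are further assumptions left to the reader.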