SAKD: Sparse attention knowledge distillation

Cited by: 4
Authors
Guo, Zhen [1 ,2 ]
Zhang, Pengzhou [1 ]
Liang, Peng [2 ]
Affiliations
[1] Commun Univ China, State Key Lab Media Convergence & Commun, Dingfuzhuang East St 1, Beijing 100024, Peoples R China
[2] China Unicom Smart City Res Inst, Shoutinanlu 9, Beijing 100024, Peoples R China
Keywords
Knowledge distillation; Attention mechanisms; Sparse attention mechanisms;
DOI
10.1016/j.imavis.2024.105020
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning techniques have attracted significant interest due to their success in large-model scenarios. However, large models often require massive computational resources, which challenges end devices with limited storage capability. Transferring knowledge from large models to small ones, so that similar results can be achieved with limited resources, requires further research. Knowledge distillation, which uses a teacher-student framework to migrate the capabilities of a large model to a small one, has been widely used for model compression and knowledge transfer. In this paper, a novel knowledge distillation approach based on a sparse attention mechanism (SAKD) is proposed. SAKD computes attention using student features as queries and teacher features as keys and values, and sparsifies the attention values by random deactivation. The sparse attention values are then used to reweight the feature distance of each teacher-student feature pair, so as to avoid negative transfer. Comprehensive experiments demonstrate the effectiveness and generality of the approach; moreover, SAKD outperforms previous state-of-the-art methods on image classification tasks.
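The abstract's recipe can be sketched in a few lines: student features act as queries and teacher features as keys/values in a scaled dot-product attention, a random-deactivation mask sparsifies the attention weights, and the surviving weights reweight per-pair feature distances so that deactivated pairs contribute nothing. The following is a minimal NumPy sketch under assumed simplifications (one feature vector per layer, a plain dropout-style mask, squared-distance pairs); the paper's exact feature shapes, scaling, and loss normalization are not specified in this record, so all names here are illustrative.

```python
import numpy as np

def sakd_loss(student_feats, teacher_feats, drop_rate=0.5, rng=None):
    """Hypothetical sketch of the SAKD loss described in the abstract.

    student_feats: (num_student_layers, d) array of student features.
    teacher_feats: (num_teacher_layers, d) array of teacher features.
    drop_rate: fraction of attention weights randomly deactivated.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    d = student_feats.shape[1]

    # Scaled dot-product attention: student = query, teacher = key/value.
    scores = student_feats @ teacher_feats.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

    # Sparsify by random deactivation: zero out a random subset of weights.
    mask = rng.random(attn.shape) >= drop_rate
    sparse_attn = attn * mask

    # Reweight the squared distance of each student-teacher feature pair;
    # deactivated (zero-weight) pairs contribute nothing, which is the
    # mechanism the abstract credits with avoiding negative transfer.
    dists = ((student_feats[:, None, :] - teacher_feats[None, :, :]) ** 2).mean(-1)
    return float((sparse_attn * dists).sum())
```

With identical student and teacher features every pairwise distance is zero and the loss vanishes; with `drop_rate=1.0` every attention weight is deactivated and the loss is likewise zero, illustrating how sparsification gates which pairs are distilled.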
Pages: 8
Related Papers
(50 records in total)
[41] Wan, Weiguo; Wen, Runlin; Yao, Li; Yang, Yong. Masked face recognition based on knowledge distillation and convolutional self-attention network [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025, 16(04): 2269-2284.
[42] Karine, Ayoub; Napoleon, Thibault; Jridi, Maher. Semantic images segmentation for autonomous driving using self-attention knowledge distillation [J]. 2022 16TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS, SITIS, 2022: 198-202.
[43] Wang, Kang; Yang, Feng; Chen, Zhibo; Chen, Yixin; Zhang, Ying. A fine-grained bird classification method based on attention and decoupled knowledge distillation [J]. ANIMALS, 2023, 13(02).
[44] Niu, Jie-Yi; Xie, Zhi-Hua; Li, Yi; Cheng, Si-Jia; Fan, Jia-Wei. Scale fusion light CNN for hyperspectral face recognition with knowledge distillation and attention mechanism [J]. APPLIED INTELLIGENCE, 2022, 52: 6181-6195.
[45] Guo, Xiaodong; Zhou, Wujie; Liu, Tong. Multilevel attention imitation knowledge distillation for RGB-thermal transmission line detection [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2025, 260.
[46] Shi, Xiayang; Xia, Zhenlin; Cheng, Pei; Li, Yinlin. Enhancing text generation from knowledge graphs with cross-structure attention distillation [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 136.
[47] Wang, Jielei; Jiang, Ting; Cui, Zongyong; Cao, Zongjie; Cao, Changjie. A knowledge distillation method based on IQE attention mechanism for target recognition in SAR imagery [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022: 1043-1046.
[48] Yu, Zhipeng; Xu, Qianqian; Jiang, Yangbangyan; Qin, Haoyu; Huang, Qingming. Pay attention to your positive pairs: Positive pair aware contrastive knowledge distillation [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 5862-5870.
[49] Tang, Yinan; Guo, Zhenhua; Wang, Li; Fan, Baoyu; Cao, Fang; Gao, Kai; Zhang, Hongwei; Li, Rengang. Knowledge augmentation for distillation: A general and effective approach to enhance knowledge distillation [J]. PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON EFFICIENT MULTIMEDIA COMPUTING UNDER LIMITED RESOURCES, EMCLR 2024, 2024: 23-31.
[50] Kang, S.; Seo, K. Weighted knowledge based knowledge distillation [J]. TRANSACTIONS OF THE KOREAN INSTITUTE OF ELECTRICAL ENGINEERS, 2022, 71(02): 431-435.