Extensions in channel and class dimensions for attention-based knowledge distillation

Cited: 0
Authors
Zhou, Liangtai [1 ]
Zhang, Weiwei [1 ]
Zhang, Banghui [1 ]
Guo, Yufeng [1 ]
Wang, Junhuang [1 ]
Li, Xiaobin [1 ]
Zhu, Jianqing [1 ]
Affiliations
[1] Huaqiao Univ, Coll Engn, Chenghua North Rd, Quanzhou 362021, Fujian, Peoples R China
Keywords
Knowledge distillation; Deep learning; Model compression
DOI
10.1016/j.cviu.2025.104359
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
As knowledge distillation technology has evolved, it has branched into three distinct methodologies: logit-based, feature-based, and attention-based knowledge distillation. Although the principle of attention-based knowledge distillation is more intuitive, its performance lags behind the other two approaches. To address this, we systematically analyze the advantages and limitations of traditional attention-based methods. To overcome these limitations and explore more effective attention information, we extend attention-based knowledge distillation along the channel and class dimensions, proposing Spatial Attention-based Knowledge Distillation with Channel Attention (SAKD-Channel) and Spatial Attention-based Knowledge Distillation with Class Attention (SAKD-Class). On CIFAR-100, with ResNet8x4 as the student model, SAKD-Channel improves Top-1 validation accuracy by 1.98% and SAKD-Class by 3.35% over traditional distillation methods. On ImageNet, using ResNet18, the two methods improve Top-1 validation accuracy by 0.55% and 0.17%, respectively, over traditional methods. We also conduct extensive experiments to investigate the working mechanisms and application conditions of channel- and class-dimension knowledge distillation, providing new theoretical insights for attention-based knowledge transfer.
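The abstract only summarizes the approach, so the following is a minimal NumPy sketch of what spatial attention distillation extended with a channel-dimension term could look like. It assumes the classic squared-activation pooling used in standard attention transfer; the function names, the pooling choice, and the weighting term `beta` are illustrative assumptions, not the paper's actual SAKD-Channel/SAKD-Class formulation.

```python
import numpy as np

def spatial_attention_map(feat):
    """Channel-pooled spatial attention: (B, C, H, W) -> L2-normalized (B, H*W).

    Assumption: attention is the mean of squared activations over channels,
    as in standard attention-transfer distillation.
    """
    am = (feat ** 2).mean(axis=1).reshape(feat.shape[0], -1)
    return am / (np.linalg.norm(am, axis=1, keepdims=True) + 1e-8)

def channel_attention_vector(feat):
    """Spatially pooled channel attention: (B, C, H, W) -> L2-normalized (B, C).

    Illustrative channel-dimension extension: pool squared activations
    over the spatial grid instead of over channels.
    """
    cv = (feat ** 2).mean(axis=(2, 3))
    return cv / (np.linalg.norm(cv, axis=1, keepdims=True) + 1e-8)

def attention_kd_loss(f_student, f_teacher, beta=1.0):
    """Match both spatial and channel attention between student and teacher.

    beta is a hypothetical weight balancing the two terms.
    """
    l_spatial = np.mean(
        (spatial_attention_map(f_student) - spatial_attention_map(f_teacher)) ** 2
    )
    l_channel = np.mean(
        (channel_attention_vector(f_student) - channel_attention_vector(f_teacher)) ** 2
    )
    return l_spatial + beta * l_channel
```

In a training loop, this loss would be computed on intermediate feature maps (e.g., ResNet stage outputs, as in the experiments above) and added to the usual task loss; the loss is zero when the student's attention exactly matches the teacher's.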
Pages: 12