Extensions in channel and class dimensions for attention-based knowledge distillation

Cited: 0
Authors
Zhou, Liangtai [1 ]
Zhang, Weiwei [1 ]
Zhang, Banghui [1 ]
Guo, Yufeng [1 ]
Wang, Junhuang [1 ]
Li, Xiaobin [1 ]
Zhu, Jianqing [1 ]
Affiliations
[1] Huaqiao Univ, Coll Engn, Chenghua North Rd, Quanzhou 362021, Fujian, Peoples R China
Keywords
Knowledge distillation; Deep learning; Model compression
DOI
10.1016/j.cviu.2025.104359
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
As knowledge distillation technology has evolved, it has branched into three distinct methodologies: logit-based, feature-based, and attention-based knowledge distillation. Although the principle of attention-based knowledge distillation is more intuitive, its performance lags behind the other two approaches. To address this, we systematically analyze the advantages and limitations of traditional attention-based methods. To overcome these limitations and explore more effective attention information, we extend attention-based knowledge distillation along the channel and class dimensions, proposing Spatial Attention-based Knowledge Distillation with Channel Attention (SAKD-Channel) and Spatial Attention-based Knowledge Distillation with Class Attention (SAKD-Class). On CIFAR-100, with ResNet8x4 as the student model, SAKD-Channel improves Top-1 validation accuracy by 1.98% and SAKD-Class by 3.35% over traditional distillation methods. On ImageNet, using ResNet18, the two methods improve Top-1 validation accuracy by 0.55% and 0.17%, respectively, over traditional methods. We also conduct extensive experiments to investigate the working mechanisms and application conditions of channel- and class-dimension knowledge distillation, providing new theoretical insights for attention-based knowledge transfer.
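The abstract only summarizes the approach, so the following is a minimal NumPy sketch of what spatial attention distillation extended with a channel-dimension term could look like. It assumes the classic squared-activation pooling used in standard attention transfer; the function names, the pooling choice, and the weighting term `beta` are illustrative assumptions, not the paper's actual SAKD-Channel/SAKD-Class formulation.

```python
import numpy as np

def spatial_attention_map(feat):
    """Channel-pooled spatial attention: (B, C, H, W) -> L2-normalized (B, H*W).

    Assumption: attention is the mean of squared activations over channels,
    as in standard attention-transfer distillation.
    """
    am = (feat ** 2).mean(axis=1).reshape(feat.shape[0], -1)
    return am / (np.linalg.norm(am, axis=1, keepdims=True) + 1e-8)

def channel_attention_vector(feat):
    """Spatially pooled channel attention: (B, C, H, W) -> L2-normalized (B, C).

    Illustrative channel-dimension extension: pool squared activations
    over the spatial grid instead of over channels.
    """
    cv = (feat ** 2).mean(axis=(2, 3))
    return cv / (np.linalg.norm(cv, axis=1, keepdims=True) + 1e-8)

def attention_kd_loss(f_student, f_teacher, beta=1.0):
    """Match both spatial and channel attention between student and teacher.

    beta is a hypothetical weight balancing the two terms.
    """
    l_spatial = np.mean(
        (spatial_attention_map(f_student) - spatial_attention_map(f_teacher)) ** 2
    )
    l_channel = np.mean(
        (channel_attention_vector(f_student) - channel_attention_vector(f_teacher)) ** 2
    )
    return l_spatial + beta * l_channel
```

In a training loop, this loss would be computed on intermediate feature maps (e.g., ResNet stage outputs, as in the experiments above) and added to the usual task loss; the loss is zero when the student's attention exactly matches the teacher's.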
Pages: 12