Self-knowledge distillation based on dynamic mixed attention

Cited by: 0
Authors
Tang, Yuan [1 ]
Chen, Ying [1 ]
Affiliations
[1] Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi
Source
Kongzhi yu Juece/Control and Decision | 2024, Vol. 39, No. 12
Keywords
attention mechanism; background knowledge; deep learning; image classification; knowledge distillation; model compression
DOI
10.13195/j.kzyjc.2024.0036
Abstract
Self-knowledge distillation removes the need to train a large teacher network, but its attention mechanism focuses only on the foreground of an image: it ignores the background knowledge carried by color and texture information, and a wrongly focused spatial attention may also cause foreground information to be missed. To address this problem, a self-knowledge distillation method based on dynamic mixed attention is proposed, which exploits both the foreground and the background of an image and thereby improves classification accuracy. A mask segmentation module is designed to split the feature map into background and foreground parts, which are used to extract the ignored background knowledge and recover the missing foreground information, respectively. In addition, a knowledge extraction module based on a dynamic attention distribution strategy is proposed: it dynamically adjusts the loss ratio between background attention and foreground attention through a parameter derived from the predictive probability distribution. This strategy guides the cooperation between foreground and background, yields more accurate attention maps, and improves the performance of the classifier network. Experiments show that with ResNet18 and WRN-16-2 the proposed method improves accuracy on CIFAR-100 by 2.15% and 1.54%, respectively. For fine-grained visual recognition, accuracy on the CUB-200 and MIT-67 datasets is improved by 3.51% and 1.05%, respectively, outperforming state-of-the-art methods. © 2024 Northeast University. All rights reserved.
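The abstract describes two components: a mask segmentation module that splits a feature map into foreground and background, and a dynamic attention distribution strategy that weights the two attention losses by a parameter derived from the predictive probability distribution. The sketch below illustrates that idea in PyTorch; it is not the paper's exact formulation. The mean-threshold mask rule, the use of a same-sized self-generated target feature (e.g., from a deeper layer of the same network) as the distillation target, and the confidence-based weighting are all illustrative assumptions.

```python
# Illustrative sketch of a dynamic mixed-attention self-distillation loss.
# Function names, thresholding, and weighting are assumptions, not the paper's method.
import torch
import torch.nn.functional as F


def spatial_attention(feat):
    """Channel-pooled spatial attention map, normalised over spatial positions."""
    attn = feat.pow(2).mean(dim=1, keepdim=True)                # (B, 1, H, W)
    b, _, h, w = attn.shape
    return F.softmax(attn.view(b, -1), dim=1).view(b, 1, h, w)


def mask_segmentation(attn):
    """Split an attention map into foreground/background masks (assumed mean threshold)."""
    thresh = attn.mean(dim=(2, 3), keepdim=True)
    fg_mask = (attn >= thresh).float()
    return fg_mask, 1.0 - fg_mask


def dynamic_mixed_attention_loss(student_feat, target_feat, logits, labels):
    """Mix foreground- and background-attention losses with a confidence-driven weight.

    student_feat and target_feat are assumed to have the same shape; target_feat stands
    for whatever self-generated target the method uses (e.g., a deeper-layer feature).
    """
    s_attn = spatial_attention(student_feat)
    t_attn = spatial_attention(target_feat)
    fg_mask, bg_mask = mask_segmentation(t_attn)

    fg_loss = F.mse_loss(s_attn * fg_mask, t_attn * fg_mask)
    bg_loss = F.mse_loss(s_attn * bg_mask, t_attn * bg_mask)

    # Dynamic weight from the predictive probability of the ground-truth class:
    # confident predictions lean on foreground attention, uncertain ones on background.
    probs = F.softmax(logits, dim=1)
    p_true = probs.gather(1, labels.unsqueeze(1)).mean()
    return p_true * fg_loss + (1.0 - p_true) * bg_loss


# Example usage with random tensors (batch of 4, 100 classes):
# feats = torch.randn(4, 64, 8, 8); target = torch.randn(4, 64, 8, 8)
# logits = torch.randn(4, 100); labels = torch.randint(0, 100, (4,))
# loss = dynamic_mixed_attention_loss(feats, target, logits, labels)
```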
Pages: 4099-4108
Number of pages: 9