Augmented Knowledge Distillation via Contrastive Learning

Times cited: 0
Authors
Xu, Jianhua [1 ]
Li, Lin [1 ]
Gou, Jianping [2 ]
Du, Lan [3 ]
Zhan, Yibing [4 ]
Affiliations
[1] Jiangsu Univ, Zhenjiang, Jiangsu, Peoples R China
[2] Southwest Univ, Chongqing, Peoples R China
[3] Monash Univ, Clayton, Vic, Australia
[4] JD Explore Acad, Beijing, Peoples R China
Source
COMPUTER ANIMATION AND SOCIAL AGENTS, CASA 2024, PT II | 2025, Vol. 2375
Keywords
Model compression; knowledge distillation; contrastive learning; vision recognition
DOI
10.1007/978-981-96-2684-7_1
CLC classification
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Deploying computer animation and social agent models on devices with limited computing resources presents a significant challenge. Knowledge distillation (KD) has emerged as an effective model compression technique, harnessing the extensive knowledge of a large teacher model to facilitate the training of a smaller student model. However, existing KD methodologies predominantly concentrate on transferring task-specific knowledge from supervised tasks, such as logits and features, overlooking the valuable insights into cross-sample discrepancy inherent in teacher and student models. In response, we propose a novel KD approach, termed augmented knowledge distillation via contrastive learning (CAKD). Initially, in the supervision task, we enhance vanilla KD by integrating logit and feature outputs derived from both the original and the augmented data. Subsequently, in the self-supervision task, we identify pivotal sample pairs and delineate the inter-sample multi-discrepancy relationships using the intrinsic data structure, thus obviating the need for external labels or supervision. This enables knowledge transfer through contrastive learning. The fusion of knowledge from both tasks synergistically enhances student performance. Experimental assessments conducted on two publicly available datasets demonstrate that CAKD surpasses state-of-the-art knowledge distillation methodologies.
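The two components the abstract describes, a supervised logit/feature distillation term and a self-supervised contrastive term over sample pairs, can be illustrated with standard building blocks. The PyTorch sketch below combines Hinton-style KD with an InfoNCE-style contrastive loss that aligns student and teacher embeddings of the same sample against in-batch negatives; it is a generic illustration of these ingredients, not the paper's actual CAKD implementation, and all hyperparameters (`T`, `tau`, `alpha`, `beta`) are illustrative assumptions.

```python
# Generic sketch of a KD-plus-contrastive objective (not the paper's CAKD code).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Vanilla Hinton-style KD: KL divergence between softened distributions."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def contrastive_loss(student_feat, teacher_feat, tau=0.1):
    """InfoNCE-style term: the student embedding of each sample should match
    the teacher embedding of the same sample (positive pair) against the
    teacher embeddings of all other samples in the batch (negatives)."""
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat, dim=1)
    logits = s @ t.T / tau                 # (B, B) similarity matrix
    labels = torch.arange(s.size(0))       # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

def combined_loss(s_logits, t_logits, s_feat, t_feat, targets,
                  alpha=0.5, beta=0.5):
    """Supervised CE + logit distillation + contrastive feature alignment."""
    ce = F.cross_entropy(s_logits, targets)
    return (ce
            + alpha * kd_loss(s_logits, t_logits)
            + beta * contrastive_loss(s_feat, t_feat))
```

In this sketch, the augmented-data branch the abstract mentions would simply apply the same losses to a second, augmented view of each batch; the contrastive term needs no labels, matching the abstract's self-supervision claim.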
Pages: 1-12
Page count: 12
Related papers
24 records
[1] Ahn, Sungsoo; Hu, Shell Xu; Damianou, Andreas; Lawrence, Neil D.; Dai, Zhenwen. Variational Information Distillation for Knowledge Transfer. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 9155-9163.
[2] Bhat, Prashant; Arani, Elahe; Zonooz, Bahram. Distill on the Go: Online knowledge distillation in self-supervised learning. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2021), 2021: 2672-2681.
[3] Chen, T. Proceedings of Machine Learning Research, 2020, Vol. 119.
[4] Deng, J. Proceedings of CVPR IEEE, 2009: 248. DOI: 10.1109/CVPRW.2009.5206848.
[5] Gou, Jianping; Sun, Liyuan; Yu, Baosheng; Wan, Shaohua; Ou, Weihua; Yi, Zhang. Multilevel Attention-Based Sample Correlations for Knowledge Distillation. IEEE Transactions on Industrial Informatics, 2023, 19(5): 7099-7109.
[6] Gou, Jianping; Yu, Baosheng; Maybank, Stephen J.; Tao, Dacheng. Knowledge Distillation: A Survey. International Journal of Computer Vision, 2021, 129(6): 1789-1819.
[7] Guo, Ziyao; Yan, Haonan; Li, Hui; Lin, Xiaodong. Class Attention Transfer Based Knowledge Distillation. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 11868-11877.
[8] Hawkins, D. M. The problem of overfitting. Journal of Chemical Information and Computer Sciences, 2004, 44(1): 1-12.
[9] Hinton, G. 2015. arXiv:1503.02531. DOI: 10.48550/ARXIV.1503.02531.
[10] Krizhevsky, A. Learning multiple layers of features from tiny images. 2009.