Generous teacher: Good at distilling knowledge for student learning

Times Cited: 0
Authors
Ding, Yifeng [1 ]
Yang, Gaoming [1 ]
Yin, Shuting [1 ]
Zhang, Ji [2 ]
Fang, Xianjin [1 ]
Yang, Wencheng [2 ]
Affiliations
[1] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 232001, Peoples R China
[2] Univ Southern Queensland, Sch Math Phys & Comp, Toowoomba 4350, Australia
Keywords
Knowledge distillation; Generous teacher; Absorbing distilled knowledge; Decouple logit;
DOI
10.1016/j.imavis.2024.105199
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Knowledge distillation is a technique that aims to transfer valuable knowledge from a large, well-trained model (the teacher) to a lightweight model (the student), with the primary goal of improving the student's performance on a given task. In recent years, mainstream distillation methods have focused on modifying student learning styles, resulting in less attention being paid to the knowledge provided by the teacher. However, upon reexamining the knowledge transferred by the teacher, we find that it still has untapped potential, which is crucial to bridging the performance gap between teachers and students. Therefore, we study knowledge distillation from the teacher's perspective and introduce a novel teacher knowledge enhancement method termed "Generous Teacher." The Generous Teacher is a specially trained teacher model that can provide more valuable knowledge for the student model. This is achieved by integrating a standardly trained teacher (Standard Teacher) to assist in the training process of the Generous Teacher. As a result, the Generous Teacher accomplishes the task at hand and assimilates distilled knowledge from the Standard Teacher, effectively adapting to distillation teaching in advance. Specifically, we recognize that non-target class knowledge plays a crucial role in improving the distillation effect for students. To leverage this, we decouple logit outputs and selectively use the Standard Teacher's non-target class knowledge to enhance the Generous Teacher. By setting the temperature as a multiple of the logit standard deviation, we ensure that the additional knowledge absorbed by the Generous Teacher is more suitable for student distillation. Experimental results on standard benchmarks demonstrate that the Generous Teacher surpasses the Standard Teacher in terms of accuracy when applied to standard knowledge distillation. Furthermore, the Generous Teacher can be seamlessly integrated into existing distillation methods, bringing general improvements at a low additional computational cost. The code will be publicly available at https://github.com/EifelTing/Generous-Teacher.
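To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of the core idea: while the Generous Teacher is trained on the task, its decoupled non-target class distribution is additionally pulled toward that of a frozen, standardly trained teacher, with the temperature set as a multiple of the logit standard deviation. This sketch is based solely on the abstract; the helper names, the multiplier k, the loss weight alpha, and the overall loss composition are illustrative assumptions, not the authors' released implementation (see the GitHub link above for that).

import torch
import torch.nn.functional as F

def nontarget_distribution(logits, targets, temperature):
    # Decouple the logits: softmax over the non-target classes only,
    # using a per-sample temperature of shape [batch].
    mask = F.one_hot(targets, num_classes=logits.size(1)).bool()
    nt_logits = logits.masked_fill(mask, float("-inf"))  # drop the target class
    return F.softmax(nt_logits / temperature.unsqueeze(1), dim=1)

def generous_teacher_loss(gen_logits, std_logits, targets, k=2.0, alpha=1.0, eps=1e-8):
    # Task term: the Generous Teacher must still solve the original classification task.
    ce = F.cross_entropy(gen_logits, targets)

    # Temperature chosen as a multiple of the per-sample logit standard deviation,
    # following the abstract's description (k is an assumed hyperparameter).
    t_gen = k * gen_logits.std(dim=1)
    t_std = k * std_logits.std(dim=1)

    # Absorb only the Standard Teacher's non-target class knowledge.
    p_gen = nontarget_distribution(gen_logits, targets, t_gen)
    p_std = nontarget_distribution(std_logits.detach(), targets, t_std)  # frozen teacher
    kd = (p_std * ((p_std + eps).log() - (p_gen + eps).log())).sum(dim=1).mean()

    return ce + alpha * kd

In use, std_logits would come from the frozen Standard Teacher evaluated on the same batch; after this pre-adaptation stage, the Generous Teacher replaces the Standard Teacher in any existing distillation pipeline.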
Pages: 16
Related Papers
50 records in total
  • [21] Dual knowledge distillation for visual tracking with teacher-student network
    Wang, Yuanyun
    Sun, Chuanyu
    Wang, Jun
    Chai, Bingfei
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (6-7) : 5203 - 5211
  • [22] Teacher-Student Knowledge Distillation for Radar Perception on Embedded Accelerators
    Shaw, Steven
    Tyagi, Kanishka
    Zhang, Shan
    FIFTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, IEEECONF, 2023: 1035 - 1038
  • [23] Comprehensive learning and adaptive teaching: Distilling multi-modal knowledge for pathological glioma grading
    Xing, Xiaohan
    Zhu, Meilu
    Chen, Zhen
    Yuan, Yixuan
    MEDICAL IMAGE ANALYSIS, 2024, 91
  • [24] Domain Adaptation for Food Intake Classification With Teacher/Student Learning
    Turan, M. A. Tugtekin
    Erzin, Engin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 4220 - 4231
  • [25] Learning face super-resolution through identity features and distilling facial prior knowledge
    Tomara, Anurag Singh
    Arya, K. V.
    Rajput, Shyam Singh
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 262
  • [26] Distilling the Knowledge in Object Detection with Adaptive Balance
    Lu, Hongyun
    Liu, Zhi
    Zhang, Mengmeng
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 272 - 275
  • [27] Teacher-Explorer-Student Learning: A Novel Learning Method for Open Set Recognition
    Jang, Jaeyeon
    Kim, Chang Ouk
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 767 - 780
  • [28] Knowledge in attention assistant for improving generalization in deep teacher-student models
    Morabbi, Sajedeh
    Soltanizadeh, Hadi
    Mozaffari, Saeed
    Fadaeieslam, Mohammad Javad
    Sana, Shib Sankar
    INTERNATIONAL JOURNAL OF MODELLING AND SIMULATION, 2024,
  • [29] Teacher-student knowledge distillation for real-time correlation tracking
    Chen, Qihuang
    Zhong, Bineng
    Liang, Qihua
    Deng, Qingyong
    Li, Xianxian
    NEUROCOMPUTING, 2022, 500 : 537 - 546
  • [30] Student Network Learning via Evolutionary Knowledge Distillation
    Zhang, Kangkai
    Zhang, Chunhui
    Li, Shikun
    Zeng, Dan
    Ge, Shiming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2251 - 2263