Generous teacher: Good at distilling knowledge for student learning

Cited: 0
Authors
Ding, Yifeng [1 ]
Yang, Gaoming [1 ]
Yin, Shuting [1 ]
Zhang, Ji [2 ]
Fang, Xianjin [1 ]
Yang, Wencheng [2 ]
Affiliations
[1] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 232001, Peoples R China
[2] Univ Southern Queensland, Sch Math Phys & Comp, Toowoomba 4350, Australia
Keywords
Knowledge distillation; Generous teacher; Absorbing distilled knowledge; Decouple logit
DOI
10.1016/j.imavis.2024.105199
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation is a technique that aims to transfer valuable knowledge from a large, well-trained model (the teacher) to a lightweight model (the student), with the primary goal of improving the student's performance on a given task. In recent years, mainstream distillation methods have focused on modifying student learning styles, resulting in less attention being paid to the knowledge provided by the teacher. However, upon reexamining the knowledge transferred by the teacher, we find that it still has untapped potential, which is crucial to bridging the performance gap between teachers and students. Therefore, we study knowledge distillation from the teacher's perspective and introduce a novel teacher knowledge enhancement method termed "Generous Teacher." The Generous Teacher is a specially trained teacher model that can provide more valuable knowledge for the student model. This is achieved by integrating a standardly trained teacher (Standard Teacher) to assist in the training process of the Generous Teacher. As a result, the Generous Teacher accomplishes the task at hand and assimilates distilled knowledge from the Standard Teacher, effectively adapting to distillation teaching in advance. Specifically, we recognize that non-target class knowledge plays a crucial role in improving the distillation effect for students. To leverage this, we decouple logit outputs and selectively use the Standard Teacher's non-target class knowledge to enhance the Generous Teacher. By setting the temperature as a multiple of the logit standard deviation, we ensure that the additional knowledge absorbed by the Generous Teacher is more suitable for student distillation. Experimental results on standard benchmarks demonstrate that the Generous Teacher surpasses the Standard Teacher in terms of accuracy when applied to standard knowledge distillation. Furthermore, the Generous Teacher can be seamlessly integrated into existing distillation methods, bringing general improvements at a low additional computational cost. The code will be publicly available at https://github.com/EifelTing/Generous-Teacher.
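The abstract names three concrete mechanisms: decoupling the logit output into target and non-target parts, letting the Generous Teacher absorb only the non-target-class distribution of a frozen Standard Teacher, and setting the softmax temperature to a multiple of the logit standard deviation. The PyTorch sketch below is a minimal, hypothetical illustration of how such a training objective could be written; it is not the authors' released implementation (see the repository linked above), and the names generous_teacher_loss, non_target_log_probs, adaptive_temperature and the hyper-parameters alpha and k are assumptions introduced here for illustration.

# Minimal PyTorch sketch (an assumption, not the authors' code) of the three ideas in the
# abstract: (1) decouple the logits into target and non-target parts, (2) train the
# Generous Teacher to absorb only the non-target-class distribution of a frozen Standard
# Teacher, (3) set the softmax temperature as a multiple of the logit standard deviation.
import torch
import torch.nn.functional as F


def adaptive_temperature(logits: torch.Tensor, k: float = 1.0) -> torch.Tensor:
    # Per-sample temperature: a multiple k of the logit standard deviation, shape (B, 1).
    return (k * logits.std(dim=1, keepdim=True)).clamp_min(1e-6)


def non_target_log_probs(logits: torch.Tensor, target: torch.Tensor,
                         temperature: torch.Tensor) -> torch.Tensor:
    # Decouple the logits: drop the target-class column and renormalise the remaining
    # K-1 non-target classes with a temperature-scaled log-softmax.
    b, num_classes = logits.shape
    target_mask = F.one_hot(target, num_classes).bool()
    non_target = logits[~target_mask].view(b, num_classes - 1)
    return F.log_softmax(non_target / temperature, dim=1)


def generous_teacher_loss(gt_logits: torch.Tensor, st_logits: torch.Tensor,
                          target: torch.Tensor, alpha: float = 1.0,
                          k: float = 1.0) -> torch.Tensor:
    # Task loss for the Generous Teacher plus a KL term that pulls its non-target-class
    # distribution toward that of the frozen Standard Teacher. alpha and k are assumed
    # hyper-parameters (distillation weight and temperature multiplier).
    ce = F.cross_entropy(gt_logits, target)

    st_logits = st_logits.detach()  # the Standard Teacher is frozen
    log_p_gt = non_target_log_probs(gt_logits, target, adaptive_temperature(gt_logits, k))
    log_p_st = non_target_log_probs(st_logits, target, adaptive_temperature(st_logits, k))

    kd = F.kl_div(log_p_gt, log_p_st, reduction="batchmean", log_target=True)
    return ce + alpha * kd


# Example shapes: a batch of 8 samples over 100 classes (e.g. a CIFAR-100-sized output).
if __name__ == "__main__":
    gt_out = torch.randn(8, 100, requires_grad=True)
    st_out = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))
    loss = generous_teacher_loss(gt_out, st_out, labels, alpha=1.0, k=2.0)
    loss.backward()
    print(loss.item())

Under this sketch, the Generous Teacher would first be trained with the combined objective against a frozen Standard Teacher, and the student would afterwards be distilled from the resulting Generous Teacher with any standard logit-distillation loss; alpha and k are placeholders that would need tuning, and the paper's actual loss formulation may differ.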
Pages: 16