Generous teacher: Good at distilling knowledge for student learning

Times Cited: 0
Authors
Ding, Yifeng [1 ]
Yang, Gaoming [1 ]
Yin, Shuting [1 ]
Zhang, Ji [2 ]
Fang, Xianjin [1 ]
Yang, Wencheng [2 ]
Affiliations
[1] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 232001, Peoples R China
[2] Univ Southern Queensland, Sch Math Phys & Comp, Toowoomba 4350, Australia
Keywords
Knowledge distillation; Generous teacher; Absorbing distilled knowledge; Decouple logit;
DOI
10.1016/j.imavis.2024.105199
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is a technique that aims to transfer valuable knowledge from a large, well-trained model (the teacher) to a lightweight model (the student), with the primary goal of improving the student's performance on a given task. In recent years, mainstream distillation methods have focused on modifying student learning styles, resulting in less attention being paid to the knowledge provided by the teacher. However, upon reexamining the knowledge transferred by the teacher, we find that it still has untapped potential, which is crucial to bridging the performance gap between teachers and students. Therefore, we study knowledge distillation from the teacher's perspective and introduce a novel teacher knowledge enhancement method termed "Generous Teacher." The Generous Teacher is a specially trained teacher model that can provide more valuable knowledge for the student model. This is achieved by integrating a standardly trained teacher (Standard Teacher) to assist in the training process of the Generous Teacher. As a result, the Generous Teacher accomplishes the task at hand and assimilates distilled knowledge from the Standard Teacher, effectively adapting to distillation teaching in advance. Specifically, we recognize that non-target class knowledge plays a crucial role in improving the distillation effect for students. To leverage this, we decouple logit outputs and selectively use the Standard Teacher's non-target class knowledge to enhance the Generous Teacher. By setting the temperature as a multiple of the logit standard deviation, we ensure that the additional knowledge absorbed by the Generous Teacher is more suitable for student distillation. Experimental results on standard benchmarks demonstrate that the Generous Teacher surpasses the Standard Teacher in terms of accuracy when applied to standard knowledge distillation. Furthermore, the Generous Teacher can be seamlessly integrated into existing distillation methods, bringing general improvements at a low additional computational cost. The code will be publicly available at https://github.com/EifelTing/Generous-Teacher.
Pages: 16
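The abstract describes the training signal only at a high level: the Generous Teacher is trained on its own task while also absorbing the Standard Teacher's non-target-class knowledge from decoupled logits, with the temperature set to a multiple of the logit standard deviation. The sketch below is one plausible reading of that description, not the authors' released implementation (which is at the GitHub URL above); the names `non_target_kd_loss`, `generous_teacher_loss`, `k_std`, and `alpha` are assumptions introduced for illustration.

```python
# Minimal sketch (assumed reading of the abstract, not the official code):
# Generous Teacher = task cross-entropy + distillation of the Standard Teacher's
# non-target-class distribution, with temperature = k_std * std(teacher logits).

import torch
import torch.nn.functional as F


def non_target_kd_loss(gen_logits, std_logits, targets, k_std=2.0):
    """KL divergence between Generous- and Standard-Teacher distributions over
    non-target classes only, with a per-sample temperature of k_std * std(logits)."""
    B, C = std_logits.shape
    # Per-sample temperature from the Standard Teacher's logit spread
    # (one interpretation of "a multiple of the logit standard deviation").
    temp = k_std * std_logits.std(dim=1, keepdim=True)      # shape (B, 1)

    # Decouple the logits: drop the target-class entry so only
    # non-target knowledge is matched.
    keep = ~F.one_hot(targets, C).bool()                    # (B, C) mask
    g = gen_logits[keep].view(B, C - 1)
    t = std_logits[keep].view(B, C - 1)

    log_p_g = F.log_softmax(g / temp, dim=1)
    p_t = F.softmax(t / temp, dim=1)
    # temp^2 scaling follows the usual temperature-scaled distillation convention.
    return F.kl_div(log_p_g, p_t, reduction="batchmean") * (temp ** 2).mean()


def generous_teacher_loss(gen_logits, std_logits, targets, alpha=1.0, k_std=2.0):
    """Task cross-entropy plus the alpha-weighted non-target distillation term."""
    ce = F.cross_entropy(gen_logits, targets)
    kd = non_target_kd_loss(gen_logits, std_logits.detach(), targets, k_std)
    return ce + alpha * kd


if __name__ == "__main__":
    B, C = 8, 100                                        # e.g. a CIFAR-100-sized head
    gen_logits = torch.randn(B, C, requires_grad=True)   # Generous Teacher outputs
    std_logits = torch.randn(B, C)                       # frozen Standard Teacher outputs
    targets = torch.randint(0, C, (B,))
    loss = generous_teacher_loss(gen_logits, std_logits, targets)
    loss.backward()
    print(float(loss))
```

The Standard Teacher's logits are detached because, per the abstract, it is a standardly trained model that only assists the Generous Teacher's training; how the weighting `alpha` and the multiplier `k_std` are actually set is not specified in this record and would need to be taken from the paper or the repository.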