Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation

被引:0
|
作者
Yin, Zimo [1 ]
Pu, Jian [2 ]
Zhou, Yijie [1 ]
Xue, Xiangyang [1 ]
机构
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Fudan Univ, Inst Sci & Technol BrainInspired Intelligence, Shanghai 200433, Peoples R China
基金
中国国家自然科学基金;
关键词
Knowledge engineering; Computational modeling; Computer architecture; Vectors; Knowledge transfer; Cluster-based regularization; iterative prediction refinement; model-agnostic framework; self-knowledge distillation (SKD); two-stage knowledge transfer;
D O I
10.1109/JAS.2024.124629
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To optimize computational expenditure and memory utilization, self-knowledge distillation (SKD) extracts dark knowledge from the model itself rather than an external teacher network. However, previous SKD methods performed distillation indiscriminately on full datasets, overlooking the analysis of representative samples. In this work, we present a novel two-stage approach to providing targeted knowledge on specific samples, named two-stage approach self-knowledge distillation (TOAST). We first soften the hard targets using class medoids generated based on logit vectors per class. Then, we iteratively distill the under-trained data with past predictions of half the batch size. The two-stage knowledge is linearly combined, efficiently enhancing model performance. Extensive experiments conducted on five backbone architectures show our method is model-agnostic and achieves the best generalization performance. Besides, TOAST is strongly compatible with existing augmentation-based regularization methods. Our method also obtains a speedup of up to 2.95x compared with a recent state-of-the-art method.
引用
收藏
页码:2270 / 2283
页数:14
相关论文
共 50 条
  • [1] Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation
    Zimo Yin
    Jian Pu
    Yijie Zhou
    Xiangyang Xue
    IEEE/CAA Journal of Automatica Sinica, 2024, 11 (11) : 2270 - 2283
  • [2] Self-knowledge distillation with dimensional history knowledge
    Wenke Huang
    Mang Ye
    Zekun Shi
    He Li
    Bo Du
    Science China Information Sciences, 2025, 68 (9)
  • [3] Neighbor self-knowledge distillation
    Liang, Peng
    Zhang, Weiwei
    Wang, Junhuang
    Guo, Yufeng
    INFORMATION SCIENCES, 2024, 654
  • [4] Self-knowledge distillation based on knowledge transfer from soft to hard examples
    Tang, Yuan
    Chen, Ying
    Xie, Linbo
    IMAGE AND VISION COMPUTING, 2023, 135
  • [5] Self-knowledge distillation via dropout
    Lee, Hyoje
    Park, Yeachan
    Seo, Hyun
    Kang, Myungjoo
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 233
  • [6] Dual teachers for self-knowledge distillation
    Li, Zheng
    Li, Xiang
    Yang, Lingfeng
    Song, Renjie
    Yang, Jian
    Pan, Zhigeng
    PATTERN RECOGNITION, 2024, 151
  • [7] From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels
    Yang, Zhendong
    Zeng, Ailing
    Li, Zhe
    Zhang, Tianke
    Yuan, Chun
    Li, Yu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 17139 - 17148
  • [8] Teaching Yourself: A Self-Knowledge Distillation Approach to Action Recognition
    Duc-Quang Vu
    Le, Ngan
    Wang, Jia-Ching
    IEEE ACCESS, 2021, 9 : 105711 - 105723
  • [9] Sliding Cross Entropy for Self-Knowledge Distillation
    Lee, Hanbeen
    Kim, Jeongho
    Woo, Simon S.
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1044 - 1053
  • [10] Self-Knowledge Distillation with Progressive Refinement of Targets
    Kim, Kyungyul
    Ji, ByeongMoon
    Yoon, Doyoung
    Hwang, Sangheum
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6547 - 6556