Multi-level Logit Distillation

Cited by: 38
Authors
Jin, Ying [1]
Wang, Jiaqi [2]
Lin, Dahua [1,2]
Affiliations
[1] Chinese Univ Hong Kong, CUHK-SenseTime Joint Lab, Hong Kong, Peoples R China
[2] Shanghai AI Lab, Shanghai, Peoples R China
DOI
10.1109/CVPR52729.2023.02325
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Knowledge Distillation (KD) aims to distill knowledge from a large teacher model into a lightweight student model. Mainstream KD methods fall into two categories: logit distillation and feature distillation. The former is easy to implement but inferior in performance, while the latter is not applicable in some practical circumstances due to concerns such as privacy and safety. To address this dilemma, in this paper we explore a stronger logit distillation method by making better use of logit outputs. Concretely, we propose a simple yet effective approach to logit distillation via multi-level prediction alignment. Within this framework, prediction alignment is conducted not only at the instance level but also at the batch and class levels, so that the student model learns instance predictions, input correlations, and category correlations simultaneously. In addition, a prediction augmentation mechanism based on model calibration further boosts performance. Extensive experimental results validate that our method consistently outperforms previous logit distillation methods and even reaches performance competitive with mainstream feature distillation methods. Code is available at https://github.com/Jin-Ying/Multi-Level-Logit-Distillation.
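The abstract describes prediction alignment at the instance, batch, and class levels plus a calibration-based prediction augmentation. The sketch below is a minimal illustration of that idea, not the authors' implementation (see the linked repository for the official code). It assumes PyTorch; the function name mlld_loss, the temperature set standing in for prediction augmentation, the KL divergence for the instance-level term, and the squared error for the batch- and class-level correlation terms are all assumptions made here for illustration.

import torch
import torch.nn.functional as F


def mlld_loss(student_logits, teacher_logits, temperatures=(1.0, 2.0, 4.0)):
    """Illustrative multi-level alignment of teacher and student predictions.

    student_logits, teacher_logits: tensors of shape (B, C).
    temperatures: softening factors used here as a simple stand-in for the
    paper's calibration-based prediction augmentation.
    """
    loss = 0.0
    for t in temperatures:
        p_s = F.softmax(student_logits / t, dim=1)  # (B, C) student predictions
        p_t = F.softmax(teacher_logits / t, dim=1)  # (B, C) teacher predictions

        # Instance level: match each sample's predictive distribution.
        inst = F.kl_div(p_s.log(), p_t, reduction="batchmean") * (t ** 2)

        # Batch level: match input correlations, i.e. how similar the
        # predictions of different samples within the batch are to each other.
        batch = F.mse_loss(p_s @ p_s.t(), p_t @ p_t.t())  # (B, B) Gram matrices

        # Class level: match category correlations across the batch.
        cls = F.mse_loss(p_s.t() @ p_s, p_t.t() @ p_t)  # (C, C) correlations

        loss = loss + inst + batch + cls
    return loss / len(temperatures)


if __name__ == "__main__":
    # Toy usage: random logits for a batch of 8 samples over 100 classes.
    s_logits, t_logits = torch.randn(8, 100), torch.randn(8, 100)
    print(mlld_loss(s_logits, t_logits).item())

In practice such a distillation term would typically be added to the student's ordinary cross-entropy loss, with the relative weights tuned per task.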
Pages: 24276-24285
Page count: 10
Related Papers
50 records in total
  • [1] Adaptive multi-teacher multi-level knowledge distillation
    Liu, Yuang
    Zhang, Wei
    Wang, Jun
    NEUROCOMPUTING, 2020, 415 : 106 - 113
  • [2] Progressive multi-level distillation learning for pruning network
    Wang, Ruiqing
    Wan, Shengmin
    Zhang, Wu
    Zhang, Chenlu
    Li, Yu
    Xu, Shaoxiang
    Zhang, Lifu
    Jin, Xiu
    Jiang, Zhaohui
    Rao, Yuan
    COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (05) : 5779 - 5791
  • [3] Multi-Level Knowledge Distillation with Positional Encoding Enhancement
    Xu, Lixiang
    Wang, Zhiwen
    Bai, Lu
    Ji, Shengwei
    Ai, Bing
    Wang, Xiaofeng
    Yu, Philip S.
    PATTERN RECOGNITION, 2025, 163
  • [4] Multiresolution Knowledge Distillation and Multi-level Fusion for Defect Detection
    Xie, Huosheng
    Xiao, Yan
    GREEN, PERVASIVE, AND CLOUD COMPUTING, GPC 2022, 2023, 13744 : 178 - 191
  • [5] Using multi-level DEA model and verifying its results with logit model
    Horvathova, Jarmila
    Mokrisova, Martina
    ECONOMIC AND SOCIAL DEVELOPMENT: 43RD INTERNATIONAL SCIENTIFIC CONFERENCE ON ECONOMIC AND SOCIAL DEVELOPMENT RETHINKING MANAGEMENT IN THE DIGITAL ERA: CHALLENGES FROM INDUSTRY 4.0 TO RETAIL MANAGEMENT, 2019, : 58 - 67
  • [6] Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions
    Liu, Yang
    Sun, Haoqin
    Chen, Geng
    Wang, Qingyue
    Zhao, Zhen
    Lu, Xugang
    Wang, Longbiao
    INTERSPEECH 2023, 2023, : 1893 - 1897
  • [7] Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text
    Wu, Qianhui
    Jiang, Huiqiang
    Yin, Haonan
    Karlsson, Borje F.
    Lin, Chin-Yew
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 7317 - 7332
  • [8] Multi-level nature of and multi-level approaches to leadership
    Yammarino, Francis J.
    Dansereau, Fred
    LEADERSHIP QUARTERLY, 2008, 19 (02): : 135 - 141