Dual-Level Knowledge Distillation via Knowledge Alignment and Correlation

Cited by: 7
Authors
Ding, Fei [1 ]
Yang, Yin [1 ]
Hu, Hongxin [2 ]
Krovi, Venkat [3 ,4 ]
Luo, Feng [1 ]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
[2] Univ Buffalo, State Univ New York, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
[3] Clemson Univ, Dept Automot Engn, Clemson, SC 29634 USA
[4] Clemson Univ, Dept Mech Engn, Clemson, SC 29634 USA
Funding
U.S. National Science Foundation
Keywords
Correlation; Knowledge engineering; Task analysis; Standards; Network architecture; Prototypes; Training; Convolutional neural networks; dual-level knowledge; knowledge distillation (KD); representation learning; teacher-student model;
DOI
10.1109/TNNLS.2022.3190166
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) has become a widely used technique for model compression and knowledge transfer. We find that the standard KD method performs the knowledge alignment on an individual sample only indirectly via class prototypes and neglects the structural knowledge between different samples, namely, knowledge correlation. Although recent contrastive learning-based distillation methods can be decomposed into knowledge alignment and correlation, their correlation objectives undesirably push apart representations of samples from the same class, leading to inferior distillation results. To improve the distillation performance, in this work, we propose a novel knowledge correlation objective and introduce the dual-level knowledge distillation (DLKD), which explicitly combines knowledge alignment and correlation instead of using a single contrastive objective. We show that both knowledge alignment and correlation are necessary to improve the distillation performance. In particular, knowledge correlation can serve as an effective regularization to learn generalized representations. The proposed DLKD is task-agnostic and model-agnostic, and enables effective knowledge transfer from supervised or self-supervised pretrained teachers to students. Experiments show that DLKD outperforms other state-of-the-art methods across a wide range of experimental settings, including: 1) pretraining strategies; 2) network architectures; 3) datasets; and 4) tasks.
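The abstract describes two complementary objectives: per-sample knowledge alignment and batch-level knowledge correlation. As a minimal illustrative sketch only, and not the authors' released DLKD implementation, the PyTorch snippet below pairs a standard temperature-scaled KD alignment term with a correlation term that matches pairwise feature similarities between teacher and student within a batch; the function name dual_level_kd_loss, the cosine-similarity choice, and the temperature/alpha/beta weights are assumptions introduced for illustration.

import torch
import torch.nn.functional as F

def dual_level_kd_loss(student_logits, teacher_logits,
                       student_feats, teacher_feats,
                       temperature=4.0, alpha=1.0, beta=1.0):
    # Illustrative sketch only; hyperparameters are placeholders, not the paper's values.
    t = temperature

    # Knowledge alignment: match each student prediction to the teacher's
    # temperature-softened prediction for the same sample (standard KD term).
    align = F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits / t, dim=1),
        reduction="batchmean",
    ) * (t * t)

    # Knowledge correlation: match the pairwise cosine-similarity structure
    # among samples in the batch between teacher and student features, so
    # inter-sample relations are transferred, not only per-sample outputs.
    s = F.normalize(student_feats, dim=1)
    te = F.normalize(teacher_feats, dim=1)
    corr = F.mse_loss(s @ s.t(), te @ te.t())

    return alpha * align + beta * corr

In this sketch the correlation term never explicitly pushes apart same-class samples; it only asks the student's similarity matrix to mirror the teacher's, which is one way to read the abstract's claim that correlation acts as a regularizer rather than a contrastive repulsion.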
Pages: 2425-2435
Number of pages: 11
Related papers
50 items in total
• [31] Yang, Yuxiang; Tian, Xing; Ng, Wing W. Y.; Gao, Ying. Knowledge Distillation Hashing for Occluded Face Retrieval. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 9096-9107.
• [32] Yu, Yunlong; Li, Bin; Ji, Zhong; Han, Jungong; Zhang, Zhongfei. Knowledge Distillation Classifier Generation Network for Zero-Shot Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (06): 3183-3194.
• [33] Yang, Chuanguang; An, Zhulin; Cai, Linhang; Xu, Yongjun. Knowledge Distillation Using Hierarchical Self-Supervision Augmented Distribution. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02): 2094-2108.
• [34] Tripathi, Achyut Mani; Pandey, Om Jee. Divide and Distill: New Outlooks on Knowledge Distillation for Environmental Sound Classification. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 1100-1113.
• [35] Li, Xuewei; Li, Songyuan; Omar, Bourahla; Wu, Fei; Li, Xi. ResKD: Residual-Guided Knowledge Distillation. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30: 4735-4746.
• [36] Li, Jingru; Chen, Xiaofeng; Zheng, Peiyu; Wang, Qiang; Yu, Zhi. Deep Generative Knowledge Distillation by Likelihood Finetuning. IEEE ACCESS, 2023, 11: 46441-46453.
• [37] Kim, Sunok; Kim, Seungryong; Min, Dongbo; Frossard, Pascal; Sohn, Kwanghoon. Stereo Confidence Estimation via Locally Adaptive Fusion and Knowledge Distillation. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05): 6372-6385.
• [38] Kao, Wei-Cheng; Xie, Hong-Xia; Lin, Chih-Yang; Cheng, Wen-Huang. Specific Expert Learning: Enriching Ensemble Diversity via Knowledge Distillation. IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (04): 2494-2505.
• [39] Kang, S.; Seo, K. Weighted Knowledge Based Knowledge Distillation. Transactions of the Korean Institute of Electrical Engineers, 2022, 71 (02): 431-435.
• [40] Peng, Wei; Zhang, Han; Jiang, Dan; Xiao, Kejing; Li, Yuxuan. Dual-Level Contrastive Learning for Improving Conciseness of Summarization. IEEE ACCESS, 2024, 12: 65630-65639.