Dual-Level Knowledge Distillation via Knowledge Alignment and Correlation

Cited by: 7
Authors
Ding, Fei [1 ]
Yang, Yin [1 ]
Hu, Hongxin [2 ]
Krovi, Venkat [3 ,4 ]
Luo, Feng [1 ]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
[2] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
[3] Clemson Univ, Dept Automot Engn, Clemson, SC 29634 USA
[4] Clemson Univ, Dept Mech Engn, Clemson, SC 29634 USA
Funding
U.S. National Science Foundation;
Keywords
Correlation; Knowledge engineering; Task analysis; Standards; Network architecture; Prototypes; Training; Convolutional neural networks; dual-level knowledge; knowledge distillation (KD); representation learning; teacher-student model;
DOI
10.1109/TNNLS.2022.3190166
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation (KD) has become a widely used technique for model compression and knowledge transfer. We find that the standard KD method performs knowledge alignment on each individual sample only indirectly, via class prototypes, and neglects the structural knowledge between different samples, namely, knowledge correlation. Although recent contrastive learning-based distillation methods can be decomposed into knowledge alignment and correlation, their correlation objectives undesirably push apart representations of samples from the same class, leading to inferior distillation results. To improve distillation performance, in this work, we propose a novel knowledge correlation objective and introduce dual-level knowledge distillation (DLKD), which explicitly combines knowledge alignment and correlation rather than relying on a single contrastive objective. We show that both knowledge alignment and correlation are necessary to improve distillation performance. In particular, knowledge correlation can serve as an effective regularization to learn generalized representations. The proposed DLKD is task-agnostic and model-agnostic, and enables effective knowledge transfer from supervised or self-supervised pretrained teachers to students. Experiments show that DLKD outperforms other state-of-the-art methods across a wide range of experimental settings, including: 1) pretraining strategies; 2) network architectures; 3) datasets; and 4) tasks.
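The abstract distinguishes two objectives: per-sample knowledge alignment between student and teacher representations, and inter-sample knowledge correlation that preserves the teacher's relational structure over a batch. The PyTorch sketch below illustrates one way such a dual-level loss could be composed; the cosine alignment term, the KL-matched pairwise-similarity correlation term, and the `temperature`, `alpha`, and `beta` hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a dual-level distillation loss (assumed formulation):
# an alignment term matching each student embedding to its teacher counterpart,
# plus a correlation term matching the teacher's inter-sample similarity structure.
import torch
import torch.nn.functional as F


def dual_level_kd_loss(student_feat, teacher_feat, temperature=0.1, alpha=1.0, beta=1.0):
    """student_feat, teacher_feat: (batch, dim) embeddings from the two networks."""
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat, dim=1)

    # Knowledge alignment: pull each student embedding toward the teacher's
    # embedding of the same sample (cosine distance here, an assumed choice).
    align_loss = (1.0 - (s * t).sum(dim=1)).mean()

    # Knowledge correlation: match the pairwise similarity distributions over the
    # batch, so the student preserves the teacher's relational structure.
    s_sim = F.log_softmax(s @ s.t() / temperature, dim=1)
    t_sim = F.softmax(t @ t.t() / temperature, dim=1)
    corr_loss = F.kl_div(s_sim, t_sim, reduction="batchmean")

    return alpha * align_loss + beta * corr_loss


if __name__ == "__main__":
    student = torch.randn(8, 128)   # e.g., projected student features
    teacher = torch.randn(8, 128)   # e.g., projected teacher features
    print(dual_level_kd_loss(student, teacher).item())
```

In this reading, the alignment term plays the role of the per-sample transfer that standard KD performs only indirectly through class prototypes, while the correlation term acts as the structural regularizer the abstract argues is missing.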
Pages: 2425-2435
Page count: 11
Related Papers
50 records in total
  • [41] Student Network Learning via Evolutionary Knowledge Distillation
    Zhang, Kangkai
    Zhang, Chunhui
    Li, Shikun
    Zeng, Dan
    Ge, Shiming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2251 - 2263
  • [42] Dual cross knowledge distillation for image super-resolution
    Fang, Hangxiang
    Long, Yongwen
    Hu, Xinyi
    Ou, Yangtao
    Huang, Yuanjia
    Hu, Haoji
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95
  • [43] Recurrent Network Knowledge Distillation for Image Rain Removal
    Su, Zhipeng
    Zhang, Yixiong
    Shi, Jianghong
    Zhang, Xiao-Ping
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (04) : 1642 - 1653
  • [44] MULTICHANNEL ASR WITH KNOWLEDGE DISTILLATION AND GENERALIZED CROSS CORRELATION FEATURE
    Li, Wenjie
    Zhang, Yu
    Zhang, Pengyuan
    Ge, Fengpei
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 463 - 469
  • [45] Knowledge Distillation for Face Photo-Sketch Synthesis
    Zhu, Mingrui
    Li, Jie
    Wang, Nannan
    Gao, Xinbo
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (02) : 893 - 906
  • [46] Lightweight Model Pre-Training via Language Guided Knowledge Distillation
    Li, Mingsheng
    Zhang, Lin
    Zhu, Mingzhen
    Huang, Zilong
    Yu, Gang
    Fan, Jiayuan
    Chen, Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10720 - 10730
  • [47] SiamOHOT: A Lightweight Dual Siamese Network for Onboard Hyperspectral Object Tracking via Joint Spatial-Spectral Knowledge Distillation
    Sun, Chen
    Wang, Xinyu
    Liu, Zhenqi
    Wan, Yuting
    Zhang, Liangpei
    Zhong, Yanfei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [48] Dual Knowledge Distillation on Multiview Pseudo Labels for Unsupervised Person Re-Identification
    Zhu, Wenjie
    Peng, Bo
    Yan, Wei Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7359 - 7371
  • [49] Heterogeneous Knowledge Distillation for Anomaly Detection
    Wu, Longjiang
    Zhou, Jiali
    IEEE ACCESS, 2024, 12 : 161490 - 161499
  • [50] Spot-Adaptive Knowledge Distillation
    Song, Jie
    Chen, Ying
    Ye, Jingwen
    Song, Mingli
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3359 - 3370