Dual-Level Knowledge Distillation via Knowledge Alignment and Correlation

Cited by: 7
Authors
Ding, Fei [1 ]
Yang, Yin [1 ]
Hu, Hongxin [2 ]
Krovi, Venkat [3 ,4 ]
Luo, Feng [1 ]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
[2] Univ Buffalo, State Univ New York, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
[3] Clemson Univ, Dept Automot Engn, Clemson, SC 29634 USA
[4] Clemson Univ, Dept Mech Engn, Clemson, SC 29634 USA
Funding
US National Science Foundation;
Keywords
Correlation; Knowledge engineering; Task analysis; Standards; Network architecture; Prototypes; Training; Convolutional neural networks; dual-level knowledge; knowledge distillation (KD); representation learning; teacher-student model;
DOI
10.1109/TNNLS.2022.3190166
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) has become a widely used technique for model compression and knowledge transfer. We find that the standard KD method performs knowledge alignment on individual samples only indirectly, via class prototypes, and neglects the structural knowledge between different samples, namely, knowledge correlation. Although recent contrastive learning-based distillation methods can be decomposed into knowledge alignment and correlation, their correlation objectives undesirably push apart representations of samples from the same class, leading to inferior distillation results. To improve distillation performance, we propose a novel knowledge correlation objective and introduce dual-level knowledge distillation (DLKD), which explicitly combines knowledge alignment and correlation rather than relying on a single contrastive objective. We show that both knowledge alignment and correlation are necessary to improve distillation performance. In particular, knowledge correlation can serve as an effective regularizer for learning generalized representations. The proposed DLKD is task-agnostic and model-agnostic and enables effective knowledge transfer from supervised or self-supervised pretrained teachers to students. Experiments show that DLKD outperforms other state-of-the-art methods across a wide range of experimental settings, including: 1) pretraining strategies; 2) network architectures; 3) datasets; and 4) tasks.
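To make the abstract's two-level idea concrete, below is a minimal PyTorch sketch of a dual-level distillation loss: a per-sample knowledge-alignment term that pulls each student representation toward its teacher counterpart, plus a knowledge-correlation term that matches the batch-level pairwise similarity structure. This is not the paper's implementation; the function name dlkd_loss, the weights alpha and beta, and the temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dlkd_loss(student_feat, teacher_feat, alpha=1.0, beta=1.0, temperature=0.1):
    """Sketch of a dual-level distillation loss.

    student_feat, teacher_feat: (batch, dim) representations; the teacher is
    treated as fixed (detached), as is standard in teacher-student distillation.
    """
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat.detach(), dim=1)

    # Knowledge alignment: match each student sample to its own teacher
    # embedding (1 - cosine similarity, averaged over the batch).
    align = (1.0 - (s * t).sum(dim=1)).mean()

    # Knowledge correlation: match the distributions of pairwise similarities
    # across the batch so that relations *between* samples are transferred.
    sim_s = F.log_softmax(s @ s.t() / temperature, dim=1)
    sim_t = F.softmax(t @ t.t() / temperature, dim=1)
    corr = F.kl_div(sim_s, sim_t, reduction="batchmean")

    return alpha * align + beta * corr
```

Matching similarity distributions with a KL divergence, rather than using a contrastive objective, avoids explicitly repelling representations of same-class samples, which is the failure mode of contrastive correlation terms that the abstract highlights.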
Pages: 2425-2435
Page count: 11