Bi-Level Orthogonal Multi-Teacher Distillation

Times Cited: 0
Authors
Gong, Shuyue [1]
Wen, Weigang [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Mech Elect & Control Engn, Beijing 100044, Peoples R China
Keywords
knowledge distillation; deep learning; convolutional neural networks; teacher-student model; optimization; multi-model learning; soft labeling; supervised learning;
DOI
10.3390/electronics13163345
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Multi-teacher knowledge distillation is a powerful technique that leverages diverse information sources from multiple pre-trained teachers to enhance student model performance. However, existing methods often overlook the challenge of effectively transferring knowledge to weaker student models. To address this limitation, we propose BOMD (Bi-level Optimization for Multi-teacher Distillation), a novel approach that combines bi-level optimization with multiple orthogonal projections. Our method employs orthogonal projections to align teacher feature representations with the student's feature space while preserving structural properties. This alignment is further reinforced through a dedicated feature alignment loss. Additionally, we utilize bi-level optimization to learn optimal weighting factors for combining knowledge from heterogeneous teachers, treating the weights as upper-level variables and the student's parameters as lower-level variables. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and flexibility of BOMD. Our method achieves state-of-the-art performance on the CIFAR-100 benchmark for multi-teacher knowledge distillation across diverse scenarios, consistently outperforming existing approaches. BOMD shows significant improvements for both homogeneous and heterogeneous teacher ensembles, even when distilling to compact student models.
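The abstract outlines the main ingredients of BOMD: orthogonal projections that map each teacher's features into the student's feature space, a feature-alignment loss, weighted soft-label distillation, and bi-level optimization of the per-teacher weights. The PyTorch sketch below is only an illustration of how those pieces could fit together, not the authors' implementation; the orthogonality penalty, the loss forms, the first-order alternating update, and all names and hyperparameters (OrthogonalProjection, distill_loss, bilevel_step, T, beta, gamma) are assumptions rather than details taken from the paper.

```python
# Illustrative sketch (not the authors' code) of the ingredients named in the
# abstract: orthogonal projections aligning teacher features to the student's
# feature space, a feature-alignment loss, weighted soft-label distillation,
# and an alternating first-order approximation of the bi-level weight update.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OrthogonalProjection(nn.Module):
    """Linear map from a teacher's feature dimension to the student's.
    A penalty ||W^T W - I||_F^2 (assumed form) keeps the columns near-orthonormal,
    which is only exactly attainable when teacher_dim >= student_dim."""

    def __init__(self, teacher_dim: int, student_dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.empty(teacher_dim, student_dim))
        nn.init.orthogonal_(self.W)  # (semi-)orthogonal initialization

    def forward(self, t_feat: torch.Tensor) -> torch.Tensor:
        return t_feat @ self.W  # (B, teacher_dim) -> (B, student_dim)

    def orthogonality_penalty(self) -> torch.Tensor:
        gram = self.W.t() @ self.W
        eye = torch.eye(gram.shape[0], device=gram.device)
        return ((gram - eye) ** 2).sum()


def distill_loss(student, teachers, projections, weight_logits, x, y,
                 T=4.0, beta=1.0, gamma=1e-3):
    """Weighted multi-teacher distillation objective.
    student(x) and each teacher(x) are assumed to return (logits, features)."""
    s_logits, s_feat = student(x)
    w = F.softmax(weight_logits, dim=0)  # positive per-teacher weights summing to 1

    kd, feat, ortho = 0.0, 0.0, 0.0
    for i, (teacher, proj) in enumerate(zip(teachers, projections)):
        with torch.no_grad():
            t_logits, t_feat = teacher(x)
        # soft-label term: temperature-scaled KL divergence to this teacher
        kd = kd + w[i] * F.kl_div(
            F.log_softmax(s_logits / T, dim=1),
            F.softmax(t_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # feature-alignment term on the projected teacher features
        feat = feat + w[i] * F.mse_loss(proj(t_feat), s_feat)
        ortho = ortho + proj.orthogonality_penalty()

    return F.cross_entropy(s_logits, y) + kd + beta * feat + gamma * ortho


def bilevel_step(student, teachers, projections, weight_logits,
                 train_batch, val_batch, student_opt, weight_opt):
    """One alternating step of a first-order bi-level approximation:
    lower level  - student (and projection) parameters on the training batch,
    upper level  - per-teacher weights on a held-out batch, student held fixed."""
    x, y = train_batch
    loss = distill_loss(student, teachers, projections, weight_logits, x, y)
    student_opt.zero_grad()
    loss.backward()
    student_opt.step()

    xv, yv = val_batch
    val_loss = distill_loss(student, teachers, projections, weight_logits, xv, yv)
    weight_opt.zero_grad()
    val_loss.backward()
    weight_opt.step()
```

In this sketch, student_opt is assumed to hold the student's and the projections' parameters, while weight_opt holds only weight_logits (for example an nn.Parameter of length equal to the number of teachers). The paper's exact bi-level solver may differ from this DARTS-style first-order alternation.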
Pages: 15