Knowledge engineering;
Training;
Feature extraction;
Uncertainty;
Correlation;
Collaboration;
Circuits and systems;
Knowledge distillation;
Teacher-student learning;
Deep learning;
DOI:
10.1109/TCSVT.2024.3377251
CLC Classification:
TM [Electrical Engineering];
TN [Electronic Technology, Communication Technology];
Subject Classification:
0808 ;
0809 ;
Abstract:
Existing research on knowledge distillation has primarily concentrated on enabling student networks to acquire the complete knowledge imparted by teacher networks. However, recent studies have shown that strong networks are not necessarily suitable as teachers, and that distillation performance correlates positively with teacher prediction uncertainty. Motivated by this finding, this paper analyzes in depth why the teacher network affects distillation performance, increases the student network's participation in the distillation process, and helps the teacher network distill knowledge suited to the student's learning. On this premise, a novel approach, Collaborative Knowledge Distillation (CKD), is introduced, founded on the concept of "tailoring the teaching to the individual". Compared with the baseline, the proposed method improves student accuracy by an average of 3.42% in CIFAR-100 experiments, and by an average of 1.71% over the classical Knowledge Distillation (KD) method. The ImageNet experiments revealed a significant improvement of 2.04% in the students' Top-1 accuracy.
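The abstract compares CKD against the classical Knowledge Distillation (KD) method. For background, a minimal sketch of that soft-target objective (Hinton et al., 2015) in plain Python; the function names and the temperature value T=4.0 are illustrative choices, not taken from this paper:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    z = [x / T for x in logits]
    m = max(z)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Classical KD soft-target loss: KL(teacher || student) computed on
    temperature-softened distributions, scaled by T^2 so gradients keep
    a comparable magnitude across temperatures."""
    p = softmax(teacher_logits, T)   # softened teacher distribution
    q = softmax(student_logits, T)   # softened student distribution
    return (T ** 2) * sum(pi * (math.log(pi) - math.log(qi))
                          for pi, qi in zip(p, q))
```

The loss is zero when student and teacher logits agree and grows as their softened predictions diverge; in practice it is combined with the ordinary cross-entropy on ground-truth labels.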
Affiliations:
Monash Univ, Fac Informat Technol, Dept Data Sci & Artificial Intelligence, Clayton, Vic, Australia
Jiangsu Univ, Sch Comp Sci & Commun Engn, 301 Xuefu Rd, Zhenjiang 212013, Jiangsu, Peoples R China
Du, Lan
Ou, Weihua
Papers: 0
Citations: 0
h-index: 0
Affiliations:
Guizhou Normal Univ, Sch Big Data & Comp Sci, Guiyang, Peoples R China
Jiangsu Univ, Sch Comp Sci & Commun Engn, 301 Xuefu Rd, Zhenjiang 212013, Jiangsu, Peoples R China