Learning continuation: Integrating past knowledge for contrastive distillation

Times Cited: 1
Authors
Zhang, Bowen [1 ]
Qin, Jiaohua [1 ]
Xiang, Xuyu [1 ]
Tan, Yun [1 ]
Affiliations
[1] Cent South Univ Forestry & Technol, Coll Comp Sci & Informat Technol, Changsha 410004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation; Historical knowledge; Logit knowledge; Contrastive learning;
DOI
10.1016/j.knosys.2024.112573
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Current knowledge distillation methods typically transfer knowledge only on data from the current batch, neglecting the knowledge the teacher model has accumulated over past batches. To address this, this paper integrates the historical knowledge of the teacher network so that the student network can acquire knowledge beyond the current batch. The proposed approach, named PKCD, consists of Cross-Batch Contrastive Distillation (CBCD) and Intra-Class Distillation (ICD). CBCD uses the teacher network's logits from previous batches to construct additional positive and negative samples across batches within a contrastive-learning framework, enabling the student network to learn the information shared among samples of the same category and to distinguish samples of different categories. At the same time, intra-class relationships are taken into account during CBCD: ICD prevents samples of the same category from being pushed too close together, which improves the distillation effect. By integrating the teacher network's historical knowledge, the method further improves the effectiveness of knowledge distillation. Feature-transfer experiments validate that the approach facilitates knowledge transfer and enhances the generalization ability of student models, offering a new direction for further research on logit-based knowledge distillation. Compared with existing logit-based approaches, PKCD achieves comparable or superior performance on the CIFAR-100 and ImageNet datasets.
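The abstract describes CBCD only at a high level. As a rough illustration, the following PyTorch-style sketch shows one way a cross-batch contrastive loss over a memory of past teacher logits could be implemented. The class name, queue size, temperature, and the FIFO logit buffer are assumptions made for illustration; they are not details taken from the paper.

# Illustrative sketch of a cross-batch contrastive distillation (CBCD) loss.
# Assumptions (not from the paper): a FIFO queue of teacher logits and labels,
# an InfoNCE-style objective over that queue, queue_size=4096, temperature=0.1.
import torch
import torch.nn.functional as F


class CrossBatchContrastiveLoss(torch.nn.Module):
    """Contrast student logits against teacher logits stored from past batches."""

    def __init__(self, num_classes, queue_size=4096, temperature=0.1):
        super().__init__()
        self.temperature = temperature
        # FIFO memory of teacher logits and their labels from previous batches.
        self.register_buffer("logit_queue", torch.zeros(queue_size, num_classes))
        self.register_buffer("label_queue", torch.full((queue_size,), -1, dtype=torch.long))
        self.ptr = 0

    @torch.no_grad()
    def _enqueue(self, teacher_logits, labels):
        # Overwrite the oldest slots with the current batch's teacher logits.
        n = teacher_logits.size(0)
        idx = (self.ptr + torch.arange(n, device=labels.device)) % self.logit_queue.size(0)
        self.logit_queue[idx] = teacher_logits.detach()
        self.label_queue[idx] = labels
        self.ptr = int((self.ptr + n) % self.logit_queue.size(0))

    def forward(self, student_logits, teacher_logits, labels):
        # labels: LongTensor of class indices. Enqueue first, so every sample
        # has at least one same-class (positive) entry in the memory.
        self._enqueue(teacher_logits, labels)
        s = F.normalize(student_logits, dim=1)
        mem = F.normalize(self.logit_queue, dim=1)
        sim = s @ mem.t() / self.temperature                      # (B, queue_size)
        # Mask empty slots with a large finite negative (avoids 0 * inf = NaN).
        sim = sim.masked_fill(self.label_queue[None, :] == -1, -1e4)
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        # Positives: memory entries (current and earlier batches) with the same label.
        pos_mask = (labels[:, None] == self.label_queue[None, :]).float()
        pos_count = pos_mask.sum(dim=1).clamp(min=1)
        return -((log_prob * pos_mask).sum(dim=1) / pos_count).mean()

In the full PKCD method this cross-batch term would presumably be combined with a standard logit-distillation loss and the intra-class (ICD) term; the sketch covers only the cross-batch contrastive part.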
Pages: 12