Pay Attention to Your Positive Pairs: Positive Pair Aware Contrastive Knowledge Distillation

Cited by: 5
Authors
Yu, Zhipeng [1 ]
Xu, Qianqian [2 ]
Jiang, Yangbangyan [3 ]
Qin, Haoyu [4 ]
Huang, Qingming [2 ,5 ,6 ,7 ]
Affiliations
[1] UCAS, SEECE, Beijing, Peoples R China
[2] Chinese Acad Sci, ICT, IIP, Beijing, Peoples R China
[3] Chinese Acad Sci, IIE, SKLOIS, Beijing, Peoples R China
[4] SenseTime Grp Ltd, Hong Kong, Peoples R China
[5] UCAS, SCST, Beijing, Peoples R China
[6] Chinese Acad Sci, BDKM, Beijing, Peoples R China
[7] Peng Cheng Lab, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Knowledge Distillation; Neural Networks; Contrastive Learning;
DOI
10.1145/3503161.3548256
Chinese Library Classification
TP39 [Computer Applications];
Subject Classification Codes
081203; 0835;
Abstract
Deep neural networks have achieved impressive success in various multimedia applications over the past decades. To bring the capability of large, already-trained models to real-world resource-constrained devices, knowledge distillation, which transfers representational knowledge from a large teacher network to a small student network, has attracted increasing attention. Recently, contrastive distillation methods have achieved superior performance in this area, owing to the powerful representations learned through contrastive/self-supervised learning. However, these methods typically transfer knowledge through individual samples or inter-class relationships, while ignoring the correlations among intra-class samples, which convey abundant information. In this paper, we propose a Positive pair Aware Contrastive Knowledge Distillation (PACKD) framework that extends contrastive distillation with more positive pairs to capture richer knowledge from the teacher. Specifically, it pulls together the features of same-class pairs learned by the student and teacher while simultaneously pushing apart those of pairs from different classes. With a positive-pair similarity weighting strategy based on optimal transport, the proposed contrastive objective improves feature discriminability between positive samples with large visual discrepancies. Experiments on different benchmarks demonstrate the effectiveness of the proposed PACKD.
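
The abstract describes a contrastive objective that pulls student features toward teacher features of the same class, pushes them away from teacher features of other classes, and weights positive pairs via optimal transport. The following is a minimal PyTorch sketch of such an objective, reconstructed from the abstract alone; the function names (packd_style_loss, sinkhorn_weights), the temperature, the entropic regularization, and the cosine-distance cost are illustrative assumptions, not the authors' released implementation.

# A minimal sketch of a positive-pair-aware contrastive distillation loss.
# All hyperparameters and design details below are assumptions for illustration.
import torch
import torch.nn.functional as F

def sinkhorn_weights(cost, n_iters=20, eps=0.05):
    """Approximate an optimal-transport plan for a cost matrix via Sinkhorn
    iterations with uniform marginals; used here to weight positive pairs."""
    K = torch.exp(-cost / eps)                          # Gibbs kernel
    n, m = cost.shape
    r = torch.full((n,), 1.0 / n, device=cost.device)   # row marginal
    c = torch.full((m,), 1.0 / m, device=cost.device)   # column marginal
    u, v = torch.ones_like(r), torch.ones_like(c)
    for _ in range(n_iters):
        u = r / (K @ v + 1e-8)
        v = c / (K.t() @ u + 1e-8)
    return torch.diag(u) @ K @ torch.diag(v)             # transport plan

def packd_style_loss(f_s, f_t, labels, temperature=0.1):
    """Pull student features toward same-class teacher features and push them
    away from other-class teacher features, weighting each positive pair by
    the OT plan computed over pairwise cosine distances."""
    f_s = F.normalize(f_s, dim=1)                         # (B, D) student features
    f_t = F.normalize(f_t, dim=1)                         # (B, D) teacher features
    logits = f_s @ f_t.t() / temperature                  # (B, B) similarities
    pos_mask = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()

    # Positive pairs with larger visual discrepancy (higher cost) receive
    # mass according to the transport plan, rather than a uniform weight.
    with torch.no_grad():
        cost = 1.0 - f_s @ f_t.t()                        # cosine distance
        weights = sinkhorn_weights(cost) * pos_mask
        weights = weights / (weights.sum(dim=1, keepdim=True) + 1e-8)

    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(weights * log_prob).sum(dim=1).mean()

if __name__ == "__main__":
    B, D = 8, 128
    loss = packd_style_loss(torch.randn(B, D), torch.randn(B, D),
                            torch.randint(0, 4, (B,)))
    print(loss.item())

Each sample is always a positive of itself, so the per-row weight normalization above never divides by zero; the negatives enter only through the softmax denominator, which is what pushes apart cross-class student-teacher pairs.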
Pages: 5862-5870
Number of pages: 9