With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

Cited by: 203
Authors
Dwibedi, Debidatta [1]
Aytar, Yusuf [2]
Tompson, Jonathan [1]
Sermanet, Pierre [1]
Zisserman, Andrew [2]
Affiliations
[1] Google Research, Mountain View, CA 94043, USA
[2] DeepMind, London, England
Source
2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021) | 2021
Keywords
DOI
10.1109/ICCV48922.2021.00945
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Self-supervised learning algorithms based on instance discrimination train encoders to be invariant to pre-defined transformations of the same instance. While most methods treat different views of the same image as positives for a contrastive loss, we are interested in using positives from other instances in the dataset. Our method, Nearest-Neighbor Contrastive Learning of visual Representations (NNCLR), samples the nearest neighbors from the dataset in the latent space and treats them as positives. This provides more semantic variation than pre-defined transformations. We find that using the nearest neighbor as a positive in contrastive losses improves performance significantly on ImageNet classification with a ResNet-50 under the linear evaluation protocol, from 71.7% to 75.6%, outperforming previous state-of-the-art methods. On semi-supervised learning benchmarks we improve performance significantly when only 1% of ImageNet labels are available, from 53.8% to 56.5%. On transfer learning benchmarks our method outperforms state-of-the-art methods (including supervised learning with ImageNet) on 8 out of 12 downstream datasets. Furthermore, we demonstrate empirically that our method is less reliant on complex data augmentations: we see a relative reduction of only 2.1% in ImageNet Top-1 accuracy when training with random crops alone.
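As a rough illustration of the mechanism the abstract describes (replacing the usual augmented-view positive with a nearest neighbor drawn from a support set of previously computed embeddings), here is a minimal PyTorch-style sketch. It is not the authors' released implementation; the helper names (`nearest_neighbor`, `nnclr_loss`), the temperature value, and the tensor shape conventions are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F

def nearest_neighbor(z, support_set):
    # For each embedding in z, return its nearest neighbor from the support set,
    # measured by cosine similarity over L2-normalized embeddings.
    z = F.normalize(z, dim=1)
    support = F.normalize(support_set, dim=1)
    sim = z @ support.t()                      # (batch, support_size)
    return support[sim.argmax(dim=1)]          # (batch, dim)

def nnclr_loss(z1, z2, support_set, temperature=0.1):
    # InfoNCE-style loss in which the positive for each sample is the nearest
    # neighbor of its first-view embedding, contrasted against the second-view
    # embeddings of every sample in the batch.
    nn1 = nearest_neighbor(z1, support_set)    # positives come from other instances
    z2 = F.normalize(z2, dim=1)
    logits = nn1 @ z2.t() / temperature        # (batch, batch); diagonal entries are the positive pairs
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```

In the paper, the support set is maintained as a first-in-first-out queue of embeddings from earlier batches, and the loss is symmetrized over the two augmented views; those details are omitted from this sketch.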
Pages: 9568-9577
Page count: 10