ClusterFit: Improving Generalization of Visual Representations

Cited by: 60
Authors
Yan, Xueting [1]
Misra, Ishan [1]
Gupta, Abhinav [1]
Ghadiyaram, Deepti [1]
Mahajan, Dhruv [1]
Affiliation
[1] Facebook AI, Menlo Park, CA 94025 USA
Source
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
DOI
10.1109/CVPR42600.2020.00654
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pre-training convolutional neural networks with weakly-supervised and self-supervised strategies is becoming increasingly popular for many computer vision tasks. However, because these objectives lack strong discriminative signals, the learned representations may overfit to the pre-training objective (e.g., hashtag prediction) and generalize poorly to downstream tasks. In this work, we present ClusterFit (CF), a simple strategy for improving the robustness of the visual representations learned during pre-training. Given a dataset, we (a) extract its features with a pre-trained network and cluster them using k-means, and (b) re-train a new network from scratch on this dataset, using the cluster assignments as pseudo-labels. We show empirically that clustering removes pre-training task-specific information from the extracted features, thereby reducing overfitting to that task. Our approach extends to different pre-training frameworks (weakly- and self-supervised), modalities (images and videos), and pre-training tasks (object and action classification). Through extensive transfer learning experiments on 11 target datasets of varied vocabularies and granularities, we show that CF significantly improves representation quality compared to state-of-the-art large-scale (millions/billions of samples) weakly-supervised image and video models and self-supervised image models.
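The two-step procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and every name in it is hypothetical: the pre-trained network is replaced by a fixed random projection followed by ReLU (`pretrained_features`), plain Lloyd's k-means stands in for large-scale clustering (`kmeans`), and the new network trained from scratch is replaced by a linear softmax classifier (`train_softmax`) on a toy random dataset.

```python
import numpy as np


def pretrained_features(x, proj):
    """Hypothetical stand-in for the pre-trained network: fixed projection + ReLU."""
    return np.maximum(x @ proj, 0.0)


def kmeans(features, k, n_iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct randomly chosen points.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign


def train_softmax(x, labels, k, lr=0.5, epochs=200, seed=1):
    """Train a freshly initialized linear softmax classifier on (x, labels)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(x.shape[1], k))  # new weights: trained "from scratch"
    b = np.zeros(k)
    onehot = np.eye(k)[labels]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(x)  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def cluster_fit(x, proj, k):
    # (a) Extract features with the stand-in "pre-trained" model and cluster them.
    _, pseudo_labels = kmeans(pretrained_features(x, proj), k)
    # (b) Train a new model from scratch on the same inputs, using clusters as labels.
    w, b = train_softmax(x, pseudo_labels, k)
    return pseudo_labels, w, b


rng = np.random.default_rng(42)
x = rng.normal(size=(300, 8))    # toy "dataset": 300 inputs with 8 dimensions
proj = rng.normal(size=(8, 16))  # weights of the stand-in pre-trained model
pseudo, w, b = cluster_fit(x, proj, k=5)
acc = ((x @ w + b).argmax(axis=1) == pseudo).mean()
print(f"agreement with pseudo-labels: {acc:.2f}")
```

In this sketch the new model sees only the raw inputs and the cluster IDs, never the original pre-training labels, which is the mechanism the abstract credits with stripping task-specific information; the number of clusters `k` is a free hyperparameter here.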
Pages: 6508-6517
Number of pages: 10
References: 64