HCSC: Hierarchical Contrastive Selective Coding

Cited by: 49
Authors
Guo, Yuanfan [1 ,5 ]
Xu, Minghao [2 ,3 ]
Li, Jiawen [4 ]
Ni, Bingbing [1 ]
Zhu, Xuanyu [4 ]
Sun, Zhenbang [4 ]
Xu, Yi [1 ,5 ]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] Mila Quebec AI Inst, Montreal, PQ, Canada
[3] Univ Montreal, Montreal, PQ, Canada
[4] ByteDance, Beijing, Peoples R China
[5] Shanghai Jiao Tong Univ, Chongqing Res Inst, Shanghai, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022
Funding
National Natural Science Foundation of China; US National Science Foundation;
DOI
10.1109/CVPR52688.2022.00948
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hierarchical semantic structures naturally exist in an image dataset, in which several semantically relevant image clusters can be further integrated into a larger cluster with coarser-grained semantics. Capturing such structures with image representations can greatly benefit semantic understanding in various downstream tasks. Existing contrastive representation learning methods lack this important model capability. In addition, the negative pairs used in these methods are not guaranteed to be semantically distinct, which could further hamper the structural correctness of learned image representations. To tackle these limitations, we propose a novel contrastive learning framework called Hierarchical Contrastive Selective Coding (HCSC). In this framework, a set of hierarchical prototypes is constructed and dynamically updated to represent the hierarchical semantic structures underlying the data in the latent space. To make image representations better fit such semantic structures, we employ and further improve conventional instance-wise and prototypical contrastive learning via an elaborate pair selection scheme. This scheme seeks to select more diverse positive pairs with similar semantics and more precise negative pairs with truly distinct semantics. On extensive downstream tasks, we verify the state-of-the-art performance of HCSC and also the effectiveness of major model components. We are continually building a comprehensive model zoo (see supplementary material). Our source code and model weights are available at https://github.com/gyfastas/HCSC.
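The negative-pair selection idea described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`assign`, `select_negatives`, `hierarchical_select`) and the toy data are assumptions, and HCSC performs the selection probabilistically, whereas the filtering here is deterministic for simplicity.

```python
import numpy as np

def assign(z, prototypes):
    """Assign each embedding to its most similar prototype (dot-product similarity)."""
    return (z @ prototypes.T).argmax(axis=1)

def select_negatives(query_idx, z, prototypes, pool_idx):
    """Drop candidate negatives that share the query's prototype, so the
    remaining negatives are more likely to be truly semantically distinct."""
    labels = assign(z, prototypes)
    return [i for i in pool_idx if labels[i] != labels[query_idx]]

def hierarchical_select(query_idx, z, prototype_levels, pool_idx):
    """Apply the selection at every level of the prototype hierarchy
    (fine to coarse). Deterministic simplification of HCSC's
    probabilistic selection."""
    kept = list(pool_idx)
    for protos in prototype_levels:
        kept = select_negatives(query_idx, z, protos, kept)
    return kept

# Toy example: 4 unit embeddings, 4 fine prototypes, 2 coarse prototypes.
z = np.eye(4)
fine = np.eye(4)                      # every sample is its own fine cluster
coarse = np.array([[.7, .7, 0., 0.],  # samples 0 and 1 share one coarse semantic
                   [0., 0., .7, .7]]) # samples 2 and 3 share the other
print(hierarchical_select(0, z, [fine, coarse], [1, 2, 3]))  # prints [2, 3]
```

At the fine level all three candidates survive, but the coarse level filters out sample 1 because it falls under the same coarse prototype as the query, which illustrates how coarser semantics prune negatives that are not truly distinct.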
Pages: 9696-9705
Page count: 10
References
48 in total
[1]  
[Anonymous], 2010, Proc. COMPSTAT 2010
[2]  
[Anonymous], 2020, arXiv:2006.07733
[3]  
Asano Yuki Markus, 2019, Proc. ICLR
[4]  
Cao Yue, 2020, arXiv:2006.14618
[5]   Unsupervised Pre-Training of Image Features on Non-Curated Data [J].
Caron, Mathilde ;
Bojanowski, Piotr ;
Mairal, Julien ;
Joulin, Armand .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :2959-2968
[6]   Deep Clustering for Unsupervised Learning of Visual Features [J].
Caron, Mathilde ;
Bojanowski, Piotr ;
Joulin, Armand ;
Douze, Matthijs .
COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 :139-156
[7]  
Caron Mathilde, 2020, Adv. Neural Inform. Process. Syst.
[8]  
Chen Ting, 2020, Int. Conf. Mach. Learn.
[9]  
Chen Xinlei, 2021, IEEE Int. Conf. Comput. Vis.
[10]  
Chen Xinlei, 2020, IMPROVED BASELINES M