Self-supervised Learning of Contextualized Local Visual Embeddings

被引:0
作者
Silva, Thalles [1 ]
Pedrini, Helio [1 ]
Rivera, Adin Ramirez [2 ]
机构
[1] Univ Estadual Campinas, Inst Comp, Campinas, SP, Brazil
[2] Univ Oslo, Dept Informat, Oslo, Norway
来源
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW | 2023年
关键词
D O I
10.1109/ICCVW60793.2023.00025
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised convolutional-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from output feature maps of convolution neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized multhead self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE's pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in 4 dense prediction downstream tasks, including object detection, instance segmentation, keypoint detection, and dense pose estimation. Code: https://github.com/sthalles/CLoVE.
引用
收藏
页码:177 / 186
页数:10
相关论文
共 41 条
[1]  
[Anonymous], 2008, P 25 INT C MACH LEAR, DOI DOI 10.1145/1390156.1390294
[2]  
Asano Yuki Markus, 2019, INT C LEARN REPR ICL
[3]  
Bardes Adrien., 2022, ADV NEURAL INF PROCE
[4]  
Bardes Adrien, 2021, INT C LEARN REPR ICL
[5]  
Bromley J., 1993, International Journal of Pattern Recognition and Artificial Intelligence, V7, P669, DOI 10.1142/S0218001493000339
[6]  
Caron M, 2020, ADV NEUR IN, V33
[7]   Emerging Properties in Self-Supervised Vision Transformers [J].
Caron, Mathilde ;
Touvron, Hugo ;
Misra, Ishan ;
Jegou, Herve ;
Mairal, Julien ;
Bojanowski, Piotr ;
Joulin, Armand .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :9630-9640
[8]   Deep Clustering for Unsupervised Learning of Visual Features [J].
Caron, Mathilde ;
Bojanowski, Piotr ;
Joulin, Armand ;
Douze, Matthijs .
COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 :139-156
[9]   Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J].
Chen, Liang-Chieh ;
Zhu, Yukun ;
Papandreou, George ;
Schroff, Florian ;
Adam, Hartwig .
COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 :833-851
[10]  
Chen T., 2020, Advances in Neural Information Processing Systems, V33, P22243