Self-supervised Learning of Contextualized Local Visual Embeddings

被引：0

作者：

Silva, Thalles ^{[1
]}

Pedrini, Helio ^{[1
]}

Rivera, Adin Ramirez ^{[2
]}

机构：

[1] Univ Estadual Campinas, Inst Comp, Campinas, SP, Brazil

[2] Univ Oslo, Dept Informat, Oslo, Norway

来源：

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW | 2023年

关键词：

D O I：

10.1109/ICCVW60793.2023.00025

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised convolutional-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from output feature maps of convolution neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized multhead self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE's pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in 4 dense prediction downstream tasks, including object detection, instance segmentation, keypoint detection, and dense pose estimation. Code: https://github.com/sthalles/CLoVE.

引用

页码：177 / 186

页数：10

共 41 条

[1]

[Anonymous], 2008, P 25 INT C MACH LEAR, DOI DOI 10.1145/1390156.1390294

[2]

Asano Yuki Markus, 2019, INT C LEARN REPR ICL

[3]

Bardes Adrien., 2022, ADV NEURAL INF PROCE

[4]

Bardes Adrien, 2021, INT C LEARN REPR ICL

[5]

Bromley J., 1993, International Journal of Pattern Recognition and Artificial Intelligence, V7, P669, DOI 10.1142/S0218001493000339

[6]

Caron M, 2020, ADV NEUR IN, V33

[7] Emerging Properties in Self-Supervised Vision Transformers [J].

Caron, Mathilde ;

Touvron, Hugo ;

Misra, Ishan ;

Jegou, Herve ;

Mairal, Julien ;

Bojanowski, Piotr ;

Joulin, Armand .

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :9630-9640

[8] Deep Clustering for Unsupervised Learning of Visual Features [J].

Caron, Mathilde ;

Bojanowski, Piotr ;

Joulin, Armand ;

Douze, Matthijs .

COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 :139-156

[9] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J].

Chen, Liang-Chieh ;

Zhu, Yukun ;

Papandreou, George ;

Schroff, Florian ;

Adam, Hartwig .

COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 :833-851

[10]

Chen T., 2020, Advances in Neural Information Processing Systems, V33, P22243

← 1 2 3 4 5 →