Unsupervised Learning of Dense Visual Representations

Cited by: 0
Authors
Pinheiro, Pedro O. [1 ]
Almahairi, Amjad
Benmalek, Ryan Y. [2 ]
Golemo, Florian [1 ,3 ]
Courville, Aaron [3 ,4 ]
Affiliations
[1] Element AI, Montreal, PQ, Canada
[2] Cornell Univ, Ithaca, NY USA
[3] Univ Montreal, Mila, Montreal, PQ, Canada
[4] CIFAR Fellow, Toronto, ON, Canada
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020 / Vol. 33
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Contrastive self-supervised learning has emerged as a promising approach to unsupervised visual representation learning. In general, these methods learn global (image-level) representations that are invariant to different views (i.e., compositions of data augmentations) of the same image. However, many visual understanding tasks require dense (pixel-level) representations. In this paper, we propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations. VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions. Specifically, this is achieved through pixel-level contrastive learning: matching features (that is, features that describe the same location of the scene in different views) should be close in an embedding space, while non-matching features should be far apart. VADeR provides a natural representation for dense prediction tasks and transfers well to downstream tasks. Our method outperforms ImageNet supervised pretraining (and strong unsupervised baselines) on multiple dense prediction tasks.
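The pixel-level contrastive objective described above can be illustrated with a minimal sketch: matched pixel features from two views serve as positives, and every other pixel in the second view serves as a negative, scored with an InfoNCE-style loss. This is a hypothetical NumPy illustration of the general idea, not the authors' implementation; the function name, temperature value, and input layout are assumptions.

```python
import numpy as np

def pixel_infonce_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style pixel-level contrastive loss (illustrative sketch).

    feats_a, feats_b: (N, D) arrays of features for N matched pixel
    locations taken from two views of the same image, row i of feats_a
    corresponding to row i of feats_b. All names and the temperature
    default are hypothetical, not from the paper.
    """
    # L2-normalize so dot products become cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    # (N, N) similarity matrix: diagonal entries are the matched pairs
    # (positives), off-diagonal entries are non-matching pixels (negatives).
    logits = (a @ b.T) / temperature
    # Row-wise log-softmax; subtract the row max for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Loss pulls each matched pair together and pushes mismatches apart.
    return -np.mean(np.diag(log_prob))
```

With identical features in both views the diagonal dominates and the loss is near zero; with unrelated features the loss approaches log N, reflecting that the matched pixel is no easier to identify than any other.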
Pages: 12