Unsupervised Learning of Dense Visual Representations

Cited by: 0
Authors
Pinheiro, Pedro O. [1 ]
Almahairi, Amjad
Benmalek, Ryan Y. [2 ]
Golemo, Florian [1 ,3 ]
Courville, Aaron [3 ,4 ]
Affiliations
[1] Element AI, Montreal, PQ, Canada
[2] Cornell Univ, Ithaca, NY USA
[3] Univ Montreal, Mila, Montreal, PQ, Canada
[4] CIFAR Fellow, Toronto, ON, Canada
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020 / Vol. 33
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Contrastive self-supervised learning has emerged as a promising approach to unsupervised visual representation learning. In general, these methods learn global (image-level) representations that are invariant to different views (i.e., compositions of data augmentations) of the same image. However, many visual understanding tasks require dense (pixel-level) representations. In this paper, we propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations. VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions. Specifically, this is achieved through pixel-level contrastive learning: matching features (that is, features that describe the same location of the scene in different views) should be close in an embedding space, while non-matching features should be far apart. VADeR provides a natural representation for dense prediction tasks and transfers well to downstream tasks. Our method outperforms ImageNet supervised pretraining (and strong unsupervised baselines) on multiple dense prediction tasks.
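The pixel-level contrastive objective described above can be illustrated with a minimal sketch: matched pixel features from two views serve as positives, and every other pixel in the second view serves as a negative, scored with an InfoNCE-style loss. This is a hypothetical NumPy illustration of the general idea, not the authors' implementation; the function name, temperature value, and input layout are assumptions.

```python
import numpy as np

def pixel_infonce_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style pixel-level contrastive loss (illustrative sketch).

    feats_a, feats_b: (N, D) arrays of features for N matched pixel
    locations taken from two views of the same image, row i of feats_a
    corresponding to row i of feats_b. All names and the temperature
    default are hypothetical, not from the paper.
    """
    # L2-normalize so dot products become cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    # (N, N) similarity matrix: diagonal entries are the matched pairs
    # (positives), off-diagonal entries are non-matching pixels (negatives).
    logits = (a @ b.T) / temperature
    # Row-wise log-softmax; subtract the row max for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Loss pulls each matched pair together and pushes mismatches apart.
    return -np.mean(np.diag(log_prob))
```

With identical features in both views the diagonal dominates and the loss is near zero; with unrelated features the loss approaches log N, reflecting that the matched pixel is no easier to identify than any other.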
Pages: 12