AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data

Cited by: 140
Authors
Zhang, Liheng [1]
Qi, Guo-Jun [1, 2]
Wang, Liqiang [3]
Luo, Jiebo [4]
Affiliations
[1] Lab MAchine Percept & LEarning MAPLE, Orlando, FL 32816 USA
[2] Huawei Cloud, Shenzhen, Peoples R China
[3] Univ Cent Florida, Orlando, FL 32816 USA
[4] Univ Rochester, Rochester, NY 14627 USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00265
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The success of deep neural networks often relies on a large number of labeled examples, which can be difficult to obtain in many real scenarios. To address this challenge, unsupervised methods are strongly preferred for training neural networks without using any labeled data. In this paper, we present a novel paradigm of unsupervised representation learning by Auto-Encoding Transformation (AET), in contrast to the conventional Auto-Encoding Data (AED) approach. Given a randomly sampled transformation, AET seeks to predict it as accurately as possible at the output end, using only the encoded features. The idea is the following: as long as the unsupervised features successfully encode the essential information about the visual structures of the original and transformed images, the transformation can be well predicted. We show that this AET paradigm allows us to instantiate a large variety of transformations, from parameterized to non-parameterized and GAN-induced ones. Our experiments show that AET greatly improves over existing unsupervised approaches, setting new state-of-the-art performance that comes much closer to the upper bounds set by their fully supervised counterparts on the CIFAR-10, ImageNet and Places datasets.
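The prediction step described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the cyclic-shift transformation, the toy linear encoder and decoder, the shared encoder weights, the mean-squared-error loss, and every function name below are assumptions chosen only to keep the example small and runnable with the Python standard library.

```python
import random


def sample_transformation(rng):
    """Sample a hypothetical parameterized transformation: a cyclic shift (dx, dy)."""
    return (rng.randint(-2, 2), rng.randint(-2, 2))


def apply_transformation(image, t):
    """Cyclically shift a list-of-lists image by t = (dx, dy)."""
    dx, dy = t
    h, w = len(image), len(image[0])
    return [[image[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def encode(image, weights):
    """Toy linear encoder (assumption): flatten the image and project it to features."""
    flat = [pixel for row in image for pixel in row]
    return [sum(w * p for w, p in zip(w_row, flat)) for w_row in weights]


def decode(feat_x, feat_tx, weights):
    """Toy linear decoder (assumption): predict the transformation parameters
    from the features of the original and the transformed image."""
    joint = feat_x + feat_tx  # concatenate the two feature vectors
    return [sum(w * f for w, f in zip(w_row, joint)) for w_row in weights]


def aet_loss(t_pred, t_true):
    """Assumed loss: mean squared error between predicted and sampled parameters."""
    return sum((p - q) ** 2 for p, q in zip(t_pred, t_true)) / len(t_true)


rng = random.Random(0)
H, W, FEAT = 4, 4, 8

# Random image and randomly initialized (untrained) weights.
image = [[rng.random() for _ in range(W)] for _ in range(H)]
enc_w = [[rng.gauss(0, 0.1) for _ in range(H * W)] for _ in range(FEAT)]
dec_w = [[rng.gauss(0, 0.1) for _ in range(2 * FEAT)] for _ in range(2)]

# One AET-style forward pass: sample t, transform x, encode both, predict t.
t = sample_transformation(rng)
t_image = apply_transformation(image, t)
t_pred = decode(encode(image, enc_w), encode(t_image, enc_w), dec_w)
loss = aet_loss(t_pred, t)
print("sampled t:", t, "loss:", loss)
```

In a trained system the encoder and decoder weights would be updated to minimize this loss, so that the features must carry enough information about the image structure to recover the transformation. The sketch stops at the forward pass and makes no claim about the paper's architecture or transformation families.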
Pages: 2542-2550
Number of pages: 9