Representation Learning by Learning to Count

Cited by: 233
Authors
Noroozi, Mehdi [1]
Pirsiavash, Hamed [2]
Favaro, Paolo [1]
Affiliations
[1] Univ Bern, Bern, Switzerland
[2] Univ Maryland Baltimore Cty, Baltimore, MD 21228 USA
Source
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2017
Funding
Swiss National Science Foundation;
DOI
10.1109/ICCV.2017.628
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance relation, which does not require any manual annotation. We relate transformations of images to transformations of the representations. More specifically, we look for the representation that satisfies such a relation rather than the transformations that match a given representation. In this paper, we use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the total number of visual primitives in each tile to that in the whole image. These two transformations are combined in one constraint and used to train a neural network with a contrastive loss. The proposed task produces representations that perform on par with or exceed the state of the art in transfer learning benchmarks.
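The counting constraint described in the abstract (sum of per-tile counts equals the count of the downsampled whole image, with a contrastive margin term against a different image) can be sketched as below. This is a minimal illustration, not the paper's setup: the feature extractor `feat`, the 2x2 tiling, and the margin value `M` are assumptions made here for clarity; the paper uses an AlexNet-style network trained on real images.

```python
import numpy as np

def counting_loss(feat, img, other, M=10.0):
    """Sketch of the counting pretext loss (hypothetical toy version).

    feat:  maps an (H, W) array to a k-dim "count" vector (assumed here).
    img:   image whose 2x2 tile counts should sum to the whole-image count.
    other: a different image, used as the negative in the contrastive term.
    """
    H, W = img.shape
    # Downsample the image by 2 so its size matches a single tile;
    # the number of primitives should be invariant to this scaling.
    down = img[::2, ::2]
    # Split the image into four non-overlapping tiles and sum their counts.
    tiles = [img[i * H // 2:(i + 1) * H // 2, j * W // 2:(j + 1) * W // 2]
             for i in range(2) for j in range(2)]
    tile_sum = sum(feat(t) for t in tiles)
    # Equivariance term: downsampled-image count should equal the tile sum.
    pos = np.sum((feat(down) - tile_sum) ** 2)
    # Contrastive term: a different image's count is pushed at least
    # margin M away from the tile sum, to rule out trivial solutions.
    neg = max(0.0, M - np.sum((feat(other[::2, ::2]) - tile_sum) ** 2))
    return pos + neg
```

The contrastive term is what prevents the degenerate solution `feat(x) = 0`, which would otherwise satisfy the equivariance constraint exactly.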
Pages: 5899 / 5907
Page count: 9