A Sensorimotor Perspective on Contrastive Multiview Visual Representation Learning

Times cited: 0
Author
Laflaquiere, Alban [1]
Affiliation
[1] AI Lab, SoftBank Robotics Europe, F-75015 Paris, France
Keywords
Task analysis; Visualization; Robot sensing systems; Training; Machine learning; Semantics; Deep learning; Artificial perception; Contrastive multiview learning; Representation learning; Sensorimotor; Unsupervised learning; Cortex; Experience; Modulation; Topology; Agents
DOI
10.1109/TCDS.2021.3086267
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The contrastive multiview visual representation learning (CMVRL) framework has recently gained considerable traction in the unsupervised representation learning literature. By combining a simple data augmentation strategy with a contrastive learning objective, it generates representations that compare favorably with their supervised counterparts on common downstream visual tasks. The theoretical understanding of this empirical success is currently an active area of research. In this article, we propose a sensorimotor perspective on the various components of the framework. We show how it can be interpreted as building representations that geometrically embed the stable semantic content that a situated agent experiences on short spatiotemporal scales when actively exploring its environment. We also discuss the relevance of the approach in light of contemporary active, dynamical, and hierarchical theories of perception. Finally, we extrapolate this sensorimotor perspective to outline promising future research directions that could push the state of the art further and help clarify how an autonomous agent could develop useful visual representations in an unsupervised fashion.
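The abstract summarizes CMVRL as a data augmentation strategy combined with a contrastive objective. As an illustration only, here is a minimal NumPy sketch of that recipe, assuming an InfoNCE-style loss in the spirit of SimCLR; the article is a perspective piece and does not prescribe this code. The helpers `info_nce_loss`, `noisy_view`, and `temporal_view_pairs` are hypothetical names, the Gaussian-noise "augmentation" and the random linear "encoder" `W` are placeholders, and `temporal_view_pairs` is an assumed rendering of the sensorimotor reading, in which positive pairs come from observations a few steps apart along an agent's exploratory trajectory rather than from synthetic augmentations.

```python
import numpy as np

# Illustrative sketch only: an assumed InfoNCE-style objective, not the article's method.


def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss for two (N, D) batches of embeddings.

    Row i of z1 and row i of z2 are two views of the same sample (the
    positive pair); the other N - 1 rows of z2 serve as negatives for row i.
    """
    # L2-normalise so that dot products become cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    # (N, N) matrix of temperature-scaled similarities.
    logits = z1 @ z2.T / temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: average their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))


def noisy_view(x: np.ndarray, rng: np.random.Generator, scale: float = 0.05) -> np.ndarray:
    """Placeholder 'augmentation' (an assumption): additive Gaussian noise."""
    return x + scale * rng.normal(size=x.shape)


def temporal_view_pairs(trajectory: np.ndarray, max_offset: int, rng: np.random.Generator):
    """Hypothetical sensorimotor variant: pair observation t with observation
    t + k (1 <= k <= max_offset) from the same exploratory trajectory."""
    anchors = np.arange(trajectory.shape[0] - max_offset)
    # Generator.integers excludes the upper bound, so offsets lie in [1, max_offset].
    offsets = rng.integers(1, max_offset + 1, size=anchors.shape[0])
    return trajectory[anchors], trajectory[anchors + offsets]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(32, 16))  # stand-in for a learned encoder (random, untrained)

    # Augmentation-based positives: two noisy views of the same 8 inputs.
    x = rng.normal(size=(8, 32))
    v1, v2 = noisy_view(x, rng), noisy_view(x, rng)
    matched = info_nce_loss(v1 @ W, v2 @ W)
    # Shifting the rows by one breaks every positive pairing.
    mismatched = info_nce_loss(v1 @ W, np.roll(v2 @ W, 1, axis=0))
    print(f"matched views: {matched:.3f}, mismatched views: {mismatched:.3f}")

    # Time-based positives: nearby observations along a random-walk trajectory.
    trajectory = np.cumsum(0.1 * rng.normal(size=(64, 32)), axis=0)
    a, b = temporal_view_pairs(trajectory, max_offset=2, rng=rng)
    print(f"temporal-pair loss: {info_nce_loss(a @ W, b @ W):.3f}")
```

In this sketch the loss is low when each embedding is most similar to its own partner view and high when the pairing is broken (the `np.roll` case). An actual CMVRL pipeline would train a deep encoder to minimise such a loss rather than evaluate a fixed random projection.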
Pages: 269-278
Number of pages: 10