Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

Cited by: 150
Authors
Lee, Michelle A. [1]
Zhu, Yuke [1,2]
Zachares, Peter [1]
Tan, Matthew [1]
Srinivasan, Krishnan [1]
Savarese, Silvio [1]
Fei-Fei, Li [1]
Garg, Animesh [2,3]
Bohg, Jeannette [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Nvidia Res, Santa Clara, CA 95051 USA
[3] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 2E4, Canada
Keywords
Task analysis; Haptic interfaces; Visualization; Robot sensing systems; Solid modeling; Reinforcement learning; Deep learning in robotics and automation; perception for grasping and manipulation; sensor fusion; sensor-based control; MANIPULATION; ADAPTATION; SYSTEM; GRASP
DOI
10.1109/TRO.2019.2959445
Chinese Library Classification
TP24 [Robotics]
Discipline Classification Codes
080202; 1405
Abstract
Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised learning objectives and representation learning architectures. Results are presented in simulation and on a physical robot.
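The abstract's core idea, encoding each sensory modality separately and fusing the results into one compact latent vector that a policy can consume, can be illustrated with a minimal sketch. This is hypothetical code, not the authors' implementation: the paper learns CNN and causal-convolution encoders with self-supervised objectives, whereas here fixed random linear projections stand in for the trained encoders purely to show the data flow.

```python
import numpy as np

def encode(x, out_dim, key):
    # Stand-in encoder: a fixed random linear projection followed by ReLU.
    # In the actual method these would be learned networks (e.g. a CNN
    # for RGB images, a causal convolution over force/torque readings).
    w = np.random.default_rng(key).normal(size=(x.size, out_dim))
    return np.maximum(x.ravel() @ w, 0.0)

def fuse(rgb, force, proprio, latent_dim=128):
    # Encode each modality, concatenate the features, then project the
    # concatenation into a single shared latent representation.
    feats = np.concatenate([
        encode(rgb, 64, 1),      # vision stream
        encode(force, 32, 2),    # haptic stream (F/T history)
        encode(proprio, 32, 3),  # end-effector state
    ])
    return encode(feats, latent_dim, 4)

rng = np.random.default_rng(0)
z = fuse(rgb=rng.normal(size=(32, 32, 3)),
         force=rng.normal(size=(32, 6)),
         proprio=rng.normal(size=(7,)))
print(z.shape)  # (128,)
```

A reinforcement learning policy would then take `z` as its observation instead of the raw high-dimensional sensor streams, which is what yields the sample-efficiency gain the abstract describes.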
Pages: 582-596
Page count: 15