See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion

Cited by: 101
Authors
Fazeli, N. [1 ]
Oller, M. [1 ]
Wu, J. [2 ]
Wu, Z. [2 ]
Tenenbaum, J. B. [2 ]
Rodriguez, A. [1 ]
Affiliations
[1] MIT, Dept Mech Engn, Cambridge, MA 02139 USA
[2] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Funding
U.S. National Science Foundation
DOI
10.1126/scirobotics.aav3123
Chinese Library Classification
TP24 [Robotics]
Discipline Classification Codes
080202; 1405
Abstract
Humans are able to seamlessly integrate tactile and visual stimuli with their intuitions to explore and execute complex manipulation skills. They not only see but also feel their actions. Most current robotic learning methodologies exploit recent progress in computer vision and deep learning to acquire data-hungry pixel-to-action policies. These methodologies do not exploit intuitive latent structure in physics or tactile signatures. Tactile reasoning is omnipresent in the animal kingdom, yet it is underdeveloped in robotic manipulation. Tactile stimuli are only acquired through invasive interaction, and interpretation of the data stream together with visual stimuli is challenging. Here, we propose a methodology to emulate hierarchical reasoning and multisensory fusion in a robot that learns to play Jenga, a complex game that requires physical interaction to be played effectively. The game mechanics were formulated as a generative process using a temporal hierarchical Bayesian model, with representations for both behavioral archetypes and noisy block states. This model captured descriptive latent structures, and the robot learned probabilistic models of these relationships in force and visual domains through a short exploration phase. Once learned, the robot used this representation to infer block behavior patterns and states as it played the game. Using its inferred beliefs, the robot adjusted its behavior with respect to both its current actions and its game strategy, similar to the way humans play the game. We evaluated the performance of the approach against three standard baselines and showed its fidelity on a real-world implementation of the game.
Pages: 10
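
The abstract above describes a generative model in which latent block "behavioral archetypes" give rise to noisy force and visual observations, which the robot fuses to infer how a block will behave before committing to an action. The sketch below illustrates that style of multisensory Bayesian fusion under strong simplifying assumptions: the archetype names, the scalar Gaussian observation models, and every numeric parameter are invented for illustration and are not the authors' learned models.

# Minimal sketch of multisensory Bayesian fusion over latent block
# "behavioral archetypes", in the spirit of the abstract above.
# All archetype names, observation models, and numbers are illustrative
# assumptions, not the paper's actual learned model.

import math

# Hypothetical archetypes a block can exhibit when pushed.
ARCHETYPES = ["moves_freely", "stuck", "tower_unstable"]

# Prior over archetypes (assumed; the paper learns such structure
# during a short exploration phase).
PRIOR = {"moves_freely": 0.5, "stuck": 0.4, "tower_unstable": 0.1}

# Per-archetype Gaussian observation models, (mean, std), for a scalar
# force reading (N) and a scalar visual displacement reading (mm).
FORCE_MODEL = {"moves_freely": (0.5, 0.2),
               "stuck": (2.5, 0.5),
               "tower_unstable": (1.5, 0.7)}
VISION_MODEL = {"moves_freely": (3.0, 1.0),
                "stuck": (0.2, 0.2),
                "tower_unstable": (1.0, 0.8)}

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def fuse(force_obs, vision_obs):
    """Posterior over archetypes given one force and one vision reading,
    assuming the two channels are conditionally independent given the
    archetype (posterior ∝ prior × p(force|z) × p(vision|z))."""
    unnorm = {z: PRIOR[z]
                 * gaussian_pdf(force_obs, *FORCE_MODEL[z])
                 * gaussian_pdf(vision_obs, *VISION_MODEL[z])
              for z in ARCHETYPES}
    total = sum(unnorm.values())
    return {z: p / total for z, p in unnorm.items()}

# Example: high resisting force with almost no visual motion.
posterior = fuse(force_obs=2.2, vision_obs=0.3)
print(max(posterior, key=posterior.get), posterior)

Running this classifies the high-force, low-motion probe as "stuck", the kind of inferred belief the abstract says the robot uses to adjust both its immediate action (release this block, probe another) and its overall game strategy.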