Correct Me If I am Wrong: Interactive Learning for Robotic Manipulation

Cited by: 18
Authors
Chisari, Eugenio [1 ]
Welschehold, Tim [1 ]
Boedecker, Joschka [1 ]
Burgard, Wolfram [1 ]
Valada, Abhinav [1 ]
Affiliations
[1] Univ Freiburg, Dept Comp Sci, D-70173 Freiburg im Breisgau, Baden-Wurttemberg, Germany
Keywords
Robots; Trajectory; Training; Task analysis; Visualization; Computer architecture; Reinforcement learning; Deep learning in grasping and manipulation; human factors and human-in-the-loop; imitation learning; interactive learning; machine learning for robot control
DOI
10.1109/LRA.2022.3145516
Chinese Library Classification
TP24 [Robotics]
Subject Classification Code
080202; 1405
Abstract
Learning to solve complex manipulation tasks from visual observations is a dominant challenge for real-world robot learning. Although deep reinforcement learning algorithms have recently demonstrated impressive results in this context, they still require an impractical amount of time-consuming trial-and-error iterations. In this work, we consider the promising alternative paradigm of interactive learning in which a human teacher provides feedback to the policy during execution, as opposed to imitation learning where a pre-collected dataset of perfect demonstrations is used. Our proposed CEILing (Corrective and Evaluative Interactive Learning) framework combines both corrective and evaluative feedback from the teacher to train a stochastic policy in an asynchronous manner, and employs a dedicated mechanism to trade off human corrections with the robot's own experience. We present results obtained with our framework in extensive simulation and real-world experiments to demonstrate that CEILing can effectively solve complex robot manipulation tasks directly from raw images in less than one hour of real-world training.
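As a rough illustration of the interactive learning scheme the abstract describes, the Python sketch below shows one way corrective and evaluative teacher feedback could be combined to train a stochastic policy. It is a minimal sketch under stated assumptions: the GaussianPolicy network, the env and teacher interfaces (teacher.correct, teacher.evaluate), the feedback labels, and the weighted behavior-cloning loss are illustrative choices, not the CEILing paper's actual implementation.

```python
# Illustrative sketch only: combines corrective and evaluative teacher feedback
# in an interactive training loop. The environment and teacher interfaces are
# hypothetical placeholders, not the CEILing implementation.
import random
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    """Small stochastic policy mapping an observation vector to a Gaussian over actions."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.net(obs), self.log_std.exp())


def interactive_episode(policy, env, teacher, buffer):
    """Roll out the policy; the teacher may correct an action or label it good/bad.

    `env` and `teacher` are assumed interfaces: env.reset() -> obs,
    env.step(action) -> (obs, done), teacher.correct(obs, action) -> action or None,
    teacher.evaluate(obs, action) -> 1.0 (good) or 0.0 (bad).
    """
    obs = env.reset()
    done = False
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        action = policy.dist(obs_t).sample()
        correction = teacher.correct(obs, action)  # None if no correction given
        executed = correction if correction is not None else action
        # A correction is stored as a positively labelled target action;
        # otherwise the evaluative signal weights the executed action.
        label = 1.0 if correction is not None else teacher.evaluate(obs, executed)
        buffer.append((obs_t, torch.as_tensor(executed, dtype=torch.float32), label))
        obs, done = env.step(executed)


def train_step(policy, optimizer, buffer, batch_size=64):
    """One (asynchronous) update: weighted behavior cloning on feedback-labelled data."""
    batch = random.sample(buffer, min(batch_size, len(buffer)))
    obs = torch.stack([b[0] for b in batch])
    act = torch.stack([b[1] for b in batch])
    weights = torch.tensor([b[2] for b in batch])
    log_prob = policy.dist(obs).log_prob(act).sum(dim=-1)
    loss = -(weights * log_prob).mean()  # only positively labelled samples contribute
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch the evaluative labels simply act as per-sample weights on a log-likelihood objective, so "bad" experience is ignored while corrections and "good" robot experience are both cloned; running episode collection and train_step in separate threads would give the asynchronous flavor the abstract mentions.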
Pages: 3695-3702
Page count: 8
相关论文
共 40 条
[1]   Autonomous Helicopter Aerobatics through Apprenticeship Learning [J].
Abbeel, Pieter ;
Coates, Adam ;
Ng, Andrew Y. .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2010, 29 (13) :1608-1639
[2]  
Akrour R, 2014, PR MACH LEARN RES, V32, P1503
[3]  
Akrour R, 2011, LECT NOTES ARTIF INT, V6911, P12, DOI 10.1007/978-3-642-23780-5_11
[4]  
[Anonymous], 2013, Policy shaping: Integrating human feedback with reinforcement learning
[5]  
Arumugam D., 2019, ABS190204257 CORR
[6]  
Biyik E., 2020, ROBOT SCI SYST
[7]  
Biyik E, 2019, PR MACH LEARN RES, V100
[8]  
Blumberg B, 2002, ACM T GRAPHIC, V21, P417, DOI 10.1145/566570.566597
[9]   Perspectives on Deep Multimodel Robot Learning [J].
Burgard, Wolfram ;
Valada, Abhinav ;
Radwan, Noha ;
Naseer, Tayyab ;
Zhang, Jingwei ;
Vertens, Johan ;
Mees, Oier ;
Eitel, Andreas ;
Oliveira, Gabriel .
ROBOTICS RESEARCH, 2020, 10 :17-24
[10]  
Cederborg T, 2015, PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), P3366