AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Cited by: 0
Authors
Lu, Yao [1 ]
Hausman, Karol [1 ]
Chebotar, Yevgen [1 ]
Yan, Mengyuan [2 ]
Jang, Eric [1 ]
Herzog, Alexander [2 ]
Xiao, Ted [1 ]
Irpan, Alex [1 ]
Khansari, Mohi [2 ]
Kalashnikov, Dmitry [1 ]
Levine, Sergey [1 ,3 ]
Affiliations
[1] Google Robotics, Mountain View, CA 94043, USA
[2] X, The Moonshot Factory, Mountain View, CA, USA
[3] University of California, Berkeley, CA, USA
Source
Conference on Robot Learning (CoRL), 2021, Vol. 164
Keywords
Imitation Learning; Reinforcement Learning; Robot Learning; Motor Skills
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize meaningfully to real-world scenarios still presents a major challenge. In this paper, our aim is to test the scalability of prior IL + RL algorithms and devise a system based on detailed empirical experimentation that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can utilize demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems. Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression [1, 2] and QT-Opt [3], providing a unified approach for integrating demonstrations and offline data for robotic manipulation. Please see https://awopt.github.io for more details.
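To make the advantage-weighted regression component mentioned in the abstract concrete, the sketch below shows its core idea: the policy is fit by supervised regression on past actions, with each transition weighted by the exponentiated advantage so that better-than-average actions dominate the objective. This is a minimal illustrative NumPy sketch of the general AWR weighting scheme, not the paper's AW-Opt implementation; the function names, the temperature `beta`, and the `max_weight` clip are assumptions made here for illustration.

```python
import numpy as np

def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Exponential advantage weights used in advantage-weighted regression.

    Transitions with higher estimated advantage A(s, a) receive
    exponentially larger weight in the supervised policy-regression
    objective; weights are clipped at max_weight for numerical stability.
    """
    w = np.exp(np.asarray(advantages, dtype=np.float64) / beta)
    return np.minimum(w, max_weight)

def awr_policy_loss(log_probs, advantages, beta=1.0):
    """Weighted negative log-likelihood: -E[ w(s, a) * log pi(a | s) ]."""
    w = awr_weights(advantages, beta)
    return -np.mean(w * np.asarray(log_probs, dtype=np.float64))

# Example: actions with positive advantage are weighted up,
# actions with negative advantage are weighted down.
adv = np.array([1.0, -1.0, 0.0])
logp = np.array([-0.5, -0.5, -0.5])
weights = awr_weights(adv)
loss = awr_policy_loss(logp, adv)
```

Because the weights stay non-negative, the update remains a weighted behavioral-cloning step, which is what lets methods in this family consume demonstrations and heterogeneous offline data without explicit exploration.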
Pages: 1078-1088
Page count: 11
Related Papers
42 items in total
[1] Argall B. D., Chernova S., Veloso M., Browning B. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 2009, 57(5): 469-483.
[2] Brown T. B., et al., 2020, arXiv. DOI: 10.48550/arXiv.2005.14165.
[3] Berner C., 2019, arXiv.
[4] Peng X. B., 2019, arXiv:1910.00177.
[5] Calinon S., Guenter F., Billard A. On learning, representing, and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007, 37(2): 286-298.
[6] Cheng C.-A., 2020, Proceedings of the Conference on Robot Learning, p. 1379.
[7] Fang B., Jia S., Guo D., Xu M., Wen S., Sun F. Survey of imitation learning for robotic manipulation. International Journal of Intelligent Robotics and Applications, 2019, 3(4): 362-369.
[8] Fujimoto S., 2019, Proceedings of Machine Learning Research, Vol. 97.
[9] Haarnoja T., 2019, arXiv:1812.05905.
[10] He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.