Sampling-based Inverse Reinforcement Learning Algorithms with Safety Constraints

Cited by: 10
Authors
Fischer, Johannes [1]
Eyberg, Christoph [1]
Werling, Moritz [2]
Lauer, Martin [1]
Affiliations
[1] Karlsruhe Inst Technol KIT, Inst Measurement & Control Syst, Karlsruhe, Germany
[2] BMW Grp, Unterschleissheim, Germany
Source
2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2021
Keywords
Reinforcement Learning; Inverse Reinforcement Learning; Maximum Entropy; Constraints; Safety; SUMO;
DOI
10.1109/IROS51168.2021.9636672
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Planning for robotic systems is frequently formulated as an optimization problem. Instead of manually tweaking the parameters of the cost function, they can be learned from human demonstrations by Inverse Reinforcement Learning (IRL). Common IRL approaches employ a maximum entropy trajectory distribution that can be learned with soft reinforcement learning, where the reward maximization is regularized with an entropy objective. The consideration of safety constraints is of paramount importance for human-robot collaboration. For this reason, our work addresses maximum entropy IRL in constrained environments. Our contribution to this research area is threefold: (1) We propose Constrained Soft Reinforcement Learning (CSRL), an extension of soft reinforcement learning to Constrained Markov Decision Processes (CMDPs). (2) We transfer maximum entropy IRL to CMDPs based on CSRL. (3) We show that using importance sampling in maximum entropy IRL in constrained environments introduces a bias and fails to achieve feature matching. In our evaluation we consider the tactical lane change decision of an autonomous vehicle in a highway scenario modeled in the SUMO traffic simulation.
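The entropy-regularized reward maximization mentioned in the abstract can be illustrated with a minimal sketch of soft value iteration on a small tabular MDP. This is not the paper's CSRL algorithm and it handles no constraints. It only shows the standard soft Bellman backup, in which the hard maximum over actions is replaced by a log-sum-exp. The temperature is assumed to be 1, and the toy MDP and all names (`soft_value_iteration`, `P`, `R`) are hypothetical.

```python
import numpy as np


def logsumexp(q, axis):
    """Numerically stable log(sum(exp(q))) along one axis."""
    m = q.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(q - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def soft_value_iteration(P, R, gamma=0.9, iters=500):
    """Soft (maximum entropy) value iteration, temperature assumed to be 1.

    P: transition probabilities, shape (S, A, S).
    R: rewards, shape (S, A).
    Returns the soft values V, soft Q-values Q, and stochastic policy pi.
    """
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)      # expected return of each action, shape (S, A)
        V = logsumexp(Q, axis=1)     # soft maximum instead of a hard max
    Q = R + gamma * (P @ V)
    # Maximum entropy policy: pi(a|s) proportional to exp(Q(s, a))
    pi = np.exp(Q - logsumexp(Q, axis=1)[:, None])
    return V, Q, pi


# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0   # action 0 always leads to state 0
P[:, 1, 1] = 1.0   # action 1 always leads to state 1
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])   # action 1 pays reward 1 in every state

V, Q, pi = soft_value_iteration(P, R)
```

In this toy example the resulting policy prefers the rewarded action but keeps nonzero probability on the other one, which is the effect of the entropy regularizer.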
Pages: 791-798
Page count: 8