FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer

Cited by: 5
Authors
Sun, Chuangchuang [1]
Kim, Dong-Ki [1]
How, Jonathan P. [1]
Affiliations
[1] MIT, Lab Informat & Decis Syst LIDS, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021) | 2021
Keywords
ALGORITHMS;
DOI
10.1109/ICRA48506.2021.9561147
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper investigates reinforcement learning with constraints, which are indispensable in safety-critical environments. To drive the constraint violation to decrease monotonically, we take the constraints as Lyapunov functions and impose new linear constraints on the policy parameters' updating dynamics. As a result, the original safety set can be forward-invariant. However, because the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable. To address this, we propose to learn a generic deep neural network (DNN)-based optimizer to optimize the objective while satisfying the linear constraints. The constraint-satisfaction is achieved via projection onto a polytope formulated by multiple linear inequality constraints, which can be solved analytically with our newly designed metric. To the best of our knowledge, this is the first DNN-based optimizer for constrained optimization with the forward invariance guarantee. We show that our optimizer trains a policy to decrease the constraint violation and maximize the cumulative reward monotonically. Results on numerical constrained optimization and obstacle-avoidance navigation validate the theoretical findings.
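The abstract describes constraining the parameter update itself rather than the parameters. As a rough illustration only, the sketch below assumes a single constraint c(theta) <= 0 and a decrease condition of the form grad_c · v <= -alpha * c(theta), which is linear in the update v, and then projects a candidate update onto that half-space. The Euclidean metric, the rate `alpha`, and the function names are assumptions made for illustration; the paper itself handles multiple inequalities with its own newly designed metric, which is not reproduced here.

```python
def project_onto_halfspace(u, a, b):
    """Euclidean projection of vector u onto the half-space {v : a . v <= b}."""
    dot = sum(ai * ui for ai, ui in zip(a, u))
    norm_sq = sum(ai * ai for ai in a)
    # Already feasible, or a is the zero vector (no direction to project along).
    if norm_sq == 0.0 or dot <= b:
        return list(u)
    scale = (dot - b) / norm_sq
    return [ui - scale * ai for ui, ai in zip(u, a)]


def safe_update(candidate, c_value, c_grad, alpha=1.0):
    """Adjust a candidate update so that c_grad . v <= -alpha * c_value.

    c_value and c_grad are the constraint value and its gradient at the
    current parameters; alpha is a hypothetical decay rate.
    """
    return project_onto_halfspace(candidate, c_grad, -alpha * c_value)


if __name__ == "__main__":
    # Toy constraint c(theta) = theta_0 + theta_1 - 1 at theta = (1, 1):
    # c = 1 (violated), gradient = (1, 1).
    v = safe_update(candidate=[1.0, 0.0], c_value=1.0, c_grad=[1.0, 1.0])
    print(v)
```

With a single inequality this projection has a closed form, which is why the toy case needs no solver; with several inequalities the feasible set is a polytope and a plain Euclidean projection generally has no such closed form.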
Pages: 10617-10624
Number of pages: 8