FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer

Cited by: 5

Authors:

Sun, Chuangchuang [1]

Kim, Dong-Ki [1]

How, Jonathan P. [1]

Affiliations:

[1] MIT, Laboratory for Information and Decision Systems (LIDS), 77 Massachusetts Ave, Cambridge, MA 02139, USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021) | 2021

Keywords:

ALGORITHMS;

DOI:

10.1109/ICRA48506.2021.9561147

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This paper investigates reinforcement learning with constraints, which are indispensable in safety-critical environments. To drive the constraint violation to decrease monotonically, we take the constraints as Lyapunov functions and impose new linear constraints on the policy parameters' updating dynamics. As a result, the original safety set can be forward-invariant. However, because the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable. To address this, we propose to learn a generic deep neural network (DNN)-based optimizer to optimize the objective while satisfying the linear constraints. The constraint-satisfaction is achieved via projection onto a polytope formulated by multiple linear inequality constraints, which can be solved analytically with our newly designed metric. To the best of our knowledge, this is the first DNN-based optimizer for constrained optimization with the forward invariance guarantee. We show that our optimizer trains a policy to decrease the constraint violation and maximize the cumulative reward monotonically. Results on numerical constrained optimization and obstacle-avoidance navigation validate the theoretical findings.
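The core idea in the abstract, a linear constraint on the parameter update enforced by projection, can be illustrated with a minimal sketch. It is not the authors' method: it handles a single constraint with the ordinary Euclidean projection, whereas the paper projects onto a polytope of several constraints using its own metric. The names `project_onto_halfspace`, `safe_update`, and the decay rate `alpha` are hypothetical, and the condition grad_g(theta) . d <= -alpha * g(theta) is assumed here as one common way to write a Lyapunov-style decrease requirement for a constraint g(theta) <= 0.

```python
def project_onto_halfspace(d, a, b):
    """Euclidean projection of vector d onto the half-space {x : a . x <= b}."""
    a_dot_d = sum(ai * di for ai, di in zip(a, d))
    a_norm_sq = sum(ai * ai for ai in a)
    if a_norm_sq == 0.0:
        # Degenerate constraint (zero normal): nothing to project against.
        return list(d)
    violation = a_dot_d - b
    if violation <= 0.0:
        # d already satisfies the linear constraint; leave it unchanged.
        return list(d)
    # Move d along -a just far enough to land on the boundary a . x = b.
    scale = violation / a_norm_sq
    return [di - scale * ai for ai, di in zip(a, d)]


def safe_update(d, g_value, g_grad, alpha=1.0):
    """Project a proposed update d so that g_grad . d <= -alpha * g_value.

    g_value and g_grad are the constraint value g(theta) and its gradient
    at the current parameters; alpha is an assumed decay rate.
    """
    return project_onto_halfspace(d, g_grad, -alpha * g_value)


# Hypothetical example: g(theta) = theta_0 + theta_1 - 1 at theta = (1, 1),
# so g = 1 (violated) and grad g = (1, 1). The proposed step increases g.
proposed_step = [1.0, 0.0]
corrected_step = safe_update(proposed_step, g_value=1.0, g_grad=[1.0, 1.0])
```

In this example the corrected step satisfies the linear condition with equality, so to first order the constraint violation decreases instead of growing. With several constraints the feasible set is a polytope and a single closed-form step like this is no longer sufficient in general, which is the case the paper's projection addresses.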


Pages: 10617-10624

Page count: 8

