Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes

Cited by: 9

Authors:

Kochdumper, Niklas [1,2]

Krasowski, Hanna [1]

Wang, Xiao [1]

Bak, Stanley [2]

Althoff, Matthias [1]

Affiliations:

[1] Tech Univ Munich, Dept Comp Engn, D-85748 Garching, Germany

[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA

Source:

IEEE OPEN JOURNAL OF CONTROL SYSTEMS | 2023 / Vol. 2

Funding:

European Research Council;

Keywords:

Safety; Reinforcement learning; Reachability analysis; Optimization; Generators; Training; Measurement errors; Action projection; reach-avoid problems; reachability analysis; reinforcement learning; CONVEX-HULL;

DOI:

10.1109/OJCSYS.2023.3256305

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue by a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents applying potentially unsafe actions from a reinforcement learning agent by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which enables to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.

Pages: 79-92

Page count: 14

50 items in total

[1] Reachability Analysis Using Constrained Polynomial Logical Zonotopes
Hafez, Ahmad
Jiang, Frank J.
Johansson, Karl H.
Alanwar, Amr
IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 2277 - 2282
[2] Reachability Analysis for Linear Systems with Uncertain Parameters using Polynomial Zonotopes
Huang, Yushen
Luo, Ertai
Bak, Stanley
Sun, Yifan
arXiv,
[3] Reachability analysis for linear systems with uncertain parameters using polynomial zonotopes
Huang, Yushen
Luo, Ertai
Bak, Stanley
Sun, Yifan
NONLINEAR ANALYSIS-HYBRID SYSTEMS, 2025, 56
[4] Safe Reinforcement Learning Using Black-Box Reachability Analysis
Selim, Mahmoud
Alanwar, Amr
Kousik, Shreyas
Gao, Grace
Pavone, Marco
Johansson, Karl H.
arXiv, 2022,
[5] Safe Reinforcement Learning Using Black-Box Reachability Analysis
Selim, Mahmoud
Alanwar, Amr
Kousik, Shreyas
Gao, Grace
Pavone, Marco
Johansson, Karl H.
IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (04) : 10665 - 10672
[6] Sparse Polynomial Zonotopes: A Novel Set Representation for Reachability Analysis
Kochdumper, Niklas
Althoff, Matthias
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (09) : 4043 - 4058
[7] Iterative Reachability Estimation for Safe Reinforcement Learning
Ganai, Milan
Gong, Zheng
Yu, Chenning
Herbert, Sylvia
Gao, Sicun
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[8] Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?
Gros, Sebastien
Zanon, Mario
Bemporad, Alberto
IFAC PAPERSONLINE, 2020, 53 (02) : 8076 - 8081
[9] Safe Exploration in Reinforcement Learning by Reachability Analysis over Learned Models
Wang, Yuning
Zhu, He
COMPUTER AIDED VERIFICATION, PT III, CAV 2024, 2024, 14683 : 232 - 255
[10] Reducing Safety Interventions in Provably Safe Reinforcement Learning
Thumm, Jakob
Pelat, Guillaume
Althoff, Matthias
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023 : 7515 - 7522
