Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes

Cited by: 9
Authors
Kochdumper, Niklas [1, 2]
Krasowski, Hanna [1 ]
Wang, Xiao [1 ]
Bak, Stanley [2 ]
Althoff, Matthias [1 ]
Affiliations
[1] Tech Univ Munich, Dept Comp Engn, D-85748 Garching, Germany
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
Source
Funding
European Research Council;
Keywords
Safety; Reinforcement learning; Reachability analysis; Optimization; Generators; Training; Measurement errors; Action projection; reach-avoid problems; reachability analysis; reinforcement learning; CONVEX-HULL;
DOI
10.1109/OJCSYS.2023.3256305
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue with a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents potentially unsafe actions proposed by a reinforcement learning agent from being applied by projecting each proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which makes it possible to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases the incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
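The idea of action projection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the safe action set is already known and given as a union of axis-aligned boxes, whereas the paper derives its safety constraints from reachability analysis with polynomial zonotopes and solves a mixed-integer program. The names `project_onto_box` and `project_action` are hypothetical. Enumerating the boxes stands in for the integer decision of which safe region to use.

```python
from typing import List, Sequence, Tuple

# A box is (lower bounds, upper bounds). ASSUMPTION: the safe set is a union of
# such boxes; in the paper the constraints come from reachability analysis.
Box = Tuple[Sequence[float], Sequence[float]]


def project_onto_box(action: Sequence[float], box: Box) -> List[float]:
    """Closest point to `action` inside one axis-aligned box (coordinate-wise clipping)."""
    lower, upper = box
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]


def project_action(action: Sequence[float], safe_boxes: Sequence[Box]) -> List[float]:
    """Return the safe action closest (in Euclidean distance) to the proposed action.

    Looping over the boxes plays the role of the discrete choice in a
    mixed-integer formulation; each inner problem is a simple projection.
    """
    if not safe_boxes:
        raise ValueError("no safe action available")
    best, best_dist = None, float("inf")
    for box in safe_boxes:
        candidate = project_onto_box(action, box)
        dist = sum((a - c) ** 2 for a, c in zip(action, candidate))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


# Hypothetical example: two disjoint safe regions with an unsafe gap between them.
safe_boxes = [([-1.0, -1.0], [-0.5, 1.0]), ([0.5, -1.0], [1.0, 1.0])]
proposed = [0.2, 0.3]  # lies in the unsafe gap
safe = project_action(proposed, safe_boxes)
```

An action that is already safe is returned unchanged, so under these assumptions the shield only intervenes when the agent proposes an action outside the safe set.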
Pages: 79-92
Page count: 14
Related papers
50 in total
  • [11] Safe Reinforcement Learning by Shielding based Reachable Zonotopes for Autonomous Vehicles
    Raeesi, H.
    Khosravi, A.
    Sarhadi, P.
    International Journal of Engineering, Transactions A: Basics, 2025, 38 (01): : 21 - 34
  • [12] Safe Reinforcement Learning by Shielding based Reachable Zonotopes for Autonomous Vehicles
    Raeesi, H.
    Khosravi, A.
    Sarhadi, P.
    INTERNATIONAL JOURNAL OF ENGINEERING, 2025, 38 (01): : 21 - 34
  • [13] Safe Multi-Agent Reinforcement Learning via Approximate Hamilton-Jacobi Reachability
    Zhu, Kai
    Lan, Fengbo
    Zhao, Wenbo
    Zhang, Tao
    Journal of Intelligent and Robotic Systems: Theory and Applications, 111 (01):
  • [14] Safe Multi-Agent Reinforcement Learning via Approximate Hamilton-Jacobi Reachability
    Kai Zhu
    Fengbo Lan
    Wenbo Zhao
    Tao Zhang
    Journal of Intelligent & Robotic Systems, 111 (1)
  • [15] On Normative Reinforcement Learning via Safe Reinforcement Learning
    Neufeld, Emery A.
    Bartocci, Ezio
    Ciabattoni, Agata
    PRIMA 2022: PRINCIPLES AND PRACTICE OF MULTI-AGENT SYSTEMS, 2023, 13753 : 72 - 89
  • [16] Reachability Analysis and Safety Verification of Neural Feedback Systems via Hybrid Zonotopes
    Zhang, Yuhao
    Xu, Xiangru
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023, : 1915 - 1921
  • [17] Provably Efficient Reinforcement Learning via Surprise Bound
    Zhu, Hanlin
    Wang, Ruosong
    Lee, Jason D.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [18] Backward Reachability Analysis of Neural Feedback Systems Using Hybrid Zonotopes
    Zhang, Yuhao
    Zhang, Hang
    Xu, Xiangru
    IEEE CONTROL SYSTEMS LETTERS, 2023, 7 : 2779 - 2784
  • [19] NMPC Strategy for Safe Robot Navigation in Unknown Environments using Polynomial Zonotopes
    Nascimento, Iuro B. P.
    Rego, Brenner S.
    Pimenta, Luciano C. A.
    Raffo, Guilherme V.
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 7100 - 7105
  • [20] Provably Safe Deep Reinforcement Learning for Robotic Manipulation in Human Environments
    Thumm, Jakob
    Althoff, Matthias
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 6344 - 6350