Verifiably Safe Exploration for End-to-End Reinforcement Learning

Cited: 16
Authors
Hunt, Nathan [1 ]
Fulton, Nathan [2 ]
Magliacane, Sara [3 ,4 ]
Hoang, Trong Nghia [2]
Das, Subhro [2 ]
Solar-Lezama, Armando [1 ]
Affiliations
[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] IBM Res, MIT IBM Watson AI Lab, Cambridge, MA USA
[3] MIT IBM Watson AI Lab, Cambridge, MA USA
[4] Univ Amsterdam, Amsterdam, Netherlands
Keywords
formal verification; reinforcement learning; neural networks; hybrid systems; safe artificial intelligence; differential dynamic logic;
DOI
10.1145/3447928.3456653
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification
0812
Abstract
Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. It is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. The benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We characterize safety constraints in terms of a refinement relation on Markov decision processes: rather than directly constraining the reinforcement learning algorithm so that it only takes safe actions, we refine the environment so that only safe actions are defined in its transition structure. This has pragmatic system-design benefits and, more importantly, provides a clean conceptual setting in which we can prove important safety and efficiency properties. These properties allow us to transform the constrained optimization problem of acting safely in the original environment into an unconstrained optimization problem in a refined environment.
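The abstract's core mechanism, refining the environment so that unsafe actions are simply not part of its transition structure, can be illustrated with a short sketch. The following is a minimal illustration only, assuming a Gymnasium-style interface; the is_safe predicate and fallback_action are hypothetical stand-ins for the paper's verified hybrid-systems monitor and are not drawn from its actual implementation.

import gymnasium as gym

class SafetyRefinedEnv(gym.Wrapper):
    """Wraps an environment so that unsafe actions are never executed,
    approximating the paper's notion of a refined MDP in which only
    safe actions are defined."""

    def __init__(self, env, is_safe, fallback_action):
        super().__init__(env)
        self.is_safe = is_safe                  # (state, action) -> bool, assumed verified offline
        self.fallback_action = fallback_action  # an action assumed safe in every reachable state
        self._state = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # In the paper, visual observations are first mapped to symbolic
        # state by an object detector; here we use the observation directly.
        self._state = obs
        return obs, info

    def step(self, action):
        # Refinement step: an unsafe action is undefined in the refined
        # environment, so substitute a provably safe fallback instead.
        if not self.is_safe(self._state, action):
            action = self.fallback_action
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._state = obs
        return obs, reward, terminated, truncated, info

Any off-the-shelf RL algorithm can then train on the wrapped environment as an unconstrained problem, mirroring the abstract's transformation of constrained safe exploration in the original environment into unconstrained optimization in a refined one.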
Pages: 11
Related Papers
50 records in total
  • [1] End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation
    Ruan, Xiaogang
    Li, Peng
    Zhu, Xiaoqing
    Yu, Hejie
    Yu, Naigong
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [2] End-to-end Deep Reinforcement Learning for Multi-agent Collaborative Exploration
    Chen, Zichen
    Subagdja, Budhitama
    Tan, Ah-Hwee
    2019 IEEE INTERNATIONAL CONFERENCE ON AGENTS (ICA), 2019, : 99 - 102
  • [3] Off-policy model-based end-to-end safe reinforcement learning
    Kanso, Soha
    Jha, Mayank Shekhar
    Theilliol, Didier
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (04) : 2806 - 2831
  • [4] Sim-to-Real Deep Reinforcement Learning for Safe End-to-End Planning of Aerial Robots
    Ugurlu, Halil Ibrahim
    Pham, Xuan Huy
    Kayacan, Erdal
    ROBOTICS, 2022, 11 (05)
  • [5] End-to-End Video Captioning with Multitask Reinforcement Learning
    Li, Lijun
    Gong, Boqing
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 339 - 348
  • [6] NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning
    Haj-Ali, Ameer
    Ahmed, Nesreen K.
    Willke, Ted
    Shao, Yakun Sophia
    Asanovic, Krste
    Stoica, Ion
    CGO'20: PROCEEDINGS OF THE 18TH ACM/IEEE INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 2020, : 242 - 255
  • [7] End-to-End Deep Reinforcement Learning for Exoskeleton Control
    Rose, Lowell
    Bazzocchi, Michael C. F.
    Nejat, Goldie
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 4294 - 4301
  • [8] End-to-End Reinforcement Learning for Automatic Taxonomy Induction
    Mao, Yuning
    Ren, Xiang
    Shen, Jiaming
    Gu, Xiaotao
    Han, Jiawei
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2462 - 2472
  • [9] ORACLE: End-to-End Model Based Reinforcement Learning
    Andersen, Per-Arne
    Goodwin, Morten
    Granmo, Ole-Christoffer
    ARTIFICIAL INTELLIGENCE XXXVIII, 2021, 13101 : 44 - 57