Verifiably Safe Exploration for End-to-End Reinforcement Learning

Cited by: 22

Authors:

Hunt, Nathan [1]

Fulton, Nathan [2]

Magliacane, Sara [3,4]

Trong Nghia Hoang [2]

Das, Subhro [2]

Solar-Lezama, Armando [1]

Affiliations:

[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA

[2] IBM Res, MIT IBM Watson AI Lab, Cambridge, MA USA

[3] MIT IBM Watson AI Lab, Cambridge, MA USA

[4] Univ Amsterdam, Amsterdam, Netherlands

Source:

HSCC2021: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON HYBRID SYSTEMS: COMPUTATION AND CONTROL (PART OF CPS-IOT WEEK) | 2021

Keywords:

formal verification; reinforcement learning; neural networks; hybrid systems; safe artificial intelligence; differential dynamic logic;

DOI:

10.1145/3447928.3456653

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We characterize safety constraints in terms of a refinement relation on Markov decision processes: rather than directly constraining the reinforcement learning algorithm so that it only takes safe actions, we instead refine the environment so that only safe actions are defined in the environment's transition structure. This has pragmatic system design benefits and, more importantly, provides a clean conceptual setting in which we are able to prove important safety and efficiency properties. These properties allow us to transform the constrained optimization problem of acting safely in the original environment into an unconstrained optimization in a refined environment.
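
The refinement idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper works with visual inputs, object detection, and formal reasoning about hybrid systems, none of which appear here. The `Corridor` environment, the `RefinedEnv` wrapper, and the `explore` helper are hypothetical names introduced only for illustration, and the sketch assumes a known one-step dynamics model. It shows an environment whose reward is misaligned with safety, wrapped so that only actions with a safe predicted successor are defined.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4 on a line; cell 4 is a hazard."""

    ACTIONS = (-1, +1)
    HAZARD = 4

    def __init__(self):
        self.pos = 0

    def predict(self, pos, action):
        # Assumed known dynamics: move one cell, clamped at the left wall.
        return max(0, pos + action)

    def step(self, action):
        self.pos = self.predict(self.pos, action)
        reward = self.pos  # reward grows toward the hazard (misaligned with safety)
        return self.pos, reward


class RefinedEnv:
    """Refinement: only actions whose predicted successor is safe are defined."""

    def __init__(self, env):
        self.env = env

    def is_safe(self, pos):
        return pos != self.env.HAZARD

    def actions(self):
        # The action set exposed to the learner in the current state.
        return [a for a in self.env.ACTIONS
                if self.is_safe(self.env.predict(self.env.pos, a))]

    def step(self, action):
        if action not in self.actions():
            raise ValueError("action is undefined in the refined environment")
        return self.env.step(action)


def explore(steps=1000, seed=0):
    """Unconstrained random exploration inside the refined environment."""
    rng = random.Random(seed)
    env = RefinedEnv(Corridor())
    visited = set()
    for _ in range(steps):
        pos, _ = env.step(rng.choice(env.actions()))
        visited.add(pos)
    return visited
```

In this sketch the exploring agent needs no safety logic of its own: it picks freely among `env.actions()`, and the hazard cell is unreachable because no transition into it is defined in the refined environment.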


Pages: 11
