Validation of Reinforcement Learning Agents and Safety Shields with ProB

Cited by: 2
Authors
Vu, Fabian [1 ]
Dunkelau, Jannik [1 ]
Leuschel, Michael [1 ]
Affiliations
[1] Heinrich Heine Univ Dusseldorf, Math Nat Wissensch Fak, Inst Informat, Dusseldorf, Germany
Source
NASA FORMAL METHODS, NFM 2024 | 2024, Vol. 14627
Funding
Austrian Science Fund;
Keywords
Reinforcement Learning; B Method; Validation; Shielding
DOI
10.1007/978-3-031-60698-4_16
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812
Abstract
Reinforcement learning (RL) is an important machine learning technique for training agents that make decisions autonomously. For safety-critical applications, however, the decision-making of an RL agent may not be intelligible to humans and is thus difficult to validate, verify, and certify. This work presents a technique to link a concrete RL agent with a high-level formal B model of the safety shield and the environment. This allows us to run the RL agent in the formal method tool ProB and, in particular, to use the formal model to surround the agent with a safety shield at runtime. This paper also presents a methodology to validate the behavior of RL agents and their safety shields with formal methods techniques, including trace replay, simulation, and statistical validation. The validation process is supported by domain-specific visualizations to ease human validation. Finally, we demonstrate the approach on a highway simulation.
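The runtime shielding described in the abstract can be sketched in a few lines: a formal safety model decides which actions are permitted in the current state, and the shield overrides the agent's choice whenever it would be unsafe. This is a minimal illustrative sketch only; the names `SafetyShield` and `safe_actions` are hypothetical stand-ins, not the API of ProB or the paper's tooling, and the safe-action query here is a plain Python function rather than a B model.

```python
# Illustrative sketch of runtime action shielding: the shield queries a
# safety oracle (standing in for the formal B model) and substitutes a
# safe action when the RL agent's proposal would violate safety.
from typing import Callable, List

State = int
Action = str

class SafetyShield:
    def __init__(self, safe_actions: Callable[[State], List[Action]]):
        # safe_actions stands in for a query to the formal model:
        # given a state, which actions preserve the safety invariant?
        self.safe_actions = safe_actions

    def filter(self, state: State, proposed: Action) -> Action:
        allowed = self.safe_actions(state)
        if proposed in allowed:
            return proposed   # agent's choice is safe: keep it
        return allowed[0]     # otherwise fall back to a safe action

# Toy highway-style rule: in states >= 8 only "brake" is considered safe.
def safe_actions(state: State) -> List[Action]:
    return ["brake"] if state >= 8 else ["accelerate", "keep", "brake"]

shield = SafetyShield(safe_actions)
print(shield.filter(3, "accelerate"))  # passes through: "accelerate"
print(shield.filter(9, "accelerate"))  # overridden to: "brake"
```

The key design point, mirrored from the shielding literature, is that the shield only intervenes on unsafe proposals, so the agent's learned behavior is preserved whenever it already respects the safety model.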
Pages: 279-297
Page count: 19