Safety Verification of Non-Deterministic Policies in Reinforcement Learning

Cited by: 0
Authors
Kwon, Ryeonggu [1 ]
Kwon, Gihwon [1 ]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, Suwon 16227, Gyeonggi Do, South Korea
Keywords
Model-free reinforcement learning; safety verification; probabilistic model checking; Storm model checker
DOI
10.1109/ACCESS.2024.3513459
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Reinforcement Learning (RL) represents a powerful paradigm in artificial intelligence, enabling agents to learn optimal behaviors through interaction with their environment. However, ensuring the safety of policies learned in non-deterministic environments, where outcomes are inherently uncertain and variable, remains a critical challenge. Safety in this context refers to the assurance that an agent's actions will not lead to undesirable or hazardous states, which is paramount in high-risk domains such as autonomous driving and healthcare. This study proposes a novel methodology to rigorously verify the safety properties of non-deterministic policies in RL. We transform non-deterministic policies into Markov Decision Process (MDP) models and employ the state-of-the-art probabilistic model checker STORM for formal verification. Our approach achieves a maximum probability of 1.0 for reaching the goal state in non-slippery environments, while in slippery environments the probability declines to approximately 0.041 on the 4 x 4 map and 0.0023 on the 8 x 8 map. Additionally, we observed a significant reduction in the probability of not reaching the goal state under non-slippery conditions, whereas slippery conditions increased the likelihood of failure, with probabilities as high as 0.934 in complex scenarios. Empirical validation through extensive experiments in the FrozenLake environment highlights both the challenges and the effectiveness of our approach under varying conditions. Our contributions are fourfold. First, we propose a formal verification method for RL policies in non-deterministic environments. Second, we provide a comprehensive framework for rigorously modeling and analyzing the safety properties of non-deterministic policies by integrating MDP models with the STORM model checker. Third, we use model-free RL to generate an MDP model from the trained policy, which allows us to evaluate goal achievement and identify potentially unsafe states caused by non-deterministic environments. Fourth, our extensive empirical validation offers concrete insights into the challenges of, and solutions for, non-deterministic policy verification. In conclusion, this study addresses the pivotal issue of ensuring the safety of non-deterministic policies in RL. Through formal verification techniques and comprehensive empirical validation, we provide a robust methodology that enhances the trustworthiness and applicability of non-deterministic policies in real-world applications.
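To make the abstract's pipeline concrete, below is a minimal sketch, not the authors' code: it assumes Gymnasium's FrozenLake-v1, plain tabular Q-learning, and Storm's PRISM-language front end. Because the greedy policy fixes one action per state, the exported model here is a DTMC over the environment's stochastic "slippery" dynamics rather than the paper's full MDP construction; the file name frozenlake_policy.prism and all hyperparameters are illustrative assumptions.

import gymnasium as gym
import numpy as np
from fractions import Fraction

env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=True)
n_s, n_a = env.observation_space.n, env.action_space.n
Q = np.zeros((n_s, n_a))

# Model-free RL: plain epsilon-greedy tabular Q-learning.
alpha, gamma, eps = 0.1, 0.99, 0.1
for _ in range(20_000):
    s, _ = env.reset()
    done = False
    while not done:
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
        s2, r, term, trunc, _ = env.step(a)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s, done = s2, term or trunc

policy = np.argmax(Q, axis=1)  # greedy action per state

# Export the policy-induced model in PRISM's input language.
# env.unwrapped.P[s][a] lists (prob, next_state, reward, terminated)
# tuples, so the slippery dynamics come straight from the environment.
lines = ["dtmc", "module frozenlake", f"  s : [0..{n_s - 1}] init 0;"]
for s in range(n_s):
    probs = {}
    for p, s2, _, _ in env.unwrapped.P[s][int(policy[s])]:
        probs[s2] = probs.get(s2, 0.0) + p  # merge duplicate successors
    rhs = " + ".join(
        f"{Fraction(p).limit_denominator(100)}:(s'={s2})" for s2, p in probs.items()
    )
    lines.append(f"  [] s={s} -> {rhs};")
lines += ["endmodule", f'label "goal" = s={n_s - 1};']
with open("frozenlake_policy.prism", "w") as f:
    f.write("\n".join(lines))

# Query goal reachability with Storm, e.g.:
#   storm --prism frozenlake_policy.prism --prop 'P=? [F "goal"]'

Probabilities are written as exact fractions (e.g., 1/3) so that each guarded command sums to one under Storm's parser; terminal holes and the goal appear as probability-one self-loops, which keeps the model well-formed.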
Pages: 185768-185788 (21 pages)