Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning

Cited by: 0
Authors
Kweider, Leen [1 ]
Abou Kassem, Maissa [1 ]
Sandouk, Ubai [2 ]
Affiliations
[1] Damascus Univ, Fac Informat Technol, Dept Artificial Intelligence, Damascus, Syria
[2] Damascus Univ, Fac Informat Technol, Dept Software Engn, Damascus, Syria
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Safety; Anomaly detection; Reinforcement learning; Artificial intelligence; Optimization; Uncertainty; Measurement uncertainty; Costs; Decision making; Training; AI safety; reinforcement learning; anomaly detection; sequence modeling; risk-averse policy; reward shaping
DOI
10.1109/ACCESS.2024.3486549
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The deployment of artificial intelligence (AI) in decision-making applications requires ensuring an appropriate level of safety and reliability, particularly in changing environments that contain a large number of unknown observations. To address this challenge, we propose a novel safe reinforcement learning (RL) approach that utilizes anomalous state sequence modeling to enhance RL safety. Our proposed solution, Safe Reinforcement Learning with Anomalous State Sequences (AnoSeqs), consists of two stages. First, we train an agent offline in a non-safety-critical 'source' environment to collect safe state sequences. Next, we use these safe sequences to build an anomaly detection model that can flag potentially unsafe state sequences in a 'target' safety-critical environment where failures carry high costs. The risk estimated by the anomaly detection model is then used to train a risk-averse RL policy in the target environment; concretely, the reward function is adjusted to penalize the agent for visiting anomalous states that the anomaly model deems unsafe. In experiments on multiple safety-critical benchmark environments, including self-driving cars, AnoSeqs successfully learns safer policies and demonstrates that sequential anomaly detection can provide an effective supervisory signal for training safety-aware RL agents.
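To make the two-stage procedure concrete, the sketch below shows one way the reward shaping could look. It is an illustrative assumption, not the authors' implementation: the class SequenceAnomalyModel, its toy distance-based scoring, the function shaped_reward, and the weight LAMBDA are all hypothetical stand-ins for the paper's anomaly detector and penalty term.

```python
import numpy as np


class SequenceAnomalyModel:
    """Toy sequence anomaly detector (hypothetical): fit on windows of
    safe states collected in the source environment; score new windows
    by their normalized distance from the mean safe window."""

    def fit(self, safe_windows: np.ndarray) -> "SequenceAnomalyModel":
        # safe_windows shape: (num_windows, window_len, state_dim)
        self.mean_ = safe_windows.mean(axis=0)
        dists = np.linalg.norm(safe_windows - self.mean_, axis=(1, 2))
        self.scale_ = float(dists.max()) + 1e-8
        return self

    def score(self, window: np.ndarray) -> float:
        # Larger score = more anomalous relative to the safe data.
        return float(np.linalg.norm(window - self.mean_)) / self.scale_


LAMBDA = 1.0  # hypothetical penalty weight on the estimated risk


def shaped_reward(env_reward: float, window: np.ndarray,
                  model: SequenceAnomalyModel) -> float:
    """Stage two: penalize the target-environment reward in proportion
    to the anomaly score of the recent state sequence."""
    return env_reward - LAMBDA * model.score(window)


# Usage: fit on safe windows gathered offline, then shape rewards online.
rng = np.random.default_rng(0)
safe_windows = rng.normal(size=(100, 8, 4))   # 100 windows, len 8, dim 4
model = SequenceAnomalyModel().fit(safe_windows)
print(shaped_reward(1.0, rng.normal(size=(8, 4)), model))
```

Any sequence anomaly detector trained only on safe trajectories (for example, an autoencoder's reconstruction error or a sequence model's likelihood) could replace the toy distance score in this sketch.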
Pages: 157140-157148
Page count: 9