Skill Reward for Safe Deep Reinforcement Learning

Times Cited: 0
Authors
Cheng, Jiangchang [1 ]
Yu, Fumin [1 ]
Zhang, Hongliang [1 ]
Dai, Yinglong [2 ,3 ]
Affiliations
[1] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China
[2] Natl Univ Def Technol, Coll Liberal Arts & Sci, Changsha 410073, Peoples R China
[3] Hunan Prov Key Lab Intelligent Comp & Language In, Changsha 410081, Peoples R China
Source
UBIQUITOUS SECURITY, 2022, Vol. 1557
Keywords
Reinforcement learning; Deep reinforcement learning; Reward shaping; Skill reward; Safe agent; LEVEL;
DOI
10.1007/978-981-19-0468-4_15
Chinese Library Classification
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Reinforcement learning enables an agent to interact with an environment and learn from experience to maximize the cumulative reward of a specific task, yielding a powerful agent for decision-optimization problems. This process closely resembles human learning, that is, learning from interaction with the environment. However, the behavior of an agent trained with deep reinforcement learning is often unpredictable, and the agent sometimes produces strange and unsafe behaviors. To make the agent's behavior and decision process explainable and controllable, this paper proposes a skill reward method by which the agent can be constrained to learn controllable and safe behaviors. When the agent completes specific skills while interacting with the environment, we can use prior knowledge to design the rewards it receives during exploration, so that the learning process converges quickly. The skill reward can be embedded into existing reinforcement learning algorithms. In this work, we embed the skill reward into the asynchronous advantage actor-critic (A3C) algorithm and test the method in an Atari 2600 environment (Breakout-v4). The experiments demonstrate the effectiveness of the skill reward embedding method.
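The abstract describes adding a hand-designed bonus to the environment reward whenever the agent completes a predefined skill. A minimal sketch of this idea, not the authors' implementation: the names `SkillRewardShaper` and `skill_predicates` are hypothetical, and the skill predicate here is a toy check on a 1-D state, not the Breakout skill used in the paper.

```python
class SkillRewardShaper:
    """Adds a prior-knowledge bonus to the environment reward when a
    transition satisfies a predefined skill predicate (illustrative only)."""

    def __init__(self, skill_predicates):
        # skill_predicates: list of (predicate(state, action, next_state), bonus)
        self.skill_predicates = skill_predicates

    def shape(self, state, action, next_state, env_reward):
        # Sum the bonuses of every skill the transition completes.
        bonus = 0.0
        for predicate, skill_bonus in self.skill_predicates:
            if predicate(state, action, next_state):
                bonus += skill_bonus
        return env_reward + bonus


# Toy usage: grant a small bonus when the agent moves toward position 5.
toward_target = (lambda s, a, s2: abs(s2 - 5) < abs(s - 5), 0.1)
shaper = SkillRewardShaper([toward_target])

print(shaper.shape(0, None, 1, 1.0))  # moving closer: 1.0 + 0.1 bonus
print(shaper.shape(1, None, 0, 1.0))  # moving away: env reward only
```

In an actor-critic setting such as A3C, the shaped reward would simply replace the raw environment reward in the return computation; the policy and value updates themselves are unchanged.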
Pages: 203-213
Page count: 11
Related Papers
26 records in total
  • [1] Bottom-up multi-agent reinforcement learning by reward shaping for cooperative-competitive tasks
    Aotani, Takumi
    Kobayashi, Taisuke
    Sugimoto, Kenji
    [J]. APPLIED INTELLIGENCE, 2021, 51 (07) : 4434 - 4452
  • [2] Bacon PL, 2017, AAAI CONF ARTIF INTE, P1726
  • [3] A closed-loop healthcare processing approach based on deep reinforcement learning
    Dai, Yinglong
    Wang, Guojun
    Muhammad, Khan
    Liu, Shuai
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (03) : 3107 - 3129
  • [4] Dayan P., 1992, ADV NEURAL INFORM PR, V5
  • [5] Principled reward shaping for reinforcement learning via lyapunov stability theory
    Dong, Yunlong
    Tang, Xiuchuan
    Yuan, Ye
    [J]. NEUROCOMPUTING, 2020, 393 : 83 - 90
  • [6] Farazi N.P., 2021, TRANSP RES INTERDISC, V11
  • [7] Fujimoto S, 2018, PR MACH LEARN RES, V80
  • [8] Haarnoja T, 2018, PR MACH LEARN RES, V80
  • [9] Harutyunyan A., 2015, ARXIV PREPRINT ARXIV
  • [10] Deep Reinforcement Learning for Intelligent Transportation Systems: A Survey
    Haydari, Ammar
    Yilmaz, Yasin
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (01) : 11 - 32