Safety robustness of reinforcement learning policies: A view from robust control

Times Cited: 13
Authors
Xiong, Hao [1 ,2 ]
Diao, Xiumin [2 ]
Affiliations
[1] Harbin Inst Technol, Sch Mech Engn & Automat, Shenzhen 518055, Guangdong, Peoples R China
[2] Purdue Univ, Sch Engn Technol, W Lafayette, IN 47907 USA
Keywords
Safety robustness; Reinforcement learning; Robust control; Deep deterministic policy gradient; Cable-driven parallel robot; DYNAMICS; SYSTEMS; STATE;
DOI
10.1016/j.neucom.2020.09.055
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
For a reinforcement learning (RL) problem without a specified reward function, one may specify different reward functions to better guide an agent to learn. With different reward functions, the agent can learn different policies that generally have different robustness. Both the achieved reward and the success rate have been commonly used to evaluate the robustness of policies. Safety is a concern when using RL to solve problems in many safety-critical applications (e.g., robotic manipulation). However, evaluating the robustness of policies from the perspective of safety has not been discussed in the literature. The major contributions of this paper are the proposal of a novel concept of safety robustness to evaluate the robustness of policies from the perspective of safety and an algorithm to approximate the safety robustness of policies. To demonstrate how to implement the proposed algorithm, illustrative experiments are conducted and the safety robustness of three policies for controlling the manipulation of a cable-driven parallel robot is analyzed. Experimental results show that the proposed algorithm can approximate the safety robustness of policies using the ratio of the number of safe episodes to the total number of episodes, and can identify the safest policy among multiple policies. (c) 2020 Elsevier B.V. All rights reserved.
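The abstract's central idea, approximating safety robustness as the ratio of safe episodes to total episodes, amounts to a Monte Carlo estimate over policy rollouts. The sketch below is only an illustration of that estimate, assuming a Gymnasium-style environment and a hypothetical safety predicate state_is_safe; it is not the authors' implementation from the paper.

```python
# Minimal sketch (assumed names, not the paper's code): estimate safety robustness
# of a policy as the fraction of rollouts in which every visited state is safe.

def estimate_safety_robustness(env, policy, state_is_safe,
                               n_episodes=1000, max_steps=200):
    """Monte Carlo estimate: (# safe episodes) / (# total episodes)."""
    safe_episodes = 0
    for _ in range(n_episodes):
        obs, _ = env.reset()                 # Gymnasium-style reset -> (obs, info)
        episode_safe = True
        for _ in range(max_steps):
            action = policy(obs)             # e.g., a trained DDPG actor
            obs, reward, terminated, truncated, _ = env.step(action)
            if not state_is_safe(obs):       # application-specific safety constraint
                episode_safe = False
                break
            if terminated or truncated:
                break
        safe_episodes += int(episode_safe)
    return safe_episodes / n_episodes        # approximated safety robustness
```

Comparing this estimate across policies trained with different reward functions is the use case the abstract describes: the policy with the highest ratio is taken as the safest.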
Pages: 12-21
Number of Pages: 10