Robust reinforcement learning with UUB guarantee for safe motion control of autonomous robots

Cited by: 0
Authors
ZHANG RuiXian [1]
HAN YiNing [2]
SU Man [3]
LIN ZeFeng [1]
LI HaoWei [1]
ZHANG LiXian [1]
Affiliations
[1] School of Astronautics, Harbin Institute of Technology
[2] School of Management, Harbin Institute of Technology
[3] Beijing Institute of Tracking and Telecommunication Technology
Keywords
DOI
Not available
Chinese Library Classification number
TP242 [Robotics];
Discipline classification code
1111;
Abstract
This paper addresses the issue of safety in reinforcement learning (RL) with disturbances and its application to the safety-constrained motion control of autonomous robots. To tackle this problem, a robust Lyapunov value function (rLVF) is proposed. The rLVF is obtained by introducing a data-based LVF under the worst-case disturbance of the observed state. Using the rLVF, a uniformly ultimate boundedness (UUB) criterion is established. This criterion is intended to ensure that the cost function, which serves as a safety criterion, ultimately converges to a bounded range under the policy to be designed. Moreover, to mitigate the drastic variation of the rLVF caused by differences in states, a smoothing regularization of the rLVF is introduced. To train policies with safety guarantees under the worst-case disturbances of the observed states, an off-policy robust RL algorithm is proposed. The proposed algorithm is applied to motion control tasks of an autonomous vehicle and a cartpole, which involve external disturbances and variations of the model parameters, respectively. The experimental results demonstrate the effectiveness of the theoretical findings and the advantages of the proposed algorithm in terms of robustness and safety.
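The abstract does not give the paper's formulas, so the following is only an illustrative sketch of two ideas it names: evaluating a Lyapunov value function under the worst-case disturbance of the observed state, and penalizing sharp variation of that function between nearby states. Everything here is an assumption made for illustration and not the authors' method: the quadratic `lvf` stands in for a learned, data-based LVF, the disturbance set is taken to be an l-infinity ball of radius `eps`, and the worst case is approximated by random sampling rather than by any optimization the paper may use.

```python
import random


def lvf(state):
    """Hypothetical stand-in for a learned Lyapunov value function (a quadratic)."""
    return sum(x * x for x in state)


def sample_perturbed(state, eps, rng):
    """Draw one observation inside the l-infinity ball of radius eps around state."""
    return [x + rng.uniform(-eps, eps) for x in state]


def robust_lvf(value_fn, state, eps, n_samples=256, seed=0):
    """Approximate the largest value of value_fn over perturbed observations.

    Random sampling only lower-bounds the true maximum. The unperturbed
    state is always included, so the result is never below value_fn(state).
    """
    rng = random.Random(seed)
    candidates = [list(state)]
    candidates += [sample_perturbed(state, eps, rng) for _ in range(n_samples)]
    return max(value_fn(s) for s in candidates)


def smoothness_penalty(value_fn, state, eps, n_samples=256, seed=0):
    """Mean squared change of value_fn between the state and nearby samples."""
    rng = random.Random(seed)
    base = value_fn(state)
    diffs = [
        (value_fn(sample_perturbed(state, eps, rng)) - base) ** 2
        for _ in range(n_samples)
    ]
    return sum(diffs) / n_samples


state = [1.0, -0.5]
nominal = lvf(state)                               # value at the observed state
worst = robust_lvf(lvf, state, eps=0.1)            # sampled worst-case value
penalty = smoothness_penalty(lvf, state, eps=0.1)  # regularization term
print(nominal, worst, penalty)
```

In a training loop, a term like `penalty` would typically be added to the critic loss with a small weight, while a quantity like `worst` would replace the nominal value when checking a stability condition; how the paper actually combines them is not stated in the abstract.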
Pages: 172-182
Number of pages: 11