Adaptive Safe Reinforcement Learning With Full-State Constraints and Constrained Adaptation for Autonomous Vehicles

Cited by: 30
Authors
Zhang, Yuxiang [1 ,2 ]
Liang, Xiaoling [3 ]
Li, Dongyu [4 ]
Ge, Shuzhi Sam [1 ,2 ]
Gao, Bingzhao [5 ]
Chen, Hong [6 ]
Lee, Tong Heng [3 ]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore
[2] Natl Univ Singapore, Inst Funct Intelligent Mat, Singapore 117583, Singapore
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117576, Singapore
[4] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[5] Tongji Univ, Clean Energy Automot Engn Ctr, Shanghai 201804, Peoples R China
[6] Tongji Univ, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China
Keywords
Adaptive dynamic programming (ADP); autonomous vehicles; barrier Lyapunov function (BLF); safe reinforcement learning (RL); BARRIER LYAPUNOV FUNCTIONS; NONLINEAR-SYSTEMS; TRACKING CONTROL;
DOI
10.1109/TCYB.2023.3283771
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
High-performance learning-based control for safety-critical autonomous vehicles requires that the full-state variables remain within the safety region even during the learning process. To solve this critical and challenging problem, this work proposes an adaptive safe reinforcement learning (RL) algorithm that combines safety-related RL methods with an adaptation scheme that constrains the full-state variables within the safety region. The approach assures the specified requirements on the full-state variables in two notable respects. First, an appropriately optimized backstepping technique and the asymmetric barrier Lyapunov function (BLF) methodology establish a safe learning framework that enforces the system's full-state constraints. More specifically, each subsystem's control law and the partial derivative of the value function are decomposed into asymmetric BLF-related terms and an independent learning part; the learning part is then updated through an adaptive implementation that solves the Hamilton-Jacobi-Bellman equation to attain the desired control performance. Second, further Lyapunov-based analysis demonstrates that safety is doubly assured by a constrained adaptation algorithm applied during optimization, which incorporates a projection operator and resolves the conflict between safety and optimization. The algorithm therefore optimizes system control while keeping the full set of state variables within the safety region throughout the learning process. Comparison simulations and ablation studies on motion-control problems for autonomous vehicles verify superior performance, with smaller variance and better convergence under uncertain conditions, confirming the safe performance of the overall system control under the proposed method.
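To make the two abstract ideas concrete, the sketch below illustrates (i) an asymmetric barrier Lyapunov function, which grows without bound as a tracking error approaches its asymmetric constraint boundaries, and (ii) a projection-based parameter update that keeps adapted weights inside a bounded set during learning. This is a minimal illustrative sketch, not the paper's exact formulation: the function names, the log-barrier form, and the norm-ball projection set are assumptions chosen for clarity.

```python
import numpy as np

def asymmetric_blf(e, k_a, k_b):
    """Asymmetric barrier Lyapunov function for a tracking error e
    constrained to the open interval (-k_a, k_b), with k_a, k_b > 0.
    V(e) -> infinity as e approaches either boundary, so keeping V
    bounded along trajectories keeps the state constraint satisfied."""
    assert -k_a < e < k_b, "error must lie inside the safe region"
    if e >= 0:
        return 0.5 * np.log(k_b**2 / (k_b**2 - e**2))
    return 0.5 * np.log(k_a**2 / (k_a**2 - e**2))

def projected_update(theta, grad, lr, radius):
    """Gradient step followed by projection onto the ball
    ||theta|| <= radius: a simple constrained-adaptation step that
    keeps the learned weights bounded regardless of the gradient."""
    theta_new = theta - lr * grad
    norm = np.linalg.norm(theta_new)
    if norm > radius:
        theta_new *= radius / norm  # project back onto the ball
    return theta_new
```

The projection is what "doubly assures" boundedness in spirit: even if the optimization step alone would push the weights outside the admissible set, the projected update cannot leave it.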
Pages: 1907-1920 (14 pages)