Safe Reinforcement Learning for Model-Reference Trajectory Tracking of Uncertain Autonomous Vehicles With Model-Based Acceleration

Cited by: 23

Authors:

Hu, Yifan [1]

Fu, Junjie [1,2]

Wen, Guanghui [1]

Affiliations:

[1] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China

[2] Purple Mt Labs, Nanjing 211111, Peoples R China

Source:

IEEE TRANSACTIONS ON INTELLIGENT VEHICLES | 2023, Vol. 8, No. 3

Funding:

National Natural Science Foundation of China;

Keywords:

Safety; Predictive models; Trajectory tracking; Training; Reinforcement learning; Heuristic algorithms; Uncertainty; Model-reference control; autonomous vehicle; safe reinforcement learning; model-based reinforcement learning; Gaussian process; control barrier function;

DOI:

10.1109/TIV.2022.3233592

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Applying reinforcement learning (RL) algorithms to control system design remains challenging because exploration can be unsafe and sample efficiency is low. In this paper, we propose a novel safe model-based RL algorithm to solve the collision-free model-reference trajectory tracking problem of uncertain autonomous vehicles (AVs). First, a new type of robust control barrier function (CBF) condition for collision avoidance is derived for the uncertain AVs by incorporating an estimate of the system uncertainty obtained with Gaussian process (GP) regression. Then, a robust CBF-based RL control structure is proposed, in which the nominal control input is composed of the RL policy and a model-based reference control policy. The actual control input, obtained by solving a quadratic programming problem, satisfies the constraints of collision avoidance, input saturation, and velocity boundedness simultaneously with relatively high probability. Finally, within this control structure, a Dyna-style safe model-based RL algorithm is proposed, in which safe exploration is achieved by executing the robust CBF-based actions and sample efficiency is improved by leveraging the GP models. The superior learning performance of the proposed RL control structure is demonstrated through simulation experiments.
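
The abstract describes a safety filter in which a nominal input is corrected by a quadratic program subject to a robust CBF condition built from a GP uncertainty estimate. The sketch below illustrates that general idea only and is not the paper's formulation. It assumes a hypothetical single-integrator model x_dot = u + d(x), one circular obstacle, a barrier h(x) = ||x - x_obs||^2 - r^2, and an uncertainty d whose components lie within k standard deviations of a supplied mean (standing in for a GP posterior). With a single affine constraint the QP has a closed-form solution, so no solver is needed. The function name, parameters, and the omission of input saturation and velocity bounds are all assumptions made for illustration.

```python
import numpy as np

def robust_cbf_filter(x, u_nom, x_obs, radius, d_mean, d_std, alpha=1.0, k=2.0):
    """Minimally modify u_nom so that a robust CBF condition holds.

    Hypothetical setup (not from the paper): x_dot = u + d, with d assumed
    to lie componentwise in d_mean +/- k * d_std.
    Solves  min ||u - u_nom||^2  s.t.  grad_h . u >= b  in closed form.
    Assumes x != x_obs so that grad_h is nonzero.
    """
    diff = x - x_obs
    h = diff @ diff - radius**2           # barrier value, >= 0 means safe
    grad = 2.0 * diff                     # gradient of h with respect to x
    # Worst-case effect of the uncertainty on h_dot under the assumed bound.
    margin = k * (np.abs(grad) @ d_std)
    # Robust condition: grad.(u + d_mean) - margin >= -alpha * h
    b = -alpha * h - grad @ d_mean + margin
    slack = grad @ u_nom - b
    if slack >= 0.0:
        return u_nom.copy()               # nominal input already satisfies it
    # Project u_nom onto the half-space {u : grad . u >= b}.
    return u_nom - slack / (grad @ grad) * grad

x = np.array([2.0, 0.0])
x_obs = np.array([0.0, 0.0])
d_mean = np.zeros(2)
d_std = np.array([0.1, 0.1])

u_unsafe = np.array([-5.0, 0.0])          # drives straight at the obstacle
u_safe = robust_cbf_filter(x, u_unsafe, x_obs, 1.0, d_mean, d_std)
u_free = robust_cbf_filter(x, np.array([1.0, 0.0]), x_obs, 1.0, d_mean, d_std)
```

In this toy case the filter leaves an input that moves away from the obstacle untouched and scales back the one that approaches it until the constraint is exactly active. A larger `k` or `d_std` tightens the constraint, which mirrors the trade-off between conservatism and the probability of constraint satisfaction mentioned in the abstract.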


Pages: 2332-2344

Page count: 13
