Autonomous Platoon Control With Integrated Deep Reinforcement Learning and Dynamic Programming

Cited by: 22
Authors
Liu, Tong [1]
Lei, Lei [2]
Zheng, Kan [3]
Zhang, Kuan [4]
Affiliations
[1] Beijing Univ Posts & Telecommun, Key Lab Universal Wireless Commun, Intelligent Comp & Commun Lab, Minist Educ, Beijing 100876, Peoples R China
[2] Univ Guelph, Sch Engn, Guelph, ON N1G 2W1, Canada
[3] Ningbo Univ, Coll Elect Engn & Comp Sci, Ningbo 315211, Peoples R China
[4] Univ Nebraska, Dept Elect & Comp Engn, Omaha, NE 68182 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Training; Stability analysis; Aerospace electronics; Internet of Things; Vehicle dynamics; Safety; Reinforcement learning; Deep reinforcement learning (DRL); dynamic programming (DP); platoon control; ADAPTIVE CRUISE CONTROL; CACC; VEHICLES;
DOI
10.1109/JIOT.2022.3222128
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Autonomous vehicles in a platoon determine their control inputs based on system state information collected and shared by Internet of Things (IoT) devices. Deep reinforcement learning (DRL) is regarded as a promising method for car-following control, but it has mostly been studied for a single following vehicle. Learning an efficient car-following policy that converges stably is considerably more challenging when a platoon contains multiple following vehicles, especially under unpredictable leading-vehicle behavior. In this context, we adopt an integrated DRL and dynamic programming (DP) approach to learn autonomous platoon control policies, embedding the deep deterministic policy gradient (DDPG) algorithm into a finite-horizon value iteration framework. Although the DP framework improves the stability and performance of DDPG, it suffers from low sampling and training efficiency. In this article, we propose finite-horizon DDPG with sweeping through reduced state space using stationary approximation (FH-DDPG-SS), which overcomes these limitations with three key ideas: transferring network weights backward in time, approximating earlier time steps with a stationary policy, and sweeping through a reduced state space. To verify the effectiveness of FH-DDPG-SS, we perform simulations using real driving data and compare its performance with that of benchmark algorithms. Finally, platoon safety and string stability under FH-DDPG-SS are demonstrated.
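The abstract's three key ideas can be illustrated with a minimal, hypothetical Python sketch of the backward training sweep, assuming one DDPG actor per time step of the finite horizon. All identifiers below (Actor, train_one_step_agent, HORIZON, K_STATIONARY, the dimensions) are illustrative assumptions, not the authors' implementation, and the actor-critic update itself is stubbed out; only the control flow of the backward sweep, the weight transfer, and the stationary approximation is shown.

    import copy
    import numpy as np

    HORIZON = 10        # finite horizon T (illustrative value, not from the paper)
    K_STATIONARY = 6    # steps 0..K-1 share one stationary policy (assumed cutoff)
    STATE_DIM, ACTION_DIM = 4, 1

    class Actor:
        # Stand-in for a per-time-step DDPG actor network: a linear policy.
        def __init__(self):
            self.w = np.zeros((ACTION_DIM, STATE_DIM))

        def act(self, state):
            return self.w @ state

    def train_one_step_agent(actor, t, reduced_states):
        # Placeholder for one round of DDPG actor-critic training restricted
        # to the reduced state set reachable at time step t; the real update
        # is omitted here.
        return actor

    actors = [Actor() for _ in range(HORIZON)]
    # Backward value-iteration sweep: train the final-step agent first, then
    # initialize each earlier agent from its successor's weights.
    for t in reversed(range(K_STATIONARY, HORIZON)):
        if t < HORIZON - 1:
            actors[t].w = copy.deepcopy(actors[t + 1].w)  # weight transfer backward in time
        reduced_states = np.random.randn(32, STATE_DIM)   # sweep a reduced state set
        actors[t] = train_one_step_agent(actors[t], t, reduced_states)
    # Stationary approximation: reuse the earliest trained policy for steps 0..K-1.
    for t in range(K_STATIONARY):
        actors[t] = actors[K_STATIONARY]

Training the last time step first and propagating its weights backward gives each earlier agent a warm start, which is plausibly where the stability and efficiency gains claimed in the abstract come from.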
Pages: 5476-5489
Number of pages: 14