Multitimescale Control and Communications With Deep Reinforcement Learning - Part I: Communication-Aware Vehicle Control

Cited by: 3
Authors
Liu, Tong [1 ]
Lei, Lei [2 ]
Zheng, Kan [3 ]
Shen, Xuemin [4 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Intelligent Comp & Commun Lab, Key Lab Universal Wireless Commun, Minist Educ, Beijing 100876, Peoples R China
[2] Univ Guelph, Sch Engn, Guelph, ON N1G 2W1, Canada
[3] Ningbo Univ, Coll Elect Engn & Comp Sci, Ningbo 315211, Peoples R China
[4] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Deep reinforcement learning (DRL); multitimescale decision making; platoon control (PC); adaptive cruise control; resource allocation; vehicular platoon; distributed control; networks; design; challenges; access
DOI
10.1109/JIOT.2023.3348590
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
An intelligent decision-making system enabled by vehicle-to-everything (V2X) communications is essential for safe and efficient autonomous driving (AD), where two types of decisions must be made at different timescales: vehicle control decisions and radio resource allocation (RRA) decisions. The interplay between RRA and vehicle control necessitates their collaborative design. In this two-part paper (Part I and Part II), taking platoon control (PC) as an example use case, we propose a joint optimization framework for multitimescale control and communications (MTCC) based on deep reinforcement learning (DRL). In this article (Part I), we first decompose the problem into a communication-aware DRL-based PC subproblem and a control-aware DRL-based RRA subproblem. We then focus on the PC subproblem, assuming an RRA policy is given, and propose the MTCC-PC algorithm to learn an efficient PC policy. To improve PC performance under random observation delay, the PC state space is augmented with the observation delay and the PC action history, and the reward function is defined with respect to the augmented state to construct an augmented-state Markov decision process (MDP). It is proved that the optimal policy for the augmented-state MDP is also optimal for the original PC problem with observation delay. Unlike most existing work on communication-aware control, the MTCC-PC algorithm is trained in a delayed environment generated by a fine-grained embedded simulation of cellular V2X communications rather than by a simple stochastic delay model. Finally, experiments are performed to compare the performance of MTCC-PC with that of baseline DRL algorithms.
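To make the state-augmentation idea in the abstract concrete, below is a minimal Python sketch of how a delayed observation, the delay itself, and the recent PC action history might be packed into one augmented state for a DRL agent. All names, dimensions, and the delay normalization are illustrative assumptions, not the paper's actual notation or implementation.

```python
import numpy as np
from collections import deque

# Illustrative sketch of the augmented-state MDP construction described in
# the abstract: the (possibly stale) PC observation is concatenated with
# the observation delay and the actions applied since that observation was
# generated. MAX_DELAY, ACTION_DIM, and obs_dim are assumed values.

MAX_DELAY = 4    # assumed upper bound on observation delay, in control steps
ACTION_DIM = 1   # e.g., the follower vehicle's acceleration command

class AugmentedStateBuilder:
    """Maintains an action history and builds augmented PC states."""

    def __init__(self, obs_dim: int):
        self.obs_dim = obs_dim
        # Fixed-length buffer of the last MAX_DELAY actions, newest last.
        self.action_history = deque(
            [np.zeros(ACTION_DIM) for _ in range(MAX_DELAY)],
            maxlen=MAX_DELAY,
        )

    def record_action(self, action: np.ndarray) -> None:
        # Called once per control step, after the PC action is applied.
        self.action_history.append(np.asarray(action, dtype=float))

    def build(self, delayed_obs: np.ndarray, delay_steps: int) -> np.ndarray:
        # Augmented state = [delayed obs | normalized delay | action history].
        delay_feature = np.array([delay_steps / MAX_DELAY])
        history = np.concatenate(list(self.action_history))
        return np.concatenate([delayed_obs, delay_feature, history])
```

A DRL agent (e.g., an actor-critic method) would treat the output of `build(...)` as its state; per the abstract, the reward is likewise redefined over this augmented state so that the optimal policy of the augmented-state MDP remains optimal for the original delayed PC problem.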
Pages: 15386-15401
Number of pages: 16