Centralized Cooperation for Connected and Automated Vehicles at Intersections by Proximal Policy Optimization

Cited by: 120

Authors:

Guan, Yang [1,2]

Ren, Yangang [1,2]

Li, Shengbo Eben [1,2]

Sun, Qi [1,2]

Luo, Laiquan [1,2]

Li, Keqiang [1,2]

Affiliations:

[1] Tsinghua Univ, State Key Lab Automot Safety & Energy, Sch Vehicle & Mobil, Beijing 100084, Peoples R China

[2] Tsinghua Univ, Ctr Intelligent Connected Vehicles & Transportat, Beijing 100084, Peoples R China

Source:

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY | 2020 / Vol. 69 / No. 11

Keywords:

Optimization; Computational modeling; Acceleration; Linear programming; Real-time systems; Safety; Trajectory; Centralized coordination method; connected and automated vehicle; reinforcement learning; traffic intersection;

DOI:

10.1109/TVT.2020.3026111

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Subject classification codes:

0808; 0809

Abstract:

Connected vehicles will change the modes of future transportation management and organization, especially at intersections without traffic lights. Centralized coordination methods globally coordinate vehicles approaching the intersection from all sections by considering their states together. However, they need substantial computational resources, since they rely on a centralized controller to optimize the trajectories of all approaching vehicles in real time. In this paper, we propose a centralized coordination scheme for automated vehicles at an intersection without traffic signals, using reinforcement learning (RL) to address the low computational efficiency of current centralized coordination methods. We first propose an RL training algorithm, model accelerated proximal policy optimization (MA-PPO), which incorporates a prior model into the proximal policy optimization (PPO) algorithm to accelerate the learning process in terms of sample efficiency. Then we present the design of the state, action, and reward to formulate centralized coordination as an RL problem. Finally, we train a coordination policy in a simulation setting and compare its computing time and traffic efficiency with those of a coordination scheme based on model predictive control (MPC). Results show that our method spends only 1/400 of the computing time of MPC and increases the efficiency of the intersection by 4.5 times.

Pages: 12597-12608

Number of pages: 12

References (29 in total):

[1] Quality-of-Experience-Oriented Autonomous Intersection Control in Vehicular Networks [J].