Cooperative Control for Multi-Intersection Traffic Signal Based on Deep Reinforcement Learning and Imitation Learning

Cited by: 18

Authors:

Huo, Yusen [1]

Tao, Qinghua [1]

Hu, Jianming [1]

Affiliations:

[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China

Source:

IEEE ACCESS | 2020 / Vol. 8

Funding:

National Natural Science Foundation of China;

Keywords:

Training; Numerical models; Reinforcement learning; Feature extraction; Convergence; Optimization; Indexes; Deep reinforcement learning; imitation learning; multi-intersection; proximal policy optimization; tensor; TRANSPORT;

DOI:

10.1109/ACCESS.2020.3034419

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Traffic signal control has long been considered a critical topic in intelligent transportation systems. Most existing methods either suffer from inefficient training or focus mainly on isolated intersections. This article addresses cooperative control of multi-intersection traffic signals, in which a novel end-to-end learning model is established and a correspondingly efficient training method is proposed, so that the approach can adapt to large-scale scenarios. In the proposed method, the input traffic status across multiple intersections is expressed as a tensor without information loss, which significantly reduces model complexity compared with using a huge matrix, since a huge matrix can require additional convolutional layers to extract features. For the output, a multidimensional Boolean vector is employed to simplify the control policy while abiding by practical phase-changing rules, and a multi-task learning structure is then used to obtain the cooperative policy. Instead of training the model with reinforcement learning alone, we first employ imitation learning to pre-train it against a rule-based model, which greatly accelerates convergence. Reinforcement learning is then adopted for fine-tuning, where the proximal policy optimization algorithm is incorporated to solve the policy collapse problem in the multidimensional-output setting. Numerical experiments demonstrate the distinctive advantages of the proposed method in efficiency and accuracy compared with related state-of-the-art methods.
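
As a rough illustration of the fine-tuning step mentioned in the abstract, the following sketch shows a generic proximal policy optimization (PPO) clipped surrogate loss applied to a Boolean action vector with one entry per intersection. It is not taken from the paper: the function names, the interpretation of each Boolean entry (for example, "switch phase" versus "keep phase"), the treatment of the entries as independent Bernoulli outputs, and the `clip_eps` value are all assumptions made for illustration.

```python
import numpy as np


def bernoulli_log_prob(probs, actions):
    """Log-probability of Boolean action vectors.

    Assumption: each intersection's decision is an independent Bernoulli
    output, so the joint log-probability is the sum over intersections.
    probs, actions: arrays of shape (batch, n_intersections).
    """
    probs = np.clip(probs, 1e-8, 1.0 - 1e-8)  # avoid log(0)
    per_dim = actions * np.log(probs) + (1 - actions) * np.log(1 - probs)
    return per_dim.sum(axis=-1)


def ppo_clip_loss(new_probs, old_probs, actions, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    The probability ratio between the new and old policy is clipped to
    [1 - clip_eps, 1 + clip_eps], which limits how far one update can
    move the policy away from the one that collected the data.
    """
    log_ratio = (bernoulli_log_prob(new_probs, actions)
                 - bernoulli_log_prob(old_probs, actions))
    ratio = np.exp(log_ratio)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return -surrogate.mean()


if __name__ == "__main__":
    # Hypothetical batch: 2 samples, 3 intersections.
    actions = np.array([[1, 0, 1], [0, 0, 1]])
    old_probs = np.full((2, 3), 0.5)
    new_probs = np.array([[0.6, 0.4, 0.6], [0.5, 0.5, 0.5]])
    advantages = np.array([1.0, -0.5])
    print(ppo_clip_loss(new_probs, old_probs, actions, advantages))
```

The clipping is what keeps a single large update from collapsing the policy; whether the paper clips a joint ratio as above or a per-intersection ratio is not stated in the abstract.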

Citation

Pages: 199573-199585

Page count: 13

