Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control

Cited by: 656

Authors:

Chu, Tianshu [1]

Wang, Jie [1]

Codeca, Lara [2]

Li, Zhaojian [3]

Affiliations:

[1] Stanford Univ, Dept Civil & Environm Engn, Stanford, CA 94305 USA

[2] EURECOM, Commun Syst Dept, F-06904 Sophia Antipolis, France

[3] Michigan State Univ, Dept Mech Engn, E Lansing, MI 48824 USA

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2020 / Vol. 21 / No. 03

Keywords:

Reinforcement learning; Scalability; Heuristic algorithms; Mathematical model; Codecs; Neural networks; Convergence; Adaptive traffic signal control; reinforcement learning; multi-agent reinforcement learning; deep reinforcement learning; actor-critic; ALGORITHMS; NETWORK;

DOI:

10.1109/TITS.2019.2901791

Chinese Library Classification number:

TU [Building Science]

Discipline classification code:

0813

Abstract:

Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, the centralized RL is infeasible for large-scale ATSC due to the extremely high dimension of the joint action space. The multi-agent RL (MARL) overcomes the scalability issue by distributing the global control to each local RL agent, but it introduces new challenges: now, the environment becomes partially observable from the viewpoint of each local agent due to limited communication among agents. Most existing studies in MARL focus on designing efficient communication and coordination among traditional Q-learning agents. This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent, advantage actor critic (A2C), within the context of ATSC. In particular, two methods are proposed to stabilize the learning procedure, by improving the observability and reducing the learning difficulty of each local agent. The proposed multi-agent A2C is compared against independent A2C and independent Q-learning algorithms, in both a large synthetic traffic grid and a large real-world traffic network of Monaco city, under simulated peak-hour traffic dynamics. The results demonstrate its optimality, robustness, and sample efficiency over the other state-of-the-art decentralized MARL algorithms.


Pages: 1086-1095

Page count: 10
