Multiple intersections traffic signal control based on cooperative multi-agent reinforcement learning

Cited by: 14
Authors
Liu, Junxiu [1 ]
Qin, Sheng [1 ]
Su, Min [1 ]
Luo, Yuling [1 ]
Wang, Yanhu [1 ]
Yang, Su [2 ]
Affiliations
[1] Guangxi Normal Univ, Sch Elect & Informat Engn, Guangxi Key Lab Brain Inspired Comp & Intelligent, Guilin, Peoples R China
[2] Swansea Univ, Dept Comp Sci, Swansea, Wales
Funding
National Natural Science Foundation of China;
Keywords
Traffic signal control; Reinforcement learning; Multi-agent system; ALGORITHM; LIGHTS;
DOI
10.1016/j.ins.2023.119484
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In multi-agent traffic signal control, the traffic signal at each intersection is controlled by an independent agent. Because each agent's control policy is dynamic, at large traffic scales the adjustment of one agent's policy produces non-stationary effects on surrounding intersections, destabilizing the overall system. It is therefore necessary to eliminate this non-stationarity to stabilize the multi-agent system. A cooperative multi-agent reinforcement learning method is proposed in this work, which enables the system to overcome the instability problem through a collaborative mechanism. Decentralized learning with limited communication is used to reduce the communication latency between agents. A Shapley value reward function comprehensively computes the contribution of each agent, avoiding the influence of reward-function coefficient variation and thereby reducing unstable factors. The Kullback-Leibler divergence is then used to distinguish the current policy from historical policies, and the loss function is optimized to eliminate the environmental non-stationarity. Experimental results demonstrate that the average travel time and its standard deviation are reduced by the Shapley value reward function and the optimized loss function, respectively; this work thus provides an alternative approach to traffic signal control over multiple intersections.
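The abstract describes its two stabilizing mechanisms only at a high level, so the following Python sketch illustrates one common way to realize them; it is a minimal illustration under stated assumptions, not the authors' implementation. The coalition value function v, the sampling budget n_samples, the coefficient beta, and the names shapley_reward, kl_divergence, and regularized_loss are all hypothetical.

import math
import random

def shapley_reward(agents, v, n_samples=200):
    # Approximate each agent's Shapley value by sampling random agent
    # orderings and averaging the marginal contributions v(S + {i}) - v(S).
    phi = {i: 0.0 for i in agents}
    for _ in range(n_samples):
        order = random.sample(list(agents), len(agents))
        coalition, prev = set(), v(frozenset())
        for i in order:
            coalition.add(i)
            cur = v(frozenset(coalition))
            phi[i] += cur - prev
            prev = cur
    return {i: total / n_samples for i, total in phi.items()}

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete action distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def regularized_loss(td_loss, policy_now, policy_hist, beta=0.1):
    # Loss = TD loss + beta * KL(current policy || historical policy); the KL
    # term penalizes abrupt policy shifts, which are what make the environment
    # non-stationary from the viewpoint of neighbouring intersections.
    return td_loss + beta * kl_divergence(policy_now, policy_hist)

# Toy usage: three intersections, coalition value = negative total queue length.
queues = {"A": 4.0, "B": 7.0, "C": 2.0}
v = lambda S: -sum(queues[i] for i in S)
print(shapley_reward(["A", "B", "C"], v))   # ~ {'A': -4.0, 'B': -7.0, 'C': -2.0}

Because the toy value function is additive, the sampled Shapley values equal each intersection's own (negative) queue contribution exactly; with an interacting value function they instead measure each agent's average marginal effect on network-wide congestion, which is the credit-assignment property the abstract appeals to.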
Pages: 12
References
46 in total
[11] Li L., Lv Y., Wang F.-Y. Traffic signal timing via deep reinforcement learning. IEEE/CAA Journal of Automatica Sinica, 2016, 3(3): 247-254.
[12] Lillicrap T.P., 2015, arXiv preprint.
[13] Liu J., Wu G., Luo Y., Qiu S., Yang S., Li W., Bi Y. EEG-based emotion classification using a deep neural network and sparse autoencoder. Frontiers in Systems Neuroscience, 2020, 14.
[14] Liu J., Sun T., Luo Y., Yang S., Cao Y., Zhai J. Echo state network optimization using binary grey wolf algorithm. Neurocomputing, 2020, 385: 310-318.
[15] Liu J., Sun T., Luo Y., Yang S., Cao Y., Zhai J. An echo state network architecture based on quantum logic gate and its optimization. Neurocomputing, 2020, 371: 100-107.
[16] Liu J., Li M., Luo Y., Yang S., Qiu S. Human body posture recognition using wearable devices. Artificial Neural Networks and Machine Learning - ICANN 2019: Workshop and Special Sessions, 2019, 11731: 326-337.
[17] Liu J., Zhang J., Luo Y., Yang S., Wang J., Fu Q. Mass spectral substance detections using long short-term memory networks. IEEE Access, 2019, 7: 10734-10744.
[18] Liu R.S., 2018, Annual Allerton Conference on Communication, Control, and Computing, p. 478. DOI: 10.1109/ALLERTON.2018.8636075.
[19] Liu X.Y., 2018, arXiv: 1812.00979.
[20] Miller A.J., 1963, Operational Research Quarterly, 14: 373. DOI: 10.2307/3006800.