Traffic signal control using a cooperative EWMA-based multi-agent reinforcement learning

Cited by: 8
Authors
Qiao, Zhimin [1]
Ke, Liangjun [1]
Wang, Xiaoqiang [1]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Automat Sci & Engn, Xian 710049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Mean-field; Traffic signal control; TD3; Multi-agent reinforcement learning; NETWORK; ALGORITHM; COORDINATION;
DOI
10.1007/s10489-022-03643-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traffic signal control in contemporary cities remains enormously difficult. Multi-agent reinforcement learning (MARL) is a promising way to solve this problem. However, most MARL algorithms cannot effectively transfer learned strategies when the number of agents increases or decreases. This paper proposes a new MARL algorithm, cooperative dynamic delay updating twin delayed deep deterministic policy gradient based on the exponentially weighted moving average (CoTD3-EWMA), to address this limitation. By introducing mean-field theory, the algorithm implicitly models the interaction between the agents and the environment, which reduces the dimension of the action space and improves the scalability of the algorithm. In addition, we propose a dynamic delay updating method based on the exponentially weighted moving average (EWMA), which mitigates the Q-value overestimation problem of the traditional TD3 algorithm. Moreover, a joint reward allocation mechanism and a state sharing mechanism are proposed to improve the agents' global strategy learning ability and robustness. Simulation results show that the new algorithm outperforms current state-of-the-art algorithms, effectively reducing vehicle delay time and improving the efficiency of the traffic network.
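The abstract names two generic building blocks, an exponentially weighted moving average and a mean-field summary of other agents' actions. The sketch below shows only the textbook form of each, not the paper's actual update rule: the function names `ewma` and `mean_action`, the smoothing factor `beta`, and the use of a plain arithmetic mean over neighbor actions are all assumptions made for illustration.

```python
def ewma(values, beta=0.9):
    """Running exponentially weighted moving average of a sequence.

    beta is a hypothetical smoothing factor in [0, 1); a larger beta
    gives more weight to the history and less to the newest value.
    """
    avg = None
    smoothed = []
    for v in values:
        # The first value initialises the average; later values are blended in.
        avg = v if avg is None else beta * avg + (1.0 - beta) * v
        smoothed.append(avg)
    return smoothed


def mean_action(neighbor_actions):
    """Mean-field style summary: replace the joint action of all
    neighbors with their arithmetic mean (illustrative assumption)."""
    return sum(neighbor_actions) / len(neighbor_actions)
```

With a summary like `mean_action`, the input an agent sees does not grow with the number of neighbors, which is the kind of dimension reduction the abstract attributes to the mean-field approach.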
Pages: 4483-4498
Page count: 16