MERCA: A Multi-objective Optimization Traffic Light Control Model

Cited by: 0
Authors
Zhu, Yi [1]
Wang, Yongli [1]
Liu, Dongmei [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Peoples R China
Source
2024 5TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER ENGINEERING, ICAICE | 2024
Keywords
Traffic signal control; Reinforcement learning; Multi-objective optimization; Reward shaping
DOI
10.1109/ICAICE63571.2024.10864265
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, deep reinforcement learning (DRL) has demonstrated significant capabilities in adjusting traffic signal timing. However, designing a multi-objective reward structure and addressing reward sparsity remain major challenges for DRL. In this paper, the authors propose a new reinforcement learning model architecture called MERCA (Multiobjective Deep Deterministic Policy Gradient) for multi-objective reward optimization. MERCA constructs a corresponding critic for each optimization objective and updates the critics based on the aggregated evaluation values of all objectives to achieve multi-objective optimization. To address the reward sparsity issue, the paper further investigates reward shaping methods. In addition to the external reward, MERCA learns an intrinsic reward function in a completely self-supervised manner and maximizes the agent's performance through exploration-based rewards. The paper compares MERCA with baseline models using the traffic simulator SUMO. The experimental results demonstrate that MERCA performs better on conventional traffic metrics and that its agent learns faster.
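The per-objective critic idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weighted-sum aggregation, the weights, and the toy critics `queue_critic` and `delay_critic` are all assumptions made for illustration, since the abstract does not state how the evaluation values are aggregated.

```python
from typing import Callable, Sequence

# Hypothetical type: a critic maps (state, action) to a scalar value estimate.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def aggregate_critics(
    critics: Sequence[Critic],
    weights: Sequence[float],
    state: Sequence[float],
    action: Sequence[float],
) -> float:
    """Combine one value estimate per objective into a single scalar.

    Assumption: a weighted sum is used here; the record does not say
    which aggregation MERCA actually applies.
    """
    if len(critics) != len(weights):
        raise ValueError("need exactly one weight per critic")
    return sum(w * c(state, action) for w, c in zip(weights, critics))


# Toy stand-ins for learned critics (illustrative only).
def queue_critic(state: Sequence[float], action: Sequence[float]) -> float:
    return -sum(state)  # penalise total queue length over all lanes


def delay_critic(state: Sequence[float], action: Sequence[float]) -> float:
    return -max(state)  # penalise the longest single queue


state = [3.0, 1.0, 2.0]  # hypothetical per-lane queue lengths
action = [0.5]           # hypothetical signal-timing action
value = aggregate_critics([queue_critic, delay_critic], [0.5, 0.5], state, action)
```

In an actor-critic setup, a scalar of this kind would be the quantity the actor is trained to increase, while each critic keeps its own objective-specific reward signal.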
Pages: 999-1006
Number of pages: 8
Related papers
22 in total
[1]  
Abbeel P., 2004, Proceedings of the Twenty-First International Conference on Machine Learning, P1
[2]   Modeling and Controlling Smart Traffic Light System Using a Rule Based System [J].
Albatish, Islam Mohammad ;
Abu-Naser, Samy S. .
2019 INTERNATIONAL CONFERENCE ON PROMISING ELECTRONIC TECHNOLOGIES (ICPET 2019), 2019, :55-60
[3]  
[Anonymous], 2014, T. Economist
[4]  
Brown Daniel S., 2019, PR MACH LEARN RES, V97
[5]  
Brys T, 2014, IEEE IJCNN, P2315, DOI 10.1109/IJCNN.2014.6889732
[6]  
Cools SB, 2008, ADV INFORM KNOWL PRO, P41, DOI 10.1007/978-1-84628-982-8_3
[7]  
Devidze R, 2022, ADV NEUR IN
[8]  
Genders W, 2019, Arxiv, DOI arXiv:1909.00395
[9]   ON THE OPTIMAL-CONTROL OF 2 QUEUES WITH SERVER SETUP TIMES AND ITS ANALYSIS [J].
HOFRI, M ;
ROSS, KW .
SIAM JOURNAL ON COMPUTING, 1987, 16 (02) :399-420
[10]  
Jiang Yuqian, 2021, AAAI