Urban Traffic Control in Software Defined Internet of Things via a Multi-Agent Deep Reinforcement Learning Approach

Cited by: 159
Authors
Yang, Jiachen [1]
Zhang, Jipeng [1]
Wang, Huihui [2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Jacksonville Univ, Dept Engn, Jacksonville, FL 32211 USA
Funding
National Natural Science Foundation of China;
Keywords
Machine learning; Feature extraction; Switches; Software; Internet of Things; Protocols; Urban traffic control; software defined internet of things; multi-agent deep reinforcement learning; modified proximal policy optimization; SIGNAL CONTROL; SYSTEM; LEVEL;
DOI
10.1109/TITS.2020.3023788
CLC number
TU [Building Science];
Subject classification code
0813;
Abstract
With the growth in the number of vehicles and the acceleration of urbanization, urban traffic congestion has become a pressing issue in our society. Constructing a software defined Internet of Things (SD-IoT) with a proper traffic control scheme is a promising solution to this problem. However, existing traffic control schemes do not take full advantage of recent advances in multi-agent deep reinforcement learning. Furthermore, existing traffic congestion solutions based on deep reinforcement learning (DRL) focus only on controlling traffic light signals, while ignoring the control of vehicles to cooperate with the traffic lights, so the resulting urban traffic control is not comprehensive enough. In this article, we propose the Modified Proximal Policy Optimization (Modified PPO) algorithm, which is ideally suited as the traffic control scheme of SD-IoT. We adaptively adjust the clip hyperparameter to limit the bound of the distance between the next policy and the current policy. Moreover, based on the data collected by the SD-IoT, the proposed algorithm controls traffic lights and vehicles from a global view to improve the performance of urban traffic control. Experimental results under different numbers of vehicles show that the proposed method is more competitive and stable than the original algorithm. Our proposed method improves the performance of SD-IoT to relieve traffic congestion.
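To illustrate the idea of adaptively adjusting the clip hyperparameter, the minimal Python sketch below combines the standard PPO clipped surrogate objective with a hypothetical adaptation rule. The abstract does not specify the paper's exact adjustment scheme, so the function adaptive_clip, the KL-divergence target, and the shrink/grow factors are assumptions introduced only for illustration, not the authors' actual Modified PPO rule.

import torch

def clipped_surrogate_loss(ratio, advantage, clip_eps):
    # Standard PPO clipped surrogate objective (to be maximized):
    # ratio = pi_new(a|s) / pi_old(a|s), advantage = estimated advantage A(s, a).
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return torch.min(unclipped, clipped).mean()

def adaptive_clip(clip_eps, kl, kl_target, shrink=0.9, grow=1.1,
                  eps_min=0.05, eps_max=0.3):
    # Hypothetical adaptation rule: tighten the clip range when the new policy
    # drifts too far from the current policy (measured here by KL divergence),
    # and relax it when updates are overly conservative.
    if kl > kl_target:
        clip_eps *= shrink  # next policy moved too far: tighten the bound
    else:
        clip_eps *= grow    # updates too small: allow larger policy steps
    return float(min(max(clip_eps, eps_min), eps_max))

In a multi-agent setting such as the one described above, each agent (a traffic light or a vehicle controller) would evaluate this loss on its own trajectories while the clip range is updated between training iterations.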
Pages: 3742-3754
Number of pages: 13