Proximal policy optimization learning based control of congested freeway traffic

Cited by: 0
Authors
Mo, Shurong [1]
Wu, Nailong [1, 2, 6]
Qi, Jie [1, 2]
Pan, Anqi [1]
Feng, Zhiguang [3]
Yan, Huaicheng [4]
Wang, Yueying [5]
Affiliations
[1] Donghua Univ, Coll Informat Sci & Technol, Shanghai, Peoples R China
[2] Donghua Univ, Minist Educ, Engn Res Ctr Digitized Text & Apparel Technol, Shanghai, Peoples R China
[3] Harbin Engn Univ, Coll Automat, Harbin, Peoples R China
[4] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai, Peoples R China
[5] Shanghai Univ, Sch Mechatron Engn & Automation, Shanghai, Peoples R China
[6] Donghua Univ, Coll Informat Sci & Technol, Shanghai 201620, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
adaptive control; adaptive cruise control; input delay; proximal policy optimization; traffic flow; FEEDBACK-CONTROL; FLOW; INSTABILITY; SYSTEMS; MODELS; WAVES;
DOI
10.1002/oca.3068
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, a delay compensation feedback controller based on reinforcement learning is proposed to adjust the time interval of adaptive cruise control (ACC) vehicle agents in congested traffic by introducing the proximal policy optimization (PPO) scheme. The high-speed traffic flow is characterized by the Aw-Rascle-Zhang model, a two-by-two system of nonlinear first-order partial differential equations (PDEs). Unlike the backstepping delay compensation control (ref. 23), the PPO controller proposed in this paper feeds back the current traffic flow velocity, the current traffic flow density and the control input from the previous step. Since the system dynamics of the traffic flow are difficult to express mathematically, the control gains of the three feedback terms are determined by learning from the interaction between the PPO agent and a digital simulator of the traffic system. The performance of Lyapunov control, backstepping control and PPO control is compared in numerical simulation. The results demonstrate that PPO control is superior to Lyapunov control in terms of convergence rate and control effort for the traffic system without delay. For the traffic system with an unstable input delay value, the performance of the PPO controller is equivalent to that of the backstepping controller. In addition, PPO is more robust than the backstepping controller when the parameter is sensitive to Gaussian noise.
A delay compensation feedback controller based on reinforcement learning is proposed, utilizing the Proximal Policy Optimization (PPO) algorithm to adaptively adjust the gains of the cruise controller, thereby regulating vehicle time intervals in traffic congestion. The congested traffic flow is described using the Aw-Rascle-Zhang model. Numerical simulations are conducted to compare the performance of Lyapunov control, backstepping control, and PPO control. The results demonstrate the superior convergence and robustness of the proposed method.
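The abstract describes a controller built from three feedback terms whose gains are learned with PPO. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes a linear combination of the three signals (the abstract does not give the exact form, nor whether raw states or deviations from a setpoint are used), and it shows the standard PPO clipped surrogate objective. The function names, parameter names, and the default `clip_eps=0.2` are hypothetical choices made for illustration.

```python
def feedback_control(velocity, density, prev_input, gains):
    """Three-term feedback law (assumed linear form, for illustration only).

    gains is a tuple (k_v, k_rho, k_u); in the paper's setting these would be
    produced by the learned PPO policy rather than fixed by hand.
    """
    k_v, k_rho, k_u = gains
    return k_v * velocity + k_rho * density + k_u * prev_input


def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over a batch.

    ratios[i] is the new-to-old policy probability ratio for sample i,
    advantages[i] is the corresponding advantage estimate.
    """
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Clip the ratio to [1 - eps, 1 + eps], then take the pessimistic bound.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


if __name__ == "__main__":
    u = feedback_control(1.0, 2.0, 0.5, gains=(0.1, 0.2, 0.4))
    print("control input:", u)
    print("surrogate:", ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

Clipping keeps each policy update close to the previous policy, which is the property that makes PPO a reasonable choice when the plant is only available through a simulator.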
Pages: 719-736
Page count: 18
Related papers
50 in total
  • [21] A control scheme for freeway traffic systems based on hybrid automata
    Sacone, Simona
    Siri, Silvia
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2012, 22 (01): : 3 - 25
  • [22] Macroscopic traffic flow model validation at congested freeway off-ramp areas
    Spiliopoulou, A.
    Kontorinaki, M.
    Papageorgiou, M.
    Kopelias, P.
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2014, 41 : 18 - 29
  • [23] A mechanism to describe the formation and propagation of stop-and-go waves in congested freeway traffic
    Laval, Jorge A.
    Leclercq, Ludovic
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2010, 368 (1928): : 4519 - 4541
  • [24] A Cellular Automata Based Model for Traffic in Congested City
    Das, Sukanta
    Saha, Meghnath
    Sikdar, Biplab K.
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 2397 - +
  • [25] Proximal policy optimization with an integral compensator for quadrotor control
    Huan Hu
    Qing-ling Wang
    Frontiers of Information Technology & Electronic Engineering, 2020, 21 : 777 - 795
  • [26] Proximal policy optimization with an integral compensator for quadrotor control
    Hu, Huan
    Wang, Qing-ling
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2020, 21 (05) : 777 - 795
  • [27] Local Synchronization Control Scheme for Congested Interchange Areas in Freeway Corridor
    Zhang, H. Michael
    Ma, Jingtao
    Nie, Yu
    TRANSPORTATION RESEARCH RECORD, 2009, (2128) : 173 - 183
  • [28] An Enhanced Proximal Policy Optimization-Based Reinforcement Learning Method with Random Forest for Hyperparameter Optimization
    Ma, Zhixin
    Cui, Shengmin
    Joe, Inwhee
    APPLIED SCIENCES-BASEL, 2022, 12 (14):
  • [29] Surrogate-based optimization of cordon toll levels in congested traffic networks
    Ekstrom, Joakim
    Kristoffersson, Ida
    Quttineh, Nils-Hassan
    JOURNAL OF ADVANCED TRANSPORTATION, 2016, 50 (06) : 1008 - 1033
  • [30] Mapless Navigation with Deep Reinforcement Learning based on The Convolutional Proximal Policy Optimization Network
    Toan, Nguyen Duc
    Woo, Kim Gon
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2021), 2021, : 298 - 301