Proximal policy optimization learning based control of congested freeway traffic

Cited: 0
Authors
Mo, Shurong [1 ]
Wu, Nailong [1 ,2 ,6 ]
Qi, Jie [1 ,2 ]
Pan, Anqi [1 ]
Feng, Zhiguang [3 ]
Yan, Huaicheng [4 ]
Wang, Yueying [5 ]
Affiliations
[1] Donghua Univ, Coll Informat Sci & Technol, Shanghai, Peoples R China
[2] Donghua Univ, Minist Educ, Engn Res Ctr Digitized Text & Apparel Technol, Shanghai, Peoples R China
[3] Harbin Engn Univ, Coll Automat, Harbin, Peoples R China
[4] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai, Peoples R China
[5] Shanghai Univ, Sch Mechatron Engn & Automation, Shanghai, Peoples R China
[6] Donghua Univ, Coll Informat Sci & Technol, Shanghai 201620, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
adaptive control; adaptive cruise control; input delay; proximal policy optimization; traffic flow; FEEDBACK-CONTROL; FLOW; INSTABILITY; SYSTEMS; MODELS; WAVES;
DOI
10.1002/oca.3068
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
In this paper, a delay compensation feedback controller based on reinforcement learning is proposed to adjust the time interval of adaptive cruise control (ACC) vehicle agents in traffic congestion by introducing the proximal policy optimization (PPO) scheme. The high-speed traffic flow is characterized by the two-by-two Aw-Rascle-Zhang (ARZ) system of nonlinear first-order partial differential equations (PDEs). Unlike the backstepping delay compensation control [23], the PPO controller proposed in this paper is composed of three feedback terms: the current traffic flow velocity, the current traffic flow density, and the previous one-step control input. Since the traffic flow dynamics are difficult to express mathematically, the three feedback gains are determined by learning from the interaction between the PPO agent and a digital simulator of the traffic system. The performance of Lyapunov control, backstepping control, and PPO control is compared in numerical simulation. The results demonstrate that PPO control is superior to Lyapunov control in terms of convergence rate and control effort for the traffic system without delay. For the traffic system with a destabilizing input delay, the performance of the PPO controller is equivalent to that of the backstepping controller. Moreover, PPO is more robust than the backstepping controller when the model parameters are perturbed by Gaussian noise.
A delay compensation feedback controller based on reinforcement learning is proposed, utilizing the Proximal Policy Optimization (PPO) algorithm to adaptively adjust the gains of the cruise controller, thereby regulating vehicle time intervals in traffic congestion. The congested traffic flow is described using the Aw-Rascle-Zhang model. Numerical simulations are conducted to compare the performance of Lyapunov control, backstepping control, and PPO control. The results demonstrate the superior convergence and robustness of the proposed method.
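For reference, the Aw-Rascle-Zhang model named in the abstract is commonly written as the following 2x2 hyperbolic system (notation assumed here, not quoted from the paper):

\partial_t \rho + \partial_x (\rho v) = 0,
\partial_t \big(v + p(\rho)\big) + v\,\partial_x \big(v + p(\rho)\big) = \frac{V_{\mathrm{eq}}(\rho) - v}{\tau},

where \rho and v denote traffic density and velocity, p(\rho) is the traffic pressure, V_{\mathrm{eq}}(\rho) the equilibrium speed, and \tau the relaxation time related to the ACC time gap.

The sketch below is a minimal, hedged illustration of the gain-learning setup described in the abstract: the PPO action is the gain vector (k1, k2, k3) of a boundary feedback built from the boundary velocity deviation, the boundary density deviation, and the previous control input. The environment ToyARZEnv, its crude finite-difference update, all numerical values, and the use of gymnasium/stable-baselines3 are illustrative assumptions and do not represent the authors' digital simulator.

import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyARZEnv(gym.Env):
    """Toy ARZ-like simulator; the action is the gain vector (k1, k2, k3)."""

    def __init__(self, n_cells=32, dt=0.05, horizon=200):
        super().__init__()
        self.n, self.dt, self.horizon = n_cells, dt, horizon
        # Assumed equilibrium density/speed and relaxation time (hypothetical values).
        self.rho_eq, self.v_eq, self.tau = 0.12, 10.0, 60.0
        self.action_space = spaces.Box(low=-2.0, high=2.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(2 * n_cells + 1,), dtype=np.float32)

    def _obs(self):
        # Observation: density/velocity deviations plus the previous control input.
        return np.concatenate([self.rho - self.rho_eq,
                               self.v - self.v_eq,
                               [self.u_prev]]).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        x = np.linspace(0.0, 2.0 * np.pi, self.n)
        # Start from a perturbed (stop-and-go-like) profile around equilibrium.
        self.rho = self.rho_eq + 0.02 * np.sin(x) + 0.005 * self.np_random.standard_normal(self.n)
        self.v = self.v_eq - 1.0 * np.sin(x)
        self.u_prev, self.t = 0.0, 0
        return self._obs(), {}

    def step(self, action):
        k1, k2, k3 = np.asarray(action, dtype=np.float64)
        # Boundary control structure inferred from the abstract:
        # velocity feedback + density feedback + previous-input feedback.
        u = (k1 * (self.v[0] - self.v_eq)
             + k2 * (self.rho[0] - self.rho_eq)
             + k3 * self.u_prev)
        # Deliberately crude upwind update (cell size absorbed into dt); illustrative only.
        q = self.rho * self.v
        self.rho[1:] -= self.dt * np.diff(q)                      # mass conservation
        self.v[1:] += self.dt * ((self.v_eq - self.v[1:]) / self.tau
                                 - self.v[1:] * np.diff(self.v))  # convection + relaxation
        self.v[0] = self.v_eq + u                                 # actuated inlet speed
        self.rho[0] = self.rho_eq
        self.u_prev, self.t = float(u), self.t + 1
        err = np.sum((self.rho - self.rho_eq) ** 2) + np.sum((self.v - self.v_eq) ** 2)
        reward = -float(err) - 0.01 * self.u_prev ** 2            # tracking error + control effort
        return self._obs(), reward, False, self.t >= self.horizon, {}


if __name__ == "__main__":
    from stable_baselines3 import PPO
    env = ToyARZEnv()
    agent = PPO("MlpPolicy", env, verbose=0)
    agent.learn(total_timesteps=20_000)

Letting the agent output the three gains rather than the raw control signal mirrors the abstract's statement that "the control gains of the three feedback can be determined via learning", and keeps the learned policy interpretable as a gain schedule.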
Pages: 719-736
Number of pages: 18
Related papers
50 records in total
  • [1] Reinforcement Learning Versus PDE Backstepping and PI Control for Congested Freeway Traffic
    Yu, Huan
    Park, Saehong
    Bayen, Alexandre
    Moura, Scott
    Krstic, Miroslav
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2022, 30 (04) : 1595 - 1611
  • [2] Traffic Signal Control Method Based on Modified Proximal Policy Optimization
    An, Yaohui
    Zhang, Jing
    2022 10TH INTERNATIONAL CONFERENCE ON TRAFFIC AND LOGISTIC ENGINEERING (ICTLE 2022), 2022, : 83 - 88
  • [3] An adaptive traffic signal control scheme with Proximal Policy Optimization based on deep reinforcement learning for a single intersection
    Wang, Lijuan
    Zhang, Guoshan
    Yang, Qiaoli
    Han, Tianyang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 149
  • [4] Multilane freeway merging control via trajectory optimization in a mixed traffic environment
    Han, Lei
    Zhang, Lun
    Guo, Weian
    IET INTELLIGENT TRANSPORT SYSTEMS, 2023, 17 (09) : 1891 - 1907
  • [5] Improving traffic signal control operations using proximal policy optimization
    Huang, Liben
    Qu, Xiaohui
    IET INTELLIGENT TRANSPORT SYSTEMS, 2023, 17 (03) : 588 - 601
  • [6] Intelligent Control of a Quadrotor with Proximal Policy Optimization Reinforcement Learning
    Lopes, Guilherme Cano
    Ferreira, Murillo
    Simoes, Alexandre da Silva
    Colombini, Esther Luna
    15TH LATIN AMERICAN ROBOTICS SYMPOSIUM 6TH BRAZILIAN ROBOTICS SYMPOSIUM 9TH WORKSHOP ON ROBOTICS IN EDUCATION (LARS/SBR/WRE 2018), 2018, : 503 - 508
  • [7] Reactive Power Optimization Based on Proximal Policy Optimization of Deep Reinforcement Learning
    Zhang, P.
    Zhu, Z.
    Xie, H.
    Dianwang Jishu/Power System Technology, 2023, 47 (02) : 562 - 570
  • [8] Merging in Congested Freeway Traffic Using Multipolicy Decision Making and Passive Actor-Critic Learning
    Nishi, Tomoki
    Doshi, Prashant
    Prokhorov, Danil
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2019, 4 (02) : 287 - 297
  • [9] New frontiers of freeway traffic control and estimation
    Delle Monache, M. L.
    Pasquale, C.
    Barreau, M.
    Stern, R.
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 6910 - 6925
  • [10] Modular production control using deep reinforcement learning: proximal policy optimization
    Mayer, Sebastian
    Classen, Tobias
    Endisch, Christian
    Journal of Intelligent Manufacturing, 2021, 32 : 2335 - 2351