Enhancing Car-Following Performance in Traffic Oscillations Using Expert Demonstration Reinforcement Learning

Cited by: 3
Authors
Li, Meng [1 ,2 ]
Li, Zhibin [1 ]
Cao, Zehong [3 ]
Affiliations
[1] Southeast Univ, Sch Transportat, Nanjing 210096, Peoples R China
[2] Nanyang Technol Univ, Sch Mech & Aerosp Engn, Singapore 639798, Singapore
[3] Univ South Australia, STEM, Adelaide, SA 5095, Australia
Funding
National Natural Science Foundation of China
Keywords
Training; Oscillators; Trajectory; Task analysis; Cloning; Safety; Databases; Expert demonstration; reinforcement learning; car-following control; traffic oscillation; ADAPTIVE CRUISE CONTROL; AUTOMATED VEHICLES; CONTROL STRATEGY; MODEL; IMPACT
DOI
10.1109/TITS.2024.3368474
CLC Classification Code
TU [Building Science]
Subject Classification Code
0813
Abstract
Deep reinforcement learning (DRL) algorithms often struggle to achieve stability and efficiency due to high policy-gradient variance and inaccurate reward function estimation in complex scenarios. This study addresses these issues in the context of multi-objective car-following control tasks with time lag in traffic oscillations. We propose an expert demonstration reinforcement learning (EDRL) approach that aims to stabilize training, accelerate learning, and enhance car-following performance. The key idea is to leverage expert demonstrations, which represent superior car-following control experiences, to improve the DRL policy. Our method involves two sequential steps. In the first step, expert demonstrations are obtained during offline pretraining by utilizing prior traffic knowledge, including car-following trajectories from an empirical database and classic car-following models. In the second step, expert demonstrations are obtained during online training, where the agent interacts with the car-following environment. The EDRL agents are trained through supervised regression on the expert demonstrations using the behavioral cloning technique. Experiments conducted in various traffic oscillation scenarios demonstrate that our proposed method significantly enhances training stability, learning speed, and rewards compared to baseline algorithms.
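To make the behavioral-cloning step concrete, the sketch below shows supervised regression of a policy network onto expert state-action pairs, the core of the offline pretraining the abstract describes. It is a minimal, hypothetical PyTorch rendering, not the authors' implementation: PolicyNet, pretrain_on_demonstrations, the three-feature state (gap, ego speed, relative speed), the network sizes, and the synthetic demonstrations are all illustrative assumptions standing in for the paper's empirical trajectory database and classic car-following models.

# Behavioral cloning for a car-following policy: a minimal sketch.
# All names, shapes, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps a car-following state [gap, ego speed, relative speed] to an acceleration command."""
    def __init__(self, state_dim: int = 3, action_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),  # bounded acceleration output
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def pretrain_on_demonstrations(policy, states, expert_actions, epochs=50, lr=1e-3):
    """Supervised regression onto expert demonstrations (behavioral cloning)."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(states), expert_actions)  # imitate expert actions
        loss.backward()
        opt.step()
    return policy

if __name__ == "__main__":
    # Synthetic stand-in for demonstrations drawn from an empirical trajectory
    # database or a classic car-following model.
    states = torch.randn(1024, 3)  # [gap, ego speed, relative speed]
    expert_actions = torch.tanh(states @ torch.tensor([[0.5], [-0.3], [0.8]]))
    policy = pretrain_on_demonstrations(PolicyNet(), states, expert_actions)

In the paper's pipeline, a policy pretrained this way would then be refined during online training, where the agent interacts with the car-following environment and continues to clone the expert demonstrations gathered there.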
Pages: 7751-7766
Page count: 16