Enhancing Car-Following Performance in Traffic Oscillations Using Expert Demonstration Reinforcement Learning

Times Cited: 3
Authors
Li, Meng [1 ,2 ]
Li, Zhibin [1 ]
Cao, Zehong [3 ]
Affiliations
[1] Southeast University, School of Transportation, Nanjing 210096, People's Republic of China
[2] Nanyang Technological University, School of Mechanical & Aerospace Engineering, Singapore 639798, Singapore
[3] University of South Australia, STEM, Adelaide, SA 5095, Australia
Funding
National Natural Science Foundation of China
Keywords
Training; Oscillators; Trajectory; Task analysis; Cloning; Safety; Databases; Expert demonstration; reinforcement learning; car-following control; traffic oscillation; ADAPTIVE CRUISE CONTROL; AUTOMATED VEHICLES; CONTROL STRATEGY; MODEL; IMPACT;
DOI
10.1109/TITS.2024.3368474
Chinese Library Classification (CLC)
TU [Building Science]
Discipline Code
0813
Abstract
Deep reinforcement learning (DRL) algorithms often face challenges in achieving stability and efficiency due to significant policy gradient variance and inaccurate reward function estimation in complex scenarios. This study addresses these issues in the context of multi-objective car-following control tasks with time lag in traffic oscillations. We propose an expert demonstration reinforcement learning (EDRL) approach that aims to stabilize training, accelerate learning, and enhance car-following performance. The key idea is to leverage expert demonstrations, which represent superior car-following control experiences, to improve the DRL policy. Our method involves two sequential steps. In the first step, expert demonstrations are obtained during offline pretraining by utilizing prior traffic knowledge, including car-following trajectories from an empirical database and classic car-following models. In the second step, expert demonstrations are obtained during online training, where the agent interacts with the car-following environment. The EDRL agents are trained through supervised regression on the expert demonstrations using the behavioral cloning technique. Experimental results conducted in various traffic oscillation scenarios demonstrate that our proposed method significantly enhances training stability, learning speed, and rewards compared to baseline algorithms.
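The first step described in the abstract amounts to supervised behavioral cloning: a policy network is fit by regression to expert (state, action) pairs drawn from empirical trajectories or a classic car-following model. The sketch below illustrates that idea only; the state features, network shape, and names such as CarFollowingPolicy and pretrain_with_demonstrations are illustrative assumptions, not the paper's implementation.

```python
# Minimal behavioral-cloning sketch for pretraining a car-following policy
# on expert demonstrations (illustrative; not the authors' code).
import torch
import torch.nn as nn


class CarFollowingPolicy(nn.Module):
    """Maps a car-following state (e.g., gap, ego speed, relative speed)
    to a scaled longitudinal acceleration command."""

    def __init__(self, state_dim: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),  # acceleration normalized to [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def pretrain_with_demonstrations(policy, expert_states, expert_actions,
                                 epochs: int = 50, lr: float = 1e-3):
    """Supervised regression (behavioral cloning) on expert (state, action)
    pairs, e.g., from an empirical trajectory database or a classic
    car-following model."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        pred = policy(expert_states)
        loss = loss_fn(pred, expert_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy


if __name__ == "__main__":
    # Synthetic placeholder demonstrations: 1000 (state, action) pairs.
    states = torch.randn(1000, 3)
    actions = torch.tanh(states[:, :1])  # dummy expert accelerations
    policy = CarFollowingPolicy()
    pretrain_with_demonstrations(policy, states, actions)
    # The pretrained policy would then initialize the DRL actor before
    # the online training stage described in the abstract.
```

The cloned policy serves only as a warm start; the second stage still refines it through interaction with the car-following environment, which is what distinguishes the approach from pure imitation learning.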
Pages: 7751-7766
Page count: 16