Deep Reinforcement Learning for Autonomous Driving Based on Safety Experience Replay

Cited by: 0

Authors:

Huang, Xiaohan [1,2]

Cheng, Yuhu [1,2]

Yu, Qiang [1,2]

Wang, Xuesong [1,2]

Affiliations:

[1] China Univ Min & Technol, Engn Res Ctr Intelligent Control Underground Space, Xuzhou Key Lab Artificial Intelligence & Big Data, Minist Educ, Xuzhou 221116, Peoples R China

[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Peoples R China

Source:

IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS | 2024 / Vol. 16 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Safety; Autonomous vehicles; Training; Task analysis; Reinforcement learning; Optimization; Decision making; Autonomous driving; deep learning; experience replay; safe reinforcement learning (RL);

DOI:

10.1109/TCDS.2024.3405896

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In autonomous driving, safety has always been a top priority, and ensuring the safety of algorithms has become indispensable as deep reinforcement learning (DRL) is increasingly applied in this field. Reinforcement learning (RL) learns by interacting with the environment through trial and error, so without safety constraints it may produce unsafe driving behavior. Such behavior can cause the vehicle to deviate from its path or even collide, leading to catastrophic accidents. This article therefore proposes an RL algorithm based on a safety experience replay mechanism, designed primarily to enhance the safety of RL in autonomous driving. First, the ego vehicle conducts a preliminary exploration of the environment to collect data. Based on the task-completion performance observed in each trajectory, safety labels of different levels are assigned to all state-action pairs, which establishes a safety experience buffer. A safety-critic network is then constructed and trained by randomly sampling from the safety experience buffer. This enables the network to quantitatively evaluate the safety of driving actions, thereby achieving safe driving for the ego vehicle. Experimental results indicate that the proposed method effectively reduces driving risks and improves task success rates compared with conventional RL algorithms.
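The labeling and replay steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the outcome names and label values in `SAFETY_LABELS`, the buffer capacity, and the `TabularSafetyCritic` class (a running-mean table standing in for the paper's safety-critic network) are all assumptions made for illustration.

```python
import random

# Hypothetical safety levels: the abstract only says that labels "of different
# levels" are assigned according to how each trajectory performed on the task.
SAFETY_LABELS = {"success": 1.0, "deviation": 0.5, "collision": 0.0}


def label_trajectory(transitions, outcome):
    """Attach the trajectory-level safety label to every (state, action) pair."""
    label = SAFETY_LABELS[outcome]
    return [(state, action, label) for state, action in transitions]


class SafetyExperienceBuffer:
    """Stores labeled (state, action, safety_label) tuples for random replay."""

    def __init__(self, capacity=10_000, seed=None):
        self.capacity = capacity
        self.data = []
        self.rng = random.Random(seed)

    def add_trajectory(self, transitions, outcome):
        self.data.extend(label_trajectory(transitions, outcome))
        overflow = len(self.data) - self.capacity
        if overflow > 0:
            del self.data[:overflow]  # discard the oldest entries first

    def sample(self, batch_size):
        """Uniform random sampling without replacement within one batch."""
        return self.rng.sample(self.data, min(batch_size, len(self.data)))


class TabularSafetyCritic:
    """Assumed stand-in for a safety-critic network: mean label per pair."""

    def __init__(self, default=0.5):
        self.total = {}
        self.count = {}
        self.default = default  # score returned for unseen pairs

    def update(self, batch):
        for state, action, label in batch:
            key = (state, action)
            self.total[key] = self.total.get(key, 0.0) + label
            self.count[key] = self.count.get(key, 0) + 1

    def score(self, state, action):
        key = (state, action)
        if key not in self.count:
            return self.default
        return self.total[key] / self.count[key]


# Usage: two exploratory trajectories, then one critic update from the buffer.
buffer = SafetyExperienceBuffer(seed=0)
buffer.add_trajectory([("s0", "keep_lane"), ("s1", "keep_lane")], "success")
buffer.add_trajectory([("s0", "swerve")], "collision")
critic = TabularSafetyCritic()
critic.update(buffer.sample(3))
```

In this sketch the critic's score for a state-action pair is simply the average safety label observed for it, so a policy could prefer actions with higher scores. A learned network would generalize to unseen pairs, which the table cannot do.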


Pages: 2070-2084

Page count: 15
