RLfOLD: Reinforcement Learning from Online Demonstrations in Urban Autonomous Driving

Cited by: 0
Authors
Coelho, Daniel [1 ,2 ]
Oliveira, Miguel [1 ,2 ]
Santos, Vitor [1 ,2 ]
Affiliations
[1] Univ Aveiro, Dept Mech Engn, P-3810193 Aveiro, Portugal
[2] Univ Aveiro, Inst Elect & Informat Engn Aveiro IEETA, Intelligent Syst Associate Lab LASI, P-3810193 Aveiro, Portugal
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10 | 2024
Keywords
FRAMEWORK
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement Learning from Demonstrations (RLfD) has emerged as an effective method by fusing expert demonstrations into Reinforcement Learning (RL) training, harnessing the strengths of both Imitation Learning (IL) and RL. However, existing algorithms rely on offline demonstrations, which can introduce a distribution gap between the demonstrations and the actual training environment, limiting their performance. In this paper, we propose a novel approach, Reinforcement Learning from Online Demonstrations (RLfOLD), that leverages online demonstrations to address this limitation, ensuring the agent learns from relevant and up-to-date scenarios and thus effectively bridging the distribution gap. Unlike conventional policy networks used in typical actor-critic algorithms, RLfOLD introduces a policy network that outputs two standard deviations: one for exploration and the other for IL training. This design allows the agent to adapt to the varying levels of uncertainty inherent in both RL and IL. Furthermore, we introduce an exploration process guided by an online expert, incorporating an uncertainty-based technique. Our experiments on the CARLA NoCrash benchmark demonstrate the effectiveness and efficiency of RLfOLD. Notably, even with a significantly smaller encoder and a single-camera setup, RLfOLD surpasses state-of-the-art methods in this evaluation. These results, achieved with limited resources, highlight RLfOLD as a highly promising solution for real-world applications.
Pages: 11660-11668
Page count: 9
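
Illustrative note: below is a minimal, hypothetical sketch of the dual-standard-deviation policy head and the uncertainty-gated expert query described in the abstract, written in PyTorch for concreteness. The class and function names, the uncertainty threshold, and the expert-fallback rule are assumptions made for illustration only; they are not taken from the paper.

# Hypothetical sketch: a Gaussian policy head with two std outputs, one used for
# RL exploration and one for the imitation-learning (IL) objective, plus an
# uncertainty-based check for querying the online expert.
import torch
import torch.nn as nn
from torch.distributions import Normal

class DualStdPolicyHead(nn.Module):
    def __init__(self, feat_dim: int, act_dim: int):
        super().__init__()
        self.mean = nn.Linear(feat_dim, act_dim)
        # Two independent log-std heads: one for exploration (RL), one for IL.
        self.log_std_rl = nn.Linear(feat_dim, act_dim)
        self.log_std_il = nn.Linear(feat_dim, act_dim)

    def forward(self, feat: torch.Tensor):
        mu = self.mean(feat)
        std_rl = self.log_std_rl(feat).clamp(-5, 2).exp()
        std_il = self.log_std_il(feat).clamp(-5, 2).exp()
        return Normal(mu, std_rl), Normal(mu, std_il)

def act(policy: DualStdPolicyHead, feat, expert_action, std_threshold=0.5):
    """Sample an action; fall back to the online expert when the exploration
    uncertainty exceeds a (hypothetical) threshold."""
    dist_rl, dist_il = policy(feat)
    if dist_rl.stddev.mean() > std_threshold:
        action = expert_action                              # query the online expert
        il_loss = -dist_il.log_prob(expert_action).sum()    # IL term uses the IL std
    else:
        action = dist_rl.sample()                           # RL exploration
        il_loss = torch.zeros(())
    return action, il_loss

if __name__ == "__main__":
    head = DualStdPolicyHead(feat_dim=64, act_dim=2)
    feat = torch.randn(1, 64)
    expert = torch.tensor([[0.1, -0.3]])
    a, loss = act(head, feat, expert)
    print(a.shape, loss.item())

The separation of the two standard deviations mirrors the abstract's point that RL exploration noise and IL supervision operate under different uncertainty levels; how the paper actually combines the two objectives is not specified here.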