Scenario-Free Autonomous Driving With Multi-Task Offline-to-Online Reinforcement Learning

Times Cited: 0
Authors
Lee, Dongsu [1 ]
Kwon, Minhae [1 ,2 ]
Affiliations
[1] Soongsil Univ, Dept Intelligent Semicond, Seoul 06978, South Korea
[2] Soongsil Univ, Sch Elect Engn, Seoul 06978, South Korea
Funding
National Research Foundation of Singapore;
关键词
Multitasking; Autonomous vehicles; Reinforcement learning; Roads; Decision making; Training; Intelligent transportation systems; Transfer learning; Simulation; Measurement; Offline reinforcement learning; online reinforcement learning; multi-task reinforcement learning; autonomous driving; IMPACT;
DOI
10.1109/TITS.2025.3583635
CLC Number
TU [Architectural Science];
Discipline Classification Code
0813;
Abstract
The primary goal of an autonomous driving system is to achieve full autonomy through adaptive decision-making across a wide range of driving scenarios. Although recent advances in reinforcement learning (RL) have enabled policies with adaptive behaviors, current solutions are typically tailored to specific driving scenarios (e.g., highway, tollgate) rather than providing a scenario-free solution. To address this limitation, this study develops an autonomous driving policy that operates across diverse driving scenarios through a generalized decision-making model. Specifically, an autonomous vehicle learns a unified multi-task policy using a replay buffer shared across all scenarios, thereby improving sample and learning efficiency. Furthermore, we adopt an offline-to-online RL approach to combine the sample efficiency of offline RL with the performance gains of online RL. The proposed solution performs an algorithmic shift that maximizes the objective of each RL phase, incorporating three key techniques: Q re-initialization, Q adaptation, and policy variance re-initialization. To validate our solution, we compare its performance with existing RL methods and analyze driving behavior using objective-aware and safety-aware metrics. Our findings demonstrate that the proposed solution achieves superior performance across most metrics, irrespective of dataset quality.
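The abstract names three handover techniques applied when switching from offline training to online fine-tuning: Q re-initialization, Q adaptation, and policy variance re-initialization. The sketch below shows what such a handover could look like in a generic PyTorch actor-critic setup; it is a minimal illustration under stated assumptions, not the authors' implementation, and every class and function name here (QNetwork, GaussianPolicy, offline_to_online_handover) is hypothetical. Because this record does not specify the Q adaptation procedure, that step is only indicated in a comment.

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    # Q(s, a) critic; a multi-task variant could also condition on a task id.
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


class GaussianPolicy(nn.Module):
    # Gaussian actor with a state-independent log-std vector.
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))


def reinit_linear(m):
    # Re-initialize every linear layer in place ("Q re-initialization").
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)


def offline_to_online_handover(q_net, policy):
    # 1) Q re-initialization: discard the (typically pessimistic) offline
    #    Q estimates before online updates begin.
    q_net.apply(reinit_linear)
    # 2) Policy variance re-initialization: reset the log-std to restore the
    #    exploration noise that offline training tends to collapse.
    with torch.no_grad():
        policy.log_std.zero_()
    # 3) Q adaptation: procedure not specified in this record; one plausible
    #    reading is briefly fitting the fresh critic to the offline policy
    #    before resuming gradient updates (not shown here).
    return q_net, policy

In this reading, all driving scenarios write transitions to a single shared replay buffer, so the handover is applied once to the shared networks rather than separately per scenario.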
Pages: 14