Residual Policy Learning Facilitates Efficient Model-Free Autonomous Racing

Cited by: 25
Authors
Zhang, Ruiqi [1 ]
Hou, Jing [1 ]
Chen, Guang [1 ,2 ]
Li, Zhijun [3 ]
Chen, Jianxiao [1 ]
Knoll, Alois [2 ]
Affiliations
[1] Tongji Univ, Sch Automot Studies, Shanghai 201804, Peoples R China
[2] Tech Univ Munich, Dept Informat, Munich, Germany
[3] Univ Sci & Technol China, Wearable Robot & Autonomous Syst Lab, Hefei 230022, Peoples R China
Funding
EU Horizon 2020; National Natural Science Foundation of China;
Keywords
Autonomous vehicle navigation; motion and path planning; reinforcement learning; PREDICTIVE CONTROL; AVOIDANCE;
DOI
10.1109/LRA.2022.3192770
CLC Classification Number
TP24 [Robotics];
Discipline Classification Code
080202; 1405;
Abstract
Motion planning for autonomous racing is a challenging task due to the safety requirements of driving aggressively. Most previous solutions rely on prior information or depend on complex dynamics modeling. Classical model-free reinforcement learning methods are based on random sampling, which severely increases training consumption and undermines exploration efficiency. In this letter, we propose ResRace, an efficient residual policy learning method for high-speed autonomous racing that leverages only real-time raw LiDAR and IMU observations for low-latency obstacle avoidance and navigation. We first design a controller based on a modified artificial potential field (MAPF) to generate a base navigation policy. We then use a deep reinforcement learning (DRL) algorithm to generate a residual policy that supplements the base policy to obtain the optimal policy. Concurrently, the MAPF policy effectively guides exploration and increases update efficiency. This complementary property gives our method fast convergence and low resource requirements. Extensive experiments show that our method outperforms leading algorithms and reaches a level comparable to professional human players on five F1Tenth tracks.
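The core mechanism of the abstract, a learned residual added on top of a potential-field base controller, can be sketched as below. This is a minimal illustration under stated assumptions: the attractive/repulsive field form, the gains (k_att, k_rep, d0), the observation layout, and the names potential_field_action, resrace_action, and residual_policy are hypothetical placeholders for exposition, not the letter's actual MAPF formulation or trained DRL actor.

import numpy as np

def potential_field_action(lidar_ranges, lidar_angles, goal_heading,
                           k_att=1.0, k_rep=0.5, d0=2.0):
    """Base steering from an artificial potential field (illustrative form,
    not the paper's exact MAPF): attract toward the goal heading, repel
    from LiDAR returns within the influence distance d0."""
    steer = k_att * goal_heading
    for r, theta in zip(lidar_ranges, lidar_angles):
        if r < d0:
            # Repulsion grows as the obstacle gets closer; steer away
            # from the obstacle bearing.
            steer -= k_rep * (1.0 / r - 1.0 / d0) * np.sign(theta)
    return float(np.clip(steer, -1.0, 1.0))

def resrace_action(obs, residual_policy, max_steer=0.4, max_speed=1.0):
    """Final command = base policy + learned residual (the core idea of
    residual policy learning). residual_policy stands in for a trained
    DRL actor returning a small correction vector in [-1, 1]."""
    base_steer = potential_field_action(obs["ranges"], obs["angles"],
                                        obs["goal_heading"])
    residual = residual_policy(obs)
    steer = float(np.clip(base_steer + residual[0], -1.0, 1.0)) * max_steer
    speed = float(np.clip(0.5 + residual[1], 0.0, 1.0)) * max_speed
    return steer, speed

# Example: a zero residual recovers the pure potential-field behavior,
# which is why the base policy can safely guide early exploration while
# the DRL actor only needs to learn corrections.
obs = {"ranges": np.array([3.0, 1.2, 4.0]),
       "angles": np.array([-0.5, 0.0, 0.5]),
       "goal_heading": 0.1}
steer, speed = resrace_action(obs, lambda o: np.zeros(2))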
Pages: 11625-11632
Page Count: 8