A Guided-to-Autonomous Policy Learning method of Deep Reinforcement Learning in Path Planning

Citations: 0
Authors
Zhao, Wang [1 ]
Zhang, Ye [1 ]
Li, Haoyu [1 ]
Affiliations
[1] Northwestern Polytechnical University, School of Astronautics, Xi'an, People's Republic of China
Source
2024 IEEE 18TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION, ICCA 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
path planning; Deep Reinforcement Learning; training efficiency; composite optimization; Guided-to-Autonomous Policy Learning;
DOI
10.1109/ICCA62789.2024.10591821
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
This study introduces a Guided-to-Autonomous Policy Learning (GAPL) method that improves both the training efficiency and the composite optimization of Deep Reinforcement Learning (DRL) in path planning. First, we introduce guiding rewards as a reward-enhancement mechanism: built on the Rapidly-exploring Random Tree (RRT) and Artificial Potential Field (APF) algorithms, they effectively address the challenge of training efficiency. We then propose the Guided-to-Autonomous Reward Transition (GART) model to balance training efficiency against composite optimization. Its core is an evolutionary refinement of the reward structure: initially dominated by guiding rewards, it transitions progressively toward rewards that emphasize composite optimization, specifically minimizing the distance and time to the end point. Simulated experiments in static-obstacle settings and in mixed dynamic-static obstacle environments demonstrate that: 1) guiding rewards play a significant role in enhancing training efficiency; and 2) the GAPL method yields superior composite-optimization outcomes for path planning compared with conventional methods, while effectively addressing the training-efficiency problem of conventional DRL methods.
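The guided-to-autonomous transition described above can be sketched as a reward-blending schedule. The sketch below is an illustrative assumption, not the authors' exact formulation: the guiding term stands in for the paper's RRT/APF-based guidance as a toy potential-field attraction, the composite term penalizes distance and elapsed time, and the linear schedule `alpha(episode)` shifts the total reward from guided to autonomous over training.

```python
import numpy as np

def guiding_reward(pos, goal):
    """Toy APF-style guidance term: higher reward closer to the goal."""
    return -np.linalg.norm(np.asarray(pos) - np.asarray(goal))

def composite_reward(pos, goal, step):
    """Toy composite objective: distance to the goal plus a per-step time cost."""
    return -np.linalg.norm(np.asarray(pos) - np.asarray(goal)) - 0.1 * step

def gart_reward(pos, goal, step, episode, total_episodes):
    """Blend guidance into the composite objective as training progresses.

    alpha = 0 -> reward fully dominated by guidance (early training);
    alpha = 1 -> reward fully driven by the composite objective (late training).
    """
    alpha = min(1.0, episode / total_episodes)
    return (1.0 - alpha) * guiding_reward(pos, goal) + alpha * composite_reward(pos, goal, step)

# Early in training the guiding term dominates; late in training the
# distance-and-time composite term does.
early = gart_reward(pos=(1.0, 1.0), goal=(5.0, 5.0), step=3, episode=0, total_episodes=100)
late = gart_reward(pos=(1.0, 1.0), goal=(5.0, 5.0), step=3, episode=100, total_episodes=100)
```

Any monotone schedule (linear, exponential, or performance-triggered) fits the same template; the key design choice is that guidance decays rather than being switched off abruptly, so the policy is never left without a shaped learning signal mid-training.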
Pages: 665-672 (8 pages)