ETQ-learning: an improved Q-learning algorithm for path planning

Times cited: 5
Authors
Wang, Huanwei [1 ]
Jing, Jing [1 ]
Wang, Qianlv [1 ]
He, Hongqi [1 ]
Qi, Xuyan [1 ]
Lou, Rui [1 ]
Affiliation
[1] PLA Information Engineering University, Science Avenue 62, Zhengzhou 450001, Henan, China
Keywords
Q-learning; Path planning; Reinforcement learning; Reward mechanism; Greedy strategy;
DOI
10.1007/s11370-024-00544-3
Chinese Library Classification (CLC)
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Path planning algorithms have always been at the core of intelligent robot research; a good path planning algorithm can significantly improve the efficiency with which robots execute tasks. As the application scenarios of intelligent robots continue to diversify, adaptability to the environment has become a key focus of current path planning research. As one of the classic reinforcement learning algorithms, the Q-learning (QL) algorithm has inherent advantages in adapting to the environment, but it also faces several challenges and shortcomings, chiefly suboptimal path planning, slow convergence, weak generalization capability and poor obstacle avoidance. To address these issues in the QL algorithm, we carry out the following work. (1) We redesign the reward mechanism of the QL algorithm. The traditional Q-learning reward mechanism is simple to implement but lacks directionality. We propose a combined reward mechanism of "static assignment + dynamic adjustment", which resolves the problem of random path selection and ultimately yields optimal path planning. (2) We redesign the greedy strategy of the QL algorithm. In the traditional Q-learning algorithm, the greedy factor is either randomly generated or set manually, which limits its applicability: it is difficult to apply effectively to different physical environments and scenarios, and this is the fundamental reason for the algorithm's poor generalization capability. We propose a dynamically adjusted greedy factor, the ε-acc-increasing greedy strategy, which significantly improves the efficiency of the Q-learning algorithm and enhances its generalization capability, giving the algorithm a wider range of application scenarios. (3) We introduce a concept to improve the algorithm's obstacle avoidance performance: the expansion distance, which pre-sets a "collision buffer" between the obstacle and the agent.
Pages: 915-929
Number of pages: 15
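Illustrative sketch (not from the paper): the abstract above does not give concrete formulas, so the minimal Python sketch below only mirrors the three ideas it describes under assumed choices. The class name SketchETQ, the linear epsilon schedule, the reward values (+100, -100, ±1) and the buffer_cells parameter are all hypothetical illustrations, not the authors' actual design.

import numpy as np

class SketchETQ:
    """Grid-world Q-learning sketch with (1) a static + dynamic reward,
    (2) an increasing greedy factor and (3) an obstacle buffer."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 eps_start=0.1, eps_end=0.95, episodes=500, buffer_cells=1):
        self.Q = np.zeros((n_states, n_actions))  # Q-table
        self.alpha, self.gamma = alpha, gamma
        self.eps_start, self.eps_end, self.episodes = eps_start, eps_end, episodes
        self.buffer_cells = buffer_cells  # assumed size of the "collision buffer"

    def epsilon(self, episode):
        # Assumed linear schedule: the greedy factor grows with training progress,
        # so the agent exploits more and explores less in later episodes.
        frac = min(1.0, episode / self.episodes)
        return self.eps_start + (self.eps_end - self.eps_start) * frac

    def choose_action(self, state, episode):
        # With probability epsilon take the greedy action, otherwise explore.
        if np.random.rand() < self.epsilon(episode):
            return int(np.argmax(self.Q[state]))
        return int(np.random.randint(self.Q.shape[1]))

    def reward(self, old_goal_dist, new_goal_dist, obstacle_dist):
        # Static assignment: fixed rewards for reaching the goal or entering
        # the buffer zone around an obstacle (values are assumptions).
        if new_goal_dist == 0:
            return 100.0
        if obstacle_dist <= self.buffer_cells:
            return -100.0
        # Dynamic adjustment: shaping term that rewards progress toward the goal.
        return 1.0 if new_goal_dist < old_goal_dist else -1.0

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update.
        td_target = r + self.gamma * np.max(self.Q[s_next])
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

In this sketch the greedy factor is treated as the probability of exploiting the current Q-values, so increasing it over episodes shifts the agent from exploration toward exploitation, which is the behaviour the ε-acc-increasing strategy is described as producing.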