An improved Q-learning algorithm (XQL) for mobile robot path planning in unknown environments

Cited: 0
Authors
Chen, Yadong [1 ]
Xu, Hongyu [1 ]
Zhang, Yunjie [1 ]
Yang, Zhenjian [1 ]
Affiliations
[1] Tianjin Chengjian Univ, Sch Comp & Informat Engn, Tianjin 300000, Peoples R China
Source
ENGINEERING RESEARCH EXPRESS | 2025 / Vol. 7 / Issue 02
Keywords
Q-learning; mobile robot; path planning; convergence speed; path smoothing; random obstacle environment
DOI
10.1088/2631-8695/adcb92
Chinese Library Classification
T [Industrial Technology];
Discipline Code
08;
Abstract
Traditional Q-learning suffers from slow convergence, non-smooth paths, and high energy consumption, which hinder mobile robot path planning in environments with random obstacles. We introduce an improved algorithm, XQL (eXtended Q-Learning), to address these limitations. First, we enhance the Q-value update mechanism with a neighboring-state smoothing Q-function, which significantly accelerates convergence. Next, we apply the artificial potential field method to design a reward function that divides the reward into a global attractive-field term and a target-area reward, providing more accurate guidance for the robot's movement. We also introduce a cosine function to adaptively adjust the greedy factor, so that the epsilon value changes dynamically with the algorithm's state, balancing exploration and exploitation. Finally, to reduce the energy cost of sharp turns, we introduce an optimal action pool that filters and optimizes the action set at each step. Simulation results show that, compared with SARSA, Q-Learning, DQN, and EPRQL, the XQL algorithm converges faster and plans paths more efficiently in random environments. It also reduces the number of path inflection points, yielding smoother paths that improve both the robot's safety and its energy efficiency. These findings highlight the strong adaptability of the XQL algorithm in random environments.
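The abstract does not give the exact cosine schedule for the greedy factor, but a common form anneals epsilon from a high exploration value toward a low exploitation value over the training episodes. The sketch below is a hypothetical illustration of that idea combined with standard epsilon-greedy action selection; the bounds `eps_min` and `eps_max` and the schedule shape are assumptions, not the paper's formula.

```python
import math
import random

def adaptive_epsilon(episode, total_episodes, eps_min=0.05, eps_max=0.9):
    """Cosine-annealed greedy factor (hypothetical form).

    Starts near eps_max (strong exploration) and decays smoothly
    to eps_min (mostly exploitation) as training progresses.
    """
    frac = episode / max(1, total_episodes - 1)  # progress in [0, 1]
    return eps_min + 0.5 * (eps_max - eps_min) * (1.0 + math.cos(math.pi * frac))

def select_action(q_row, epsilon, rng=random):
    """Epsilon-greedy selection over one state's Q-values (a list)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))          # explore: random action
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit: best action
```

With this schedule, early episodes explore widely while late episodes almost always pick the highest-valued action, which is the exploration-exploitation balance the abstract describes.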
Pages: 19