Path Planning for Mobile Robots Based on TPR-DDPG

Cited: 6
Authors
Zhao, Yaping [1]
Wang, Xiuqing [1,2,3]
Wang, Ruiyi [1]
Yang, Yunpeng [1]
Lv, Feng [1]
Affiliations
[1] Hebei Normal Univ, Coll Comp & Cyber Secur, Shijiazhuang 050024, Hebei, Peoples R China
[2] Hebei Prov Key Lab Network & Informat Secur, Shijiazhuang, Hebei, Peoples R China
[3] Hebei Prov Engn Res Ctr Supply Chain Big Data Ana, Shijiazhuang, Hebei, Peoples R China
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Keywords
path planning; deep deterministic policy gradient (DDPG); policy network; value network; mobile robots;
DOI
10.1109/IJCNN52387.2021.9533570
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Path planning is one of the key research topics in robotics. Researchers are paying increasing attention to reinforcement learning (RL) and deep learning (DL) because of RL's good generality and self-learning ability and DL's strong learning ability. The deep deterministic policy gradient (DDPG) algorithm, which combines the architectures of the deep Q-network (DQN), the deterministic policy gradient (DPG), and Actor-Critic (AC), differs from traditional RL methods and is suitable for continuous action spaces. Therefore, a TPR-DDPG-based path planning algorithm for mobile robots is proposed. In the algorithm, the state is preprocessed by several normalization methods, and complete reward functions are designed so that agents reach the target point quickly along optimal paths in complex environments. A BatchNorm layer is added to the policy network, which ensures the stability of the algorithm. Finally, experimental results, in which agents successfully reach the target points along the paths generated by the improved DDPG, validate the effectiveness of the proposed algorithm.
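The abstract's most concrete design point, a BatchNorm layer inside the DDPG policy (actor) network operating on normalized state inputs, can be illustrated with a short sketch. The PyTorch actor below is a minimal illustration only: the hidden sizes (400/300, a common DDPG choice), the action bound, and the class name are assumptions, since the paper's exact architecture is not given in this record.

```python
# Minimal sketch of a DDPG actor with a BatchNorm layer, per the abstract.
# Layer sizes and action bound are illustrative assumptions, not the
# authors' exact architecture.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, max_action: float = 1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400),
            nn.BatchNorm1d(400),   # normalizes activations to stabilize training
            nn.ReLU(),
            nn.Linear(400, 300),
            nn.ReLU(),
            nn.Linear(300, action_dim),
            nn.Tanh(),             # squash to [-1, 1] for a continuous action space
        )
        self.max_action = max_action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state is assumed already preprocessed/normalized, as in the paper
        return self.max_action * self.net(state)

# Usage with a batch of normalized states (BatchNorm needs batch size > 1
# in training mode); dimensions here are hypothetical.
actor = Actor(state_dim=24, action_dim=2)
actor.train()
actions = actor(torch.randn(64, 24))
```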
Pages: 8