Predictive hierarchical reinforcement learning for path-efficient mapless navigation with moving target

Cited by: 7
Authors
Li, Hanxiao [1 ]
Luo, Biao [1 ]
Song, Wei [2 ]
Yang, Chunhua [1 ]
Affiliations
[1] Cent South Univ, Sch Automat, Changsha 410083, Peoples R China
[2] Res Ctr Intelligent Robot, Res Inst Interdisciplinary Innovat, Zhejiang Lab, Hangzhou 311100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Deep learning; Navigation; Moving target; VISUAL NAVIGATION; MOBILE ROBOT; ALGORITHM; LEVEL; GAME; GO;
DOI
10.1016/j.neunet.2023.06.007
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (DRL) has proven to be a powerful approach for robot navigation over the past few years. DRL-based navigation does not require a pre-constructed map; instead, high-performance navigation skills can be learned from trial-and-error experience. However, recent DRL-based approaches mostly focus on a fixed navigation target, and when navigating to a moving target without a map, the performance of the standard RL structure drops dramatically in both success rate and path efficiency. To address the mapless navigation problem with a moving target, the predictive hierarchical DRL (pH-DRL) framework is proposed, integrating long-term trajectory prediction to provide a cost-effective solution. In the proposed framework, the lower-level policy of the RL agent learns robot control actions toward a specified goal, while the higher-level policy learns long-range planning of shorter navigation routes by fully exploiting the predicted trajectories. By making decisions over two levels of policies, the pH-DRL framework is robust to the unavoidable errors in long-term predictions. With deep deterministic policy gradient (DDPG) applied for policy optimization, the pH-DDPG algorithm is developed on the pH-DRL structure. Finally, through comparative experiments on the Gazebo simulator against several variants of the DDPG algorithm, the results demonstrate that pH-DDPG outperforms the other algorithms and achieves a high success rate and path efficiency even when the target moves fast and randomly. (C) 2023 Elsevier Ltd. All rights reserved.
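The two-level decision structure described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the constant-velocity predictor, the nearest-waypoint rule, and all function names below are illustrative assumptions standing in for the learned prediction module and the learned higher- and lower-level policies.

```python
import math

def predict_trajectory(target_pos, target_vel, horizon, dt=0.5):
    """Naive constant-velocity prediction of the moving target's future path
    (stand-in for the paper's long-term trajectory prediction)."""
    return [(target_pos[0] + target_vel[0] * dt * k,
             target_pos[1] + target_vel[1] * dt * k)
            for k in range(1, horizon + 1)]

def higher_level_policy(robot_pos, predicted_path):
    """Pick the predicted waypoint nearest to the robot as the subgoal
    (stand-in for the learned long-range planner)."""
    return min(predicted_path, key=lambda p: math.dist(robot_pos, p))

def lower_level_policy(robot_pos, subgoal, speed=0.3):
    """Take one bounded step toward the subgoal
    (stand-in for the learned low-level controller)."""
    dx, dy = subgoal[0] - robot_pos[0], subgoal[1] - robot_pos[1]
    dist = math.hypot(dx, dy)
    if dist < 1e-9:
        return robot_pos
    step = min(speed, dist)
    return (robot_pos[0] + step * dx / dist,
            robot_pos[1] + step * dy / dist)

# One decision cycle: predict the target's path, choose a subgoal, move.
path = predict_trajectory(target_pos=(5.0, 0.0), target_vel=(0.0, 1.0), horizon=4)
subgoal = higher_level_policy((0.0, 0.0), path)
new_pos = lower_level_policy((0.0, 0.0), subgoal)
```

Because the higher level only commits to a nearby point on the predicted path and the lower level replans each step, errors accumulating late in the predicted trajectory have limited effect, which mirrors the robustness argument the abstract makes.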
Pages: 677-688
Page count: 12
Cited References
62 entries
[1]   Cooperative Collision Avoidance for Nonholonomic Robots [J].
Alonso-Mora, Javier ;
Beardsley, Paul ;
Siegwart, Roland .
IEEE TRANSACTIONS ON ROBOTICS, 2018, 34 (02) :404-420
[2]  
Bacon PL, 2017, AAAI CONF ARTIF INTE, P1726
[3]  
Barth A, 2008, IEEE INT VEH SYM, P510
[4]  
Bengio Y., 2009, P 26 ANN INT C MACHI, P41, DOI 10.1145/1553374.1553380
[5]  
Berner C., 2019, Dota 2 with Large Scale Deep Reinforcement Learning, arXiv preprint
[6]   Model Predictive Contouring Control for Collision Avoidance in Unstructured Dynamic Environments [J].
Brito, Bruno ;
Floor, Boaz ;
Ferranti, Laura ;
Alonso-Mora, Javier .
IEEE ROBOTICS AND AUTOMATION LETTERS, 2019, 4 (04) :4459-4466
[7]   Learning Functionally Decomposed Hierarchies for Continuous Control Tasks With Path Planning [J].
Christen, Sammy ;
Jendele, Lukas ;
Aksan, Emre ;
Hilliges, Otmar .
IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) :3623-3630
[8]   A Mobile Robot that Understands Pedestrian Spatial Behaviors [J].
Chung, Shu-Yun ;
Huang, Han-Pang .
IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010), 2010, :5861-5866
[9]   Towards Generalization in Target-Driven Visual Navigation by Using Deep Reinforcement Learning [J].
Devo, Alessandro ;
Mezzetti, Giacomo ;
Costante, Gabriele ;
Fravolini, Mario L. ;
Valigi, Paolo .
IEEE TRANSACTIONS ON ROBOTICS, 2020, 36 (05) :1546-1561
[10]   Prediction of moving objects in dynamic environments using Kalman filters [J].
Elnagar, A .
2001 IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION: INTEGRATING INTELLIGENT MACHINES WITH HUMANS FOR A BETTER TOMORROW, 2001, :414-419