Efficient Hierarchical Reinforcement Learning for Mapless Navigation With Predictive Neighbouring Space Scoring

Cited by: 1
Authors
Gao, Yan [1 ]
Wu, Jing [2 ]
Yang, Xintong [1 ]
Ji, Ze [1 ]
Affiliations
[1] Cardiff Univ, Sch Engn, Cardiff CF24 3AA, Wales
[2] Cardiff Univ, Sch Comp Sci Informat, Cardiff CF24 3AA, Wales
Keywords
Mapless navigation; deep reinforcement learning; collision avoidance; motion planning; hierarchical reinforcement learning
DOI
10.1109/TASE.2023.3312237
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Solving reinforcement learning (RL)-based mapless navigation tasks is challenging due to their sparse rewards and long decision horizons. Hierarchical reinforcement learning (HRL) can leverage knowledge at different levels of abstraction and is thus well suited to complex mapless navigation tasks. However, learning navigation end-to-end from raw high-dimensional sensor data, such as Lidar or RGB cameras, is computationally expensive and inefficient. The use of subgoals based on a compact intermediate representation is therefore preferred for dimension reduction. This work proposes an efficient HRL-based framework that achieves this with a novel scoring method, named Predictive Neighbouring Space Scoring (PNSS). The PNSS model estimates the explorable space at a given position of interest based on the current robot observation. The PNSS values for a few candidate positions around the robot provide a compact and informative state representation for subgoal selection. We study the effects of different candidate position layouts and demonstrate that our layout design yields higher performance on longer-range tasks. Moreover, a penalty term is introduced into the reward function for the high-level (HL) policy, so that the subgoal selection process takes the performance of the low-level (LL) policy into consideration. Comprehensive evaluations demonstrate that the proposed PNSS module consistently improves performance over the use of Lidar alone or Lidar combined with encoded RGB features.
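The abstract describes three ideas: candidate positions laid out around the robot, PNSS scores over those candidates forming a compact state for subgoal selection, and an HL reward with a penalty tied to LL-policy effort. A minimal sketch of how these pieces could fit together is shown below; the circular layout, the additive scoring rule, and the penalty weight are illustrative assumptions, not the paper's actual implementation.

```python
import math

def candidate_positions(x, y, radius=2.0, n=8):
    """Illustrative layout: n candidate subgoal positions on a circle
    of the given radius around the robot at (x, y)."""
    return [(x + radius * math.cos(2 * math.pi * k / n),
             y + radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]

def select_subgoal(pnss_scores, goal_alignment):
    """Pick the candidate index maximising a combined score of
    predicted explorable space (PNSS) and alignment with the goal
    direction. The additive combination is an assumption."""
    combined = [p + g for p, g in zip(pnss_scores, goal_alignment)]
    return combined.index(max(combined))

def hl_reward(progress, ll_steps, penalty_weight=0.01):
    """High-level reward sketch: progress toward the goal minus a
    penalty proportional to the low-level steps spent reaching the
    subgoal, so subgoal selection accounts for LL-policy cost."""
    return progress - penalty_weight * ll_steps
```

In this sketch, a subgoal that looks open (high PNSS) but is poorly aligned with the goal can still lose to a slightly less open, better-aligned one, and the step penalty discourages the HL policy from picking subgoals the LL policy reaches only slowly.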
Pages: 5457-5472
Page count: 16