Efficient Hierarchical Reinforcement Learning for Mapless Navigation With Predictive Neighbouring Space Scoring

Cited by: 1

Authors:

Gao, Yan [1]

Wu, Jing [2]

Yang, Xintong [1]

Ji, Ze [1]

Affiliations:

[1] Cardiff University, School of Engineering, Cardiff CF24 3AA, Wales

[2] Cardiff University, School of Computer Science and Informatics, Cardiff CF24 3AA, Wales

Source:

IEEE Transactions on Automation Science and Engineering | 2024, Vol. 21, No. 4

Keywords:

Mapless navigation; deep reinforcement learning; collision avoidance; motion planning; hierarchical reinforcement learning

DOI:

10.1109/TASE.2023.3312237

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Solving reinforcement learning (RL)-based mapless navigation tasks is challenging due to their sparse reward and long decision horizon nature. Hierarchical reinforcement learning (HRL) has the ability to leverage knowledge at different abstract levels and is thus preferred in complex mapless navigation tasks. However, it is computationally expensive and inefficient to learn navigation end-to-end from raw high-dimensional sensor data, such as Lidar or RGB cameras. The use of subgoals based on a compact intermediate representation is therefore preferred for dimension reduction. This work proposes an efficient HRL-based framework to achieve this with a novel scoring method, named Predictive Neighbouring Space Scoring (PNSS). The PNSS model estimates the explorable space for a given position of interest based on the current robot observation. The PNSS values for a few candidate positions around the robot provide a compact and informative state representation for subgoal selection. We study the effects of different candidate position layouts and demonstrate that our layout design facilitates higher performances in longer-range tasks. Moreover, a penalty term is introduced in the reward function for the high-level (HL) policy, so that the subgoal selection process takes the performance of the low-level (LL) policy into consideration. Comprehensive evaluations demonstrate that using the proposed PNSS module consistently improves performances over the use of Lidar only or Lidar and encoded RGB features.


Pages: 5457-5472

Page count: 16

References: 56 in total

