Multigoal Visual Navigation With Collision Avoidance via Deep Reinforcement Learning

Cited by: 26
Authors
Xiao, Wendong [1]
Yuan, Liang [1,2,3]
He, Li [1]
Ran, Teng [1]
Zhang, Jianbo [1]
Cui, Jianping [1]
Affiliations
[1] Xinjiang Univ, Sch Mech Engn, Urumqi 830046, Peoples R China
[2] Beijing Univ Chem Technol, Beijing Adv Innovat Ctr Soft Matter Sci & Engn, Beijing 100029, Peoples R China
[3] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Navigation; Visualization; Task analysis; Trajectory; Collision avoidance; Reinforcement learning; Training; deep reinforcement learning (DRL); multigoal navigation; visual sensor;
DOI
10.1109/TIM.2022.3158384
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
Learning to map the images acquired by a moving agent equipped with a camera sensor to motion commands for multigoal navigation is challenging. Most existing approaches still struggle with collision avoidance, convergence speed, and generalization. In this article, a novel actor-critic architecture is presented to learn the optimal navigation policy. We introduce a single-step reward observation and a collision penalty to reshape the reinforcement learning (RL) reward function. The reshaped reward function provides collision perception, which is treated as measurement information derived from the visual observation to avoid obstacles. In addition, expert trajectories are used to generate subgoals, and a subgoal reward shaping is proposed to accelerate policy learning with the expert knowledge of subgoals. To generate human-aware navigation policies, an observation-action consistency (OAC) model is introduced to ensure that the agent reaches the subgoals in turn and moves toward the target. The whole training process follows a self-supervised RL approach accompanied by an expert supervision signal. This scheme balances exploration and exploitation, helping the proposed model generalize to unseen goals. Training experiments on AI2-THOR show better performance and faster convergence than existing approaches. For generalization to unseen goals, the proposed method achieves a state-of-the-art success rate, with at least a 30% improvement in average episode collisions.
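The abstract describes a reward reshaped from a single-step reward observation, a collision penalty, and subgoal shaping derived from expert trajectories. The Python sketch below is a minimal, hedged reading of such a reward; the coefficients, the subgoal radius, and the function reshaped_reward itself are illustrative assumptions, not the authors' published formulation.

import numpy as np

# Illustrative coefficients only; the record does not list the paper's exact values.
STEP_PENALTY = -0.01       # single-step reward observation (per-step time cost)
COLLISION_PENALTY = -0.1   # penalty added when the agent collides
SUBGOAL_REWARD = 0.1       # shaping bonus for reaching the next expert subgoal
GOAL_REWARD = 10.0         # terminal reward for reaching the navigation goal

def reshaped_reward(collided, reached_goal, agent_pos, subgoals, next_idx, radius=0.25):
    """Sketch of a reshaped reward: step cost + collision penalty + subgoal
    shaping from expert-trajectory waypoints. Subgoals are consumed strictly
    in order, echoing the observation-action consistency (OAC) idea that the
    agent reaches subgoals in turn before moving toward the target."""
    reward = STEP_PENALTY
    if collided:
        reward += COLLISION_PENALTY
    # Reward only the *next* subgoal, so shaping enforces the expert ordering.
    if next_idx < len(subgoals) and np.linalg.norm(agent_pos - subgoals[next_idx]) < radius:
        reward += SUBGOAL_REWARD
        next_idx += 1
    if reached_goal:
        reward += GOAL_REWARD
    return reward, next_idx

# Example: two waypoints taken from a hypothetical expert trajectory.
subgoals = np.array([[1.0, 0.0], [2.0, 1.0]])
r, i = reshaped_reward(False, False, np.array([1.05, 0.0]), subgoals, next_idx=0)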
Pages: 9