Towards Generalization in Target-Driven Visual Navigation by Using Deep Reinforcement Learning

Cited by: 67
Authors
Devo, Alessandro [1 ]
Mezzetti, Giacomo [1 ]
Costante, Gabriele [1 ]
Fravolini, Mario L. [1 ]
Valigi, Paolo [1 ]
Affiliations
[1] Univ Perugia, Dept Engn, I-06125 Perugia, Italy
Keywords
Navigation; Visualization; Task analysis; Training; Machine learning; Simultaneous localization and mapping; Deep learning in robotics and automation; target-driven visual navigation; visual-based navigation; visual learning; SIMULTANEOUS LOCALIZATION; OBSTACLE AVOIDANCE;
DOI
10.1109/TRO.2020.2994002
Chinese Library Classification (CLC)
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
Among the main challenges in robotics, target-driven visual navigation has attracted increasing interest in recent years. In this task, an agent must navigate an environment to reach a user-specified target using vision alone. Recent successful approaches rely on deep reinforcement learning, which has proven to be an effective framework for learning navigation policies. However, current state-of-the-art methods require retraining, or at least fine-tuning, the model for every new environment and object. In real scenarios, this operation can be extremely challenging or even dangerous. For these reasons, we address generalization in target-driven visual navigation by proposing a novel architecture composed of two networks, both trained exclusively in simulation. The first has the objective of exploring the environment, while the second locates the target. They are specifically designed to work together, yet trained separately to promote generalization. In this article, we test our agent in both simulated and real scenarios and validate its generalization capabilities through extensive experiments with previously unseen goals and unknown mazes, even much larger than those used for training.
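The two-module decomposition described in the abstract can be illustrated with a minimal sketch. All class and method names below are hypothetical stand-ins, not the authors' actual networks: an exploration module proposes actions to cover the environment, a target-locator module scores whether the goal is in the current view, and the agent switches between them at a confidence threshold.

```python
import random

class ExplorationPolicy:
    """Hypothetical stand-in for the exploration network: proposes an
    action intended to drive the agent through unseen parts of the maze."""
    ACTIONS = ["forward", "turn_left", "turn_right"]

    def act(self, observation):
        # Placeholder: a trained network would map the visual input to an action.
        return random.choice(self.ACTIONS)

class TargetLocator:
    """Hypothetical stand-in for the target-localization network: scores
    how strongly the current view matches the user-specified goal."""
    def score(self, observation, goal):
        # Placeholder similarity; a trained network would compare learned features.
        return 1.0 if goal in observation else 0.0

class NavigationAgent:
    """Composes the two separately trained modules: explore until the
    locator is confident the goal is in view, then move toward it."""
    def __init__(self, threshold=0.5):
        self.explorer = ExplorationPolicy()
        self.locator = TargetLocator()
        self.threshold = threshold

    def act(self, observation, goal):
        if self.locator.score(observation, goal) >= self.threshold:
            return "forward"  # approach the detected target
        return self.explorer.act(observation)

agent = NavigationAgent()
print(agent.act(["wall", "red_door"], "red_door"))  # target visible -> "forward"
```

Training the two modules separately, as the abstract notes, is what lets each be reused in environments and with goals never seen by the other.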
Pages: 1546-1561
Page count: 16