Vision-based Navigation Using Deep Reinforcement Learning

Cited by: 43
Authors
Kulhanek, Jonas [1 ]
Derner, Erik [2 ,3 ]
de Bruin, Tim [1 ]
Babuska, Robert [1 ,2 ]
Affiliations
[1] Delft Univ Technol, Cognit Robot, Fac 3mE, NL-2628 CD Delft, Netherlands
[2] Czech Tech Univ, Czech Inst Informat Robot & Cybernet, Prague 16636, Czech Republic
[3] Czech Tech Univ, Fac Elect Engn, Dept Control Engn, Prague 16627, Czech Republic
Source
2019 EUROPEAN CONFERENCE ON MOBILE ROBOTS (ECMR) | 2019
Keywords
Robot navigation; deep reinforcement learning; actor-critic; auxiliary tasks
DOI
10.1109/ecmr.2019.8870964
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Deep reinforcement learning (RL) has been successfully applied to a variety of game-like environments. However, applying deep RL to visual navigation in realistic environments remains challenging. We propose a novel learning architecture capable of navigating an agent, e.g. a mobile robot, to a target given by an image. To achieve this, we extend the batched A2C algorithm with three auxiliary tasks designed to improve visual navigation performance: predicting the segmentation of the observation image, predicting the segmentation of the target image, and predicting the depth map. These tasks enable supervised pre-training of a major part of the network and substantially reduce the number of training steps required. Training performance is further improved by gradually increasing the environment complexity over time. We also propose an efficient neural network structure capable of learning to navigate to multiple targets in multiple environments. Our method operates in continuous state spaces and, in the AI2-THOR environment simulator, surpasses the performance of state-of-the-art goal-oriented visual navigation methods from the literature.
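The training objective described in the abstract combines the A2C actor-critic terms with the three auxiliary supervised losses. A minimal sketch of such a multi-task loss is shown below; the function name and the weighting coefficients are illustrative assumptions, not values reported in the paper:

```python
def multitask_a2c_loss(actor_loss, critic_loss,
                       seg_obs_loss, seg_target_loss, depth_loss,
                       w_critic=0.5, w_aux=0.1):
    """Combine A2C terms with auxiliary-task losses.

    actor_loss       -- policy-gradient (actor) loss
    critic_loss      -- value-function (critic) regression loss
    seg_obs_loss     -- segmentation loss on the observation image
    seg_target_loss  -- segmentation loss on the target image
    depth_loss       -- depth-map prediction loss
    w_critic, w_aux  -- assumed weighting coefficients (hyperparameters)
    """
    aux = seg_obs_loss + seg_target_loss + depth_loss
    return actor_loss + w_critic * critic_loss + w_aux * aux


# Example: combine per-batch scalar losses into one training loss.
total = multitask_a2c_loss(1.0, 2.0, 0.3, 0.3, 0.4)
```

Because the auxiliary heads have dense supervised targets (segmentation masks, depth maps), this part of the network can be pre-trained with ordinary supervised learning before RL fine-tuning, which is what reduces the number of RL training steps.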
Pages: 8