Effective Deep Reinforcement Learning Setups for Multiple Goals on Visual Navigation

Cited by: 1
Authors
Takeshi Horita, Luiz Ricardo [1,2]
Wolf, Denis Fernando [2 ]
Grassi Junior, Valdir [2 ]
Affiliations
[1] Sidia Inst Sci & Technol, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Sao Carlos, SP, Brazil
Source
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2020
Funding
Sao Paulo Research Foundation (FAPESP), Brazil;
Keywords
reinforcement learning; goal-driven navigation; visual navigation;
DOI
10.1109/ijcnn48605.2020.9206917
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep Reinforcement Learning (DRL) is an interesting class of algorithms whose objective is to learn a behavioral policy through interaction with the environment, leveraging the function-approximation properties of neural networks. Nonetheless, for episodic problems it is usually modeled to deal with a single goal. Some works have shown that multiple goals can be learned with a Universal Value Function Approximator (UVFA), i.e., a method that learns a universal policy by taking as input information about both the agent's current state and the goal. Their results are promising but show that there is still room for new contributions regarding how the goal information is integrated into the model. For this reason, we propose using the Hadamard product or the Gated-Attention module in the UVFA architecture for vision-based problems. We also propose a hybrid exploration strategy that combines ε-greedy exploration with sampling from the categorical probability distribution over actions, which we call ε-categorical. By systematically comparing different UVFA architectures under different exploration strategies, with and without Trust Region Policy Optimization (TRPO), we demonstrate experimentally that, for visual topological navigation, combining visual information of the current and goal states through the Hadamard product or the Gated-Attention module allows the network to learn near-optimal navigation policies. We also show empirically that the ε-categorical policy helps to avoid local minima during training, which facilitates convergence to better results.
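The abstract describes two ideas that are easy to sketch in code: fusing current-observation and goal embeddings through a Hadamard product or a Gated-Attention gate, and selecting actions with the ε-categorical strategy. The following is a minimal, hypothetical PyTorch sketch of one plausible reading of those ideas; it is not the authors' implementation, and the module names, embedding sizes, and action count are assumptions made for illustration.

# Illustrative sketch only, not the authors' code: a UVFA-style policy head that
# fuses observation and goal embeddings (Gated-Attention / Hadamard product) plus
# an epsilon-categorical action selector. Names and sizes are assumptions.
import random

import torch
import torch.nn as nn


class GatedAttentionFusion(nn.Module):
    """Gated-Attention: a sigmoid gate computed from the goal embedding is applied
    to the observation features via an element-wise (Hadamard) product."""

    def __init__(self, obs_dim: int, goal_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(goal_dim, obs_dim), nn.Sigmoid())

    def forward(self, obs_feat, goal_feat):
        return obs_feat * self.gate(goal_feat)


class UVFAPolicy(nn.Module):
    """Universal policy: maps (observation, goal) embeddings to action logits."""

    def __init__(self, obs_dim=256, goal_dim=256, n_actions=4):
        super().__init__()
        self.fusion = GatedAttentionFusion(obs_dim, goal_dim)
        self.head = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                  nn.Linear(128, n_actions))

    def forward(self, obs_feat, goal_feat):
        return self.head(self.fusion(obs_feat, goal_feat))


def epsilon_categorical_action(logits, epsilon):
    """With probability epsilon pick a uniformly random action (epsilon-greedy part);
    otherwise sample from the categorical distribution defined by the policy logits."""
    if random.random() < epsilon:
        return random.randrange(logits.shape[-1])
    return torch.distributions.Categorical(logits=logits).sample().item()


if __name__ == "__main__":
    policy = UVFAPolicy()
    obs, goal = torch.randn(1, 256), torch.randn(1, 256)  # stand-ins for CNN embeddings
    action = epsilon_categorical_action(policy(obs, goal), epsilon=0.1)
    print("sampled action:", action)

Replacing GatedAttentionFusion with a plain element-wise product of equally sized embeddings (obs_feat * goal_feat) would correspond to the Hadamard-product variant mentioned in the abstract.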
Pages: 8