Fast and slow curiosity for high-level exploration in reinforcement learning

Cited by: 0
Authors
Nicolas Bougie
Ryutaro Ichise
Affiliations
[1] National Institute of Informatics
[2] The Graduate University for Advanced Studies
[3] Sokendai
Source
Applied Intelligence | 2021 / Vol. 51
Keywords
Reinforcement learning; Exploration; Autonomous exploration; Curiosity in reinforcement learning
Abstract
Deep reinforcement learning (DRL) algorithms rely on carefully designed environment rewards that are extrinsic to the agent. However, in many real-world scenarios rewards are sparse or delayed, motivating the need for efficient exploration strategies. While intrinsically motivated agents hold the promise of better local exploration, solving problems that require coordinated decisions over long time horizons remains an open problem. We postulate that to discover such strategies, a DRL agent should be able to combine local and high-level exploration behaviors. To this end, we introduce the concept of fast and slow curiosity, which aims to incentivize long-horizon exploration. Our method decomposes the curiosity bonus into a fast reward that deals with local exploration and a slow reward that encourages global exploration. We formulate this bonus as the error in an agent’s ability to reconstruct observations given their contexts. We further propose to dynamically weight local and high-level strategies by measuring state diversity. We evaluate our method on a variety of benchmark environments, including Minigrid, Super Mario Bros, and Atari games. Experimental results show that our agent outperforms prior approaches in most tasks in terms of exploration efficiency and mean scores.
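The decomposition described in the abstract can be sketched as a weighted blend of two intrinsic rewards. The sketch below is an illustrative interpretation only, not the paper's implementation: the function name, the linear diversity-based weighting, and the use of raw reconstruction errors as the fast and slow signals are all assumptions.

```python
def combined_curiosity_bonus(fast_error: float, slow_error: float,
                             diversity: float, max_diversity: float) -> float:
    """Blend a fast (local) and a slow (global) curiosity reward.

    fast_error / slow_error: hypothetical reconstruction errors standing
    in for the fast and slow curiosity signals.
    diversity: measured state diversity; higher diversity shifts weight
    toward the slow, high-level exploration bonus.
    """
    # Normalize diversity into a mixing weight in [0, 1].
    w = min(max(diversity / max_diversity, 0.0), 1.0)
    # Low diversity -> emphasize local exploration; high -> global.
    return (1.0 - w) * fast_error + w * slow_error
```

With zero diversity the bonus reduces to the fast (local) signal, and at maximum diversity it reduces to the slow (global) one, matching the dynamic weighting idea at a high level.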
Pages: 1086–1107
Page count: 21