The Dreaming Variational Autoencoder for Reinforcement Learning Environments

Cited by: 7
Authors
Andersen, Per-Arne [1]
Goodwin, Morten [1]
Granmo, Ole-Christoffer [1]
Affiliations
[1] Univ Agder, Dept ICT, Grimstad, Norway
Source
ARTIFICIAL INTELLIGENCE XXXV (AI 2018) | 2018 / Vol. 11311
Keywords
Deep reinforcement learning; Environment modeling; Neural networks; Variational autoencoder; Markov decision processes; Exploration; Artificial experience-replay;
DOI
10.1007/978-3-030-04191-5_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. Current state-of-the-art reinforcement learning algorithms face several challenges that prevent them from converging towards the global optimum. The solution to these problems likely lies in short- and long-term planning, exploration, and memory management. Games are often used to benchmark reinforcement learning algorithms because they provide a flexible, reproducible, and easy-to-control environment. However, few games feature a state-space where results in exploration, memory, and planning are easily perceived. This paper presents the Dreaming Variational Autoencoder (DVAE), a neural-network-based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE in partially and fully observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work in reinforcement learning driven by generative exploration.
Pages: 143-155
Number of pages: 13
Related papers
22 in total
  • [1] Towards a Deep Reinforcement Learning Approach for Tower Line Wars
    Andersen, Per-Arne
    Goodwin, Morten
    Granmo, Ole-Christoffer
    [J]. ARTIFICIAL INTELLIGENCE XXXIV, AI 2017, 2017, 10630 : 101 - 114
  • [2] [Anonymous], 2016, BETA VAE LEARNING BA
  • [3] Deep Reinforcement Learning: A Brief Survey
    Arulkumaran, Kai
    Deisenroth, Marc Peter
    Brundage, Miles
    Bharath, Anil Anthony
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) : 26 - 38
  • [4] Bangaru S.P., 2016, ARXIV PREPRINT ARXIV
  • [5] Blundell C., 2016, ARXIV PREPRINT ARXIV
  • [6] Buesing L., 2018, ARXIV PREPRINT ARXIV
  • [7] Chen K., 2015, DEEP REINFORCEMENT L, P6
  • [8] Ha D., 2018, ARXIV PREPRINT ARXIV
  • [9] Higgins I, 2017, PR MACH LEARN RES, V70
  • [10] Reinforcement learning: A survey
    Kaelbling, LP
    Littman, ML
    Moore, AW
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1996, 4 : 237 - 285