GAN-based Intrinsic Exploration for Sample Efficient Reinforcement Learning

Times Cited: 0
Authors
Kamar, Dogay [1]
Ure, Nazim Kemal [1,2]
Unal, Gozde [1,2]
Affiliations
[1] Istanbul Tech Univ, Fac Comp & Informat, Istanbul, Turkey
[2] Istanbul Tech Univ, Artificial Intelligence & Data Sci Res Ctr, Istanbul, Turkey
Source
ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2 | 2022
Keywords
Deep Learning; Reinforcement Learning; Generative Adversarial Networks; Efficient Exploration in Reinforcement Learning
DOI
10.5220/0010825500003116
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this study, we address the problem of efficient exploration in reinforcement learning. The most common exploration approaches depend on random action selection; however, these approaches do not work well in environments with sparse or no rewards. We propose a Generative Adversarial Network-based Intrinsic Reward Module that learns the distribution of the observed states and emits an intrinsic reward that is high for out-of-distribution states, in order to lead the agent toward unexplored states. We evaluate our approach on Super Mario Bros in a no-reward setting and on Montezuma's Revenge in a sparse-reward setting, and show that our approach is indeed capable of exploring efficiently. We discuss a few weaknesses and conclude by outlining future work.
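The abstract describes the core mechanism: a GAN is fit to the distribution of visited states, and states the discriminator judges as out-of-distribution receive a high intrinsic reward. Below is a minimal PyTorch sketch of that idea, not the paper's implementation: the network shapes, the reward formula 1 - D(s), and all names (Generator, Discriminator, intrinsic_reward, gan_update, STATE_DIM) are illustrative assumptions.

```python
# Sketch: GAN-based intrinsic reward for exploration (illustrative only).
import torch
import torch.nn as nn

STATE_DIM = 64  # assumed flattened state size

class Generator(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, STATE_DIM),
        )
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )
    def forward(self, s):
        return self.net(s)

def intrinsic_reward(disc, state):
    # States scored as unlikely under the learned distribution
    # (low D(s)) are treated as novel and rewarded highly.
    with torch.no_grad():
        return (1.0 - disc(state)).squeeze(-1)

def gan_update(gen, disc, opt_g, opt_d, real_states, latent_dim=16):
    bce = nn.BCELoss()
    batch = real_states.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)
    # Discriminator step: observed states vs. generated states.
    fake = gen(torch.randn(batch, latent_dim)).detach()
    d_loss = bce(disc(real_states), ones) + bce(disc(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: try to fool the discriminator.
    fake = gen(torch.randn(batch, latent_dim))
    g_loss = bce(disc(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Usage: train the GAN on batches of visited states, then add the
# intrinsic reward to the (possibly zero) extrinsic reward.
gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
states = torch.randn(32, STATE_DIM)        # stand-in for observed states
gan_update(gen, disc, opt_g, opt_d, states)
r_int = intrinsic_reward(disc, states)
```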
Pages: 264-272
Number of Pages: 9