DynaSTI: Dynamics modeling with sequential temporal information for reinforcement learning in Atari

Cited by: 1
Authors
Kim, Jaehoon [1]
Lee, Young Jae [1]
Kwak, Mingu [2]
Park, Young Joon [3]
Kim, Seoung Bum [1]
Affiliations
[1] Korea Univ, Sch Ind Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA USA
[3] LG AI Res, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Atari; Dynamics modeling; Hierarchical structure; Self-supervised learning; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112103
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (DRL) has shown remarkable capabilities in solving sequential decision-making problems. However, DRL requires extensive interactions with image-based environments. Existing methods combine self-supervised learning or data augmentation to improve sample efficiency, but many do not consider the temporal information dynamics of the environment, even though understanding them is important for effective learning. To address the sample-efficiency problem, we propose dynamics modeling with sequential temporal information (DynaSTI), which incorporates environmental dynamics and leverages the correlation among trajectories. DynaSTI learns state representations through an auxiliary task, using gated recurrent units to capture temporal information. It also integrates forward and inverse dynamics modeling in a hierarchical configuration, which improves the learning of environmental dynamics compared with using each model separately. In this hierarchy, the inverse dynamics model is trained on inputs derived from the forward dynamics model, which focuses on extracting features of the controllable state and thereby filters out noisy information; training on these denoised inputs is more stable than training directly on encoder outputs. We demonstrate the effectiveness of DynaSTI on the Atari game benchmark with environment interactions limited to 100k steps. Extensive experiments confirm that DynaSTI significantly improves the sample efficiency of DRL, outperforming comparison methods on statistically reliable metrics and approaching human-level performance.
Pages: 12
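
The abstract's hierarchical coupling of forward and inverse dynamics can be illustrated in code. Below is a minimal PyTorch sketch, not the authors' implementation: the module names, layer sizes, one-hot action encoding, and equal loss weighting are all assumptions made for illustration. It shows a GRU summarizing a sequence of encoded frames, a forward model predicting the next latent state from that summary and an action, and an inverse model that classifies the action from the forward model's output rather than from raw encoder features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalDynamicsSketch(nn.Module):
    """Illustrative auxiliary head; names and sizes are assumptions,
    not taken from the DynaSTI paper."""

    def __init__(self, latent_dim=256, action_dim=18, hidden_dim=256):
        super().__init__()
        # GRU aggregates a sequence of encoded frames into a temporal feature.
        self.gru = nn.GRU(latent_dim, hidden_dim, batch_first=True)
        # Forward model: (temporal feature, action) -> predicted next latent.
        self.forward_model = nn.Sequential(
            nn.Linear(hidden_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        # Inverse model: consumes the forward model's (denoised) prediction
        # together with the true next latent, instead of raw encoder output.
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, z_seq, action_onehot, z_next):
        # z_seq: (B, T, latent_dim) encoded frames from the agent's encoder
        # action_onehot: (B, action_dim); z_next: (B, latent_dim)
        _, h_last = self.gru(z_seq)                    # h_last: (1, B, hidden_dim)
        h_t = h_last.squeeze(0)                        # (B, hidden_dim)
        z_next_pred = self.forward_model(
            torch.cat([h_t, action_onehot], dim=-1))
        action_logits = self.inverse_model(
            torch.cat([z_next_pred, z_next], dim=-1))
        forward_loss = F.mse_loss(z_next_pred, z_next)
        inverse_loss = F.cross_entropy(
            action_logits, action_onehot.argmax(dim=-1))
        return forward_loss + inverse_loss             # equal weighting assumed
```

In a training loop, a short stack of frames would be encoded to produce z_seq and z_next, and this auxiliary loss would be added to the usual RL objective; the exact losses, stop-gradients, and weighting used in DynaSTI may differ.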