DynaSTI: Dynamics modeling with sequential temporal information for reinforcement learning in Atari

Cited by: 1
Authors
Kim, Jaehoon [1 ]
Lee, Young Jae [1 ]
Kwak, Mingu [2 ]
Park, Young Joon [3 ]
Kim, Seoung Bum [1 ]
Affiliations
[1] Korea Univ, Sch Ind Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA USA
[3] LG AI Res, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Atari; Dynamics modeling; Hierarchical structure; Self-supervised learning; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112103
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (DRL) has shown remarkable capabilities in solving sequential decision-making problems. However, DRL requires extensive interactions with image-based environments. Existing methods combine self-supervised learning or data augmentation to improve sample efficiency, but many disregard the temporal dynamics of the environment, even though understanding them is important for effective learning. To address the sample-efficiency problem, we propose dynamics modeling with sequential temporal information (DynaSTI), which incorporates environmental dynamics and leverages the correlation among trajectories. DynaSTI learns state representations through an auxiliary task, using gated recurrent units to capture temporal information. It also integrates forward and inverse dynamics modeling in a hierarchical configuration, which improves the learning of environmental dynamics compared with using either model separately. In this hierarchy, the inverse dynamics model is trained on inputs derived from the forward dynamics model, which focuses on extracting features of the controllable state and thereby filters out noisy information. These denoised inputs stabilize the training of the inverse dynamics model relative to feeding it encoder outputs directly. We demonstrate the effectiveness of DynaSTI on the Atari game benchmark, limiting environment interactions to 100k steps. Extensive experiments confirm that DynaSTI significantly improves the sample efficiency of DRL, outperforming comparison methods on statistically reliable metrics and approaching human-level performance.
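As a concrete reading of the abstract, the sketch below shows one way the pieces could fit together in PyTorch: a GRU over encoded frames supplies temporal features, a forward dynamics model predicts the next latent state from the current features and action, and the inverse dynamics model recovers the action from forward-model features rather than raw encoder outputs. All module names, sizes, and the exact wiring are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynaSTISketch(nn.Module):
    """Illustrative auxiliary dynamics-modeling objective (not the paper's code)."""

    def __init__(self, obs_dim=512, latent_dim=256, num_actions=18):
        super().__init__()
        self.num_actions = num_actions
        # Per-frame encoder (a CNN in the image-based Atari setting; a
        # linear layer keeps this sketch self-contained and runnable).
        self.encoder = nn.Linear(obs_dim, latent_dim)
        # GRU captures sequential temporal information across the trajectory.
        self.gru = nn.GRU(latent_dim, latent_dim, batch_first=True)
        # Forward dynamics: a state-feature head plus a transition head that
        # predicts the next latent state given the action.
        self.fwd_feature = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.ReLU())
        self.fwd_transition = nn.Linear(latent_dim + num_actions, latent_dim)
        # Inverse dynamics consumes forward-model features, not raw encoder
        # output: the hierarchical coupling the abstract credits with
        # filtering out noisy, uncontrollable information.
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, num_actions))

    def forward(self, obs_seq, actions):
        # obs_seq: (B, T, obs_dim) flattened frames; actions: (B, T-1) int64.
        z, _ = self.gru(self.encoder(obs_seq))      # (B, T, latent_dim)
        f = self.fwd_feature(z)                     # forward-model features
        a = F.one_hot(actions, self.num_actions).float()
        # Forward loss: predict the next latent state from (feature, action).
        pred_next = self.fwd_transition(torch.cat([f[:, :-1], a], dim=-1))
        fwd_loss = F.mse_loss(pred_next, z[:, 1:].detach())
        # Inverse loss: recover the action from consecutive forward features.
        logits = self.inverse_model(torch.cat([f[:, :-1], f[:, 1:]], dim=-1))
        inv_loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), actions.reshape(-1))
        return fwd_loss + inv_loss  # auxiliary loss added to the RL objective

Calling the module on a batch of flattened frame sequences with the corresponding action indices returns the combined auxiliary loss, which would be added to the agent's reinforcement learning objective during training.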
Pages: 12