DynaSTI: Dynamics modeling with sequential temporal information for reinforcement learning in Atari

Cited by: 1
|
Authors
Kim, Jaehoon [1 ]
Lee, Young Jae [1 ]
Kwak, Mingu [2 ]
Park, Young Joon [3 ]
Kim, Seoung Bum [1 ]
Affiliations
[1] Korea Univ, Sch Ind Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA USA
[3] LG AI Res, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Atari; Dynamics modeling; Hierarchical structure; Self-supervised learning; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112103
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (DRL) has shown remarkable capabilities in solving sequential decision-making problems. However, DRL requires extensive interactions with image-based environments. Existing methods combine self-supervised learning or data augmentation to improve sample efficiency, but many do not consider the temporal dynamics of the environment, even though understanding them is important for effective learning. To address the sample-efficiency problem, we propose dynamics modeling with sequential temporal information (DynaSTI), which incorporates environmental dynamics and leverages the correlation among trajectories. DynaSTI uses an effective learning strategy for state representation as an auxiliary task, using gated recurrent units to capture temporal information. It also integrates forward and inverse dynamics modeling in a hierarchical configuration, enhancing the learning of environmental dynamics compared with using each model separately. This hierarchical structure stabilizes inverse dynamics modeling during training: the inverse model receives inputs derived from forward dynamics modeling, which focuses on extracting features of the controllable state and thus filters out noisy information. Consequently, training the inverse dynamics model on these denoised forward-model features is more stable than training it directly on encoder outputs. We demonstrate the effectiveness of DynaSTI through experiments on the Atari game benchmark, limiting environment interactions to 100k steps. Our extensive experiments confirm that DynaSTI significantly improves the sample efficiency of DRL, outperforming comparison methods on statistically reliable metrics and approaching human-level performance.
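The hierarchical dynamics modeling described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes and names (it is not the authors' implementation): a GRU cell aggregates a sequence of encoded frames into a temporal feature, a forward model predicts the next latent state from that feature and the action, and the inverse model predicts the action from the forward-model feature rather than from the raw encoder output.

```python
import numpy as np

rng = np.random.default_rng(0)

def gru_step(h, x, W, U, b):
    """One GRU step. W, U, b each pack update/reset/candidate parameters."""
    Wz, Wr, Wh = W; Uz, Ur, Uh = U; bz, br, bh = b
    z = 1 / (1 + np.exp(-(x @ Wz + h @ Uz + bz)))   # update gate
    r = 1 / (1 + np.exp(-(x @ Wr + h @ Ur + br)))   # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)   # candidate state
    return (1 - z) * h + z * h_tilde

latent, hidden, n_actions = 8, 16, 4   # assumed toy dimensions
W = [rng.normal(size=(latent, hidden)) * 0.1 for _ in range(3)]
U = [rng.normal(size=(hidden, hidden)) * 0.1 for _ in range(3)]
b = [np.zeros(hidden) for _ in range(3)]

# Forward model: (GRU feature, one-hot action) -> predicted next latent state.
Wf = rng.normal(size=(hidden + n_actions, latent)) * 0.1
# Inverse model: (forward-model feature, next latent) -> action logits.
Wi = rng.normal(size=(hidden + latent, n_actions)) * 0.1

# A toy trajectory of 5 encoded frames and the actions taken.
z_seq = rng.normal(size=(5, latent))
actions = rng.integers(0, n_actions, size=5)

h = np.zeros(hidden)
for t in range(4):
    h = gru_step(h, z_seq[t], W, U, b)               # temporal aggregation
    a_onehot = np.eye(n_actions)[actions[t]]
    z_pred = np.concatenate([h, a_onehot]) @ Wf      # forward dynamics
    # Hierarchical inverse dynamics: consumes the forward-model feature h,
    # not the raw encoder output, so its input is already denoised.
    logits = np.concatenate([h, z_seq[t + 1]]) @ Wi
    fwd_loss = np.mean((z_pred - z_seq[t + 1]) ** 2)  # auxiliary loss term

print(z_pred.shape, logits.shape)  # (8,) (4,)
```

In a full agent, the forward-prediction error and the inverse-model classification loss would be combined as auxiliary objectives alongside the RL loss; the sketch above only shows the data flow that makes the inverse model depend on forward-model features.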
Pages: 12
Related Papers
50 items in total
  • [21] Distributed reinforcement learning for sequential decision making
    Rogova, G
    Scott, P
    Lolett, C
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION FUSION, VOL II, 2002, : 1263 - 1268
  • [22] Learning of deterministic exploration and temporal abstraction in reinforcement learning
    Shibata, Katsunari
    2006 SICE-ICASE International Joint Conference, Vols 1-13, 2006, : 2212 - 2217
  • [23] Rapid Search for Small Object in Reinforcement Learning by Combining Spatio-Temporal Contextual Information
    Jiang H.
    Ma J.-J.
    Yao H.-G.
    Cheng S.-Y.
    Chen Y.
    Yu J.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2023, 51 (11): : 3176 - 3186
  • [24] Predicting Citywide Passenger Demand via Reinforcement Learning from Spatio-Temporal Dynamics
    Ning, Xiaodong
    Yao, Lina
    Wang, Xianzhi
    Benatallah, Boualem
    Salim, Flora
    Haghighi, Pari Delir
    PROCEEDINGS OF THE 15TH EAI INTERNATIONAL CONFERENCE ON MOBILE AND UBIQUITOUS SYSTEMS: COMPUTING, NETWORKING AND SERVICES (MOBIQUITOUS 2018), 2018, : 19 - 28
  • [25] Reinforcement Learning with Side Information for the Uncertainties
    Yang, Janghoon
    SENSORS, 2022, 22 (24)
  • [26] ON INFORMATION ASYMMETRY IN ONLINE REINFORCEMENT LEARNING
    Tampubolon, Ezra
    Ceribasi, Haris
    Boche, Holger
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4955 - 4959
  • [27] Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation
    Xi, Xumei
    Zhao, Yuke
    Liu, Quan
    Ouyang, Liwen
    Wu, Yang
    PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023, : 1103 - 1108
  • [28] Sequential Banner Design Optimization with Deep Reinforcement Learning
    Kondo, Yusuke
    Wang, Xueting
    Seshime, Hiroyuki
    Yamasaki, Toshihiko
    23RD IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2021), 2021, : 253 - 256
  • [29] Predictive Movements and Human Reinforcement Learning of Sequential Action
    de Kleijn, Roy
    Kachergis, George
    Hommel, Bernhard
    COGNITIVE SCIENCE, 2018, 42 : 783 - 808
  • [30] Reinforcement learning and design of nonparametric sequential decision networks
    Ertin, E
    Priddy, KL
    APPLICATIONS AND SCIENCE OF COMPUTATIONAL INTELLIGENCE V, 2002, 4739 : 40 - 47