DynaSTI: Dynamics modeling with sequential temporal information for reinforcement learning in Atari

Cited by: 1
Authors
Kim, Jaehoon [1]
Lee, Young Jae [1]
Kwak, Mingu [2]
Park, Young Joon [3]
Kim, Seoung Bum [1]
Affiliations
[1] Korea Univ, Sch Ind Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA USA
[3] LG AI Res, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Atari; Dynamics modeling; Hierarchical structure; Self-supervised learning; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112103
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (DRL) has shown remarkable capabilities in solving sequential decision-making problems. However, DRL requires extensive interactions with image-based environments. Existing methods combine self-supervised learning or data augmentation to improve sample efficiency, but many do not consider the temporal information dynamics of the environment, even though understanding them is important for effective learning. To address the sample-efficiency problem, we propose dynamics modeling with sequential temporal information (DynaSTI), which incorporates environmental dynamics and leverages the correlation among trajectories. DynaSTI learns state representations through an auxiliary task, using gated recurrent units to capture temporal information. It also integrates forward and inverse dynamics modeling in a hierarchical configuration, which improves the learning of environmental dynamics compared with using each model separately. In this hierarchy, the inverse dynamics model is trained on inputs derived from the forward dynamics model, which focuses on extracting features of the controllable state and thereby filters out noisy information; training on these denoised inputs is more stable than training directly on encoder outputs. We demonstrate the effectiveness of DynaSTI on the Atari game benchmark with environment interactions limited to 100k steps. Extensive experiments confirm that DynaSTI significantly improves the sample efficiency of DRL, outperforming comparison methods on statistically reliable metrics and approaching human-level performance.
Pages: 12
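
The abstract's hierarchical coupling of forward and inverse dynamics can be illustrated in code. Below is a minimal PyTorch sketch, not the authors' implementation: the module names, layer sizes, one-hot action encoding, and equal loss weighting are all assumptions made for illustration. It shows a GRU summarizing a sequence of encoded frames, a forward model predicting the next latent state from that summary and an action, and an inverse model that classifies the action from the forward model's output rather than from raw encoder features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalDynamicsSketch(nn.Module):
    """Illustrative auxiliary head; names and sizes are assumptions,
    not taken from the DynaSTI paper."""

    def __init__(self, latent_dim=256, action_dim=18, hidden_dim=256):
        super().__init__()
        # GRU aggregates a sequence of encoded frames into a temporal feature.
        self.gru = nn.GRU(latent_dim, hidden_dim, batch_first=True)
        # Forward model: (temporal feature, action) -> predicted next latent.
        self.forward_model = nn.Sequential(
            nn.Linear(hidden_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        # Inverse model: consumes the forward model's (denoised) prediction
        # together with the true next latent, instead of raw encoder output.
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, z_seq, action_onehot, z_next):
        # z_seq: (B, T, latent_dim) encoded frames from the agent's encoder
        # action_onehot: (B, action_dim); z_next: (B, latent_dim)
        _, h_last = self.gru(z_seq)                    # h_last: (1, B, hidden_dim)
        h_t = h_last.squeeze(0)                        # (B, hidden_dim)
        z_next_pred = self.forward_model(
            torch.cat([h_t, action_onehot], dim=-1))
        action_logits = self.inverse_model(
            torch.cat([z_next_pred, z_next], dim=-1))
        forward_loss = F.mse_loss(z_next_pred, z_next)
        inverse_loss = F.cross_entropy(
            action_logits, action_onehot.argmax(dim=-1))
        return forward_loss + inverse_loss             # equal weighting assumed
```

In a training loop, a short stack of frames would be encoded to produce z_seq and z_next, and this auxiliary loss would be added to the usual RL objective; the exact losses, stop-gradients, and weighting used in DynaSTI may differ.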