Model Based Reinforcement Learning Pre-Trained with Various State Data

Cited by: 0
Authors
Ono, Masaaki [1 ]
Ichise, Ryutaro [2 ]
Affiliations
[1] Grad Univ Adv Studies, SOKENDAI, Natl Inst Informat, Tokyo, Japan
[2] Tokyo Inst Technol, Natl Inst Informat, Tokyo, Japan
Source
2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024 | 2024
Keywords
artificial intelligence; reinforcement learning; model-based reinforcement learning; neural networks; GO;
DOI
10.1109/CAI59869.2024.00169
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement Learning (RL) has shown remarkable capabilities in various domains, yet it struggles in environments with sparse rewards. A significant challenge in such environments is the depth of exploration and the robustness of performance. This paper introduces the WODID framework, which aims to enhance exploration in Model-Based Reinforcement Learning (MBRL) without relying heavily on initial or early-stage trajectory data. We identify a primary issue with the transition model in MBRL: it is trained on data collected by a random policy when the transition model is first formed, which hinders exploration and makes performance highly dependent on the success of the random policy's data collection. By pre-training world models on diverse state data, WODID improves the quality of the transition model, leading to deeper exploration and more stable performance. Our empirical studies, particularly in the challenging sparse-reward environment Montezuma's Revenge, demonstrate that WODID outperforms the baseline methods, achieving deeper exploration with fewer environment steps. Furthermore, our approach offers a human-free method of supplying trajectory data, reducing dependence on initial samples and paving the way for more robust and efficient RL agents.
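The abstract describes pre-training the world model on diverse state data so that the transition model is not shaped solely by random-policy rollouts. The record does not include implementation details, so the following is a minimal, hypothetical PyTorch sketch of that general idea (pre-train a state encoder on externally collected states, then build the transition model on its latent space); all class and function names, architectures, and hyperparameters here are illustrative assumptions, not the authors' WODID code.

```python
# Hypothetical sketch: pre-train a state encoder on diverse state data,
# then reuse it inside a world (transition) model for MBRL.
# Names and architecture are illustrative assumptions, not the WODID implementation.
import torch
import torch.nn as nn

class StateAutoencoder(nn.Module):
    """Encoder/decoder over raw state vectors; the encoder is reused by the world model."""
    def __init__(self, state_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, state_dim))

    def forward(self, s):
        z = self.encoder(s)
        return self.decoder(z), z

class TransitionModel(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def pretrain_encoder(autoencoder, diverse_states, epochs=10, lr=1e-3):
    """Pre-training step: fit the autoencoder to diverse state data gathered
    independently of the agent's own (random) early-stage trajectories."""
    opt = torch.optim.Adam(autoencoder.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        recon, _ = autoencoder(diverse_states)
        loss = loss_fn(recon, diverse_states)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return autoencoder

# Usage sketch with random placeholder data standing in for real state observations.
state_dim, action_dim = 64, 4
diverse_states = torch.randn(1024, state_dim)  # stand-in for "various state data"
ae = pretrain_encoder(StateAutoencoder(state_dim), diverse_states)
world_model = TransitionModel(latent_dim=32, action_dim=action_dim)
# Online MBRL would then encode observed transitions with ae.encoder and train
# world_model on (z_t, a_t) -> z_{t+1}, rather than starting from scratch.
```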
Pages: 918-925
Number of pages: 8