Variable-Agnostic Causal Exploration for Reinforcement Learning

Cited by: 0
Authors
Nguyen, Minh Hoang [1]
Le, Hung [1]
Venkatesh, Svetha [1]
Affiliations
[1] Deakin University, Applied Artificial Intelligence Institute (A2I2), Geelong, VIC, Australia
Source
Machine Learning and Knowledge Discovery in Databases: Research Track, Part II, ECML PKDD 2024 | 2024, Vol. 14942
Keywords
Reinforcement Learning; Causality; Deep RL
DOI
10.1007/978-3-031-70344-7_13
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Modern reinforcement learning (RL) struggles to capture real-world cause-and-effect dynamics, leading to inefficient exploration through extensive trial-and-error. While recent efforts to improve agent exploration have leveraged causal discovery, they often make unrealistic assumptions about the causal variables in the environment. In this paper, we introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL), which incorporates causal relationships to drive exploration in RL without requiring the environment's causal variables to be specified. Our approach uses attention mechanisms to automatically identify the crucial observation-action steps associated with key variables. It then constructs a causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion. The graph can be leveraged to generate intrinsic rewards or to establish a hierarchy of subgoals that enhance exploration efficiency. Experimental results show significant improvements in agent performance in grid-world, 2D game, and robotic domains, particularly in scenarios with sparse rewards and noisy actions, such as the notorious Noisy-TV environments.
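As a rough illustration only (the paper's code is not reproduced here): the abstract describes scoring observation-action steps with attention and converting those scores into exploration signals. Below is a minimal, hypothetical PyTorch sketch of that idea; the StepScorer class, its dimensions, and the bonus scale beta are all assumptions for illustration, not the authors' VACERL implementation.

import torch
import torch.nn as nn

class StepScorer(nn.Module):
    """Self-attention over embedded (observation, action) steps of a trajectory.

    Hypothetical sketch: trained to predict task completion, the attention
    mass each step receives serves as a proxy for its importance.
    """
    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(embed_dim, 1)  # predicts task success from the trajectory

    def forward(self, obs: torch.Tensor, act: torch.Tensor):
        # obs: (B, T, obs_dim), act: (B, T, act_dim)
        x = self.embed(torch.cat([obs, act], dim=-1))   # (B, T, E)
        ctx, weights = self.attn(x, x, x)               # weights: (B, T, T), head-averaged
        success_logit = self.head(ctx.mean(dim=1))      # (B, 1)
        step_scores = weights.mean(dim=1)               # (B, T): attention each step receives
        return success_logit, step_scores

if __name__ == "__main__":
    B, T, OBS, ACT = 8, 20, 16, 4
    scorer = StepScorer(OBS, ACT)
    obs, act = torch.randn(B, T, OBS), torch.randn(B, T, ACT)
    logit, scores = scorer(obs, act)
    beta = 0.1                          # assumed bonus-scale hyperparameter
    intrinsic_reward = beta * scores    # (B, T) per-step exploration bonus
    print(intrinsic_reward.shape)

In this sketch the scorer would be fit to predict task success, and the per-step attention scores would then be scaled into an intrinsic-reward bonus; the paper additionally builds a causal graph over the highest-scoring steps, which this toy example omits.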
Pages: 216-232
Page count: 17