Meta-Reinforcement Learning Algorithm Based on Reward and Dynamic Inference

Cited: 0
Authors
Chen, Jinhao [1 ]
Zhang, Chunhong [2 ]
Hu, Zheng [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100088, Peoples R China
[2] Beijing Univ Posts & Telecommun, Key Lab Universal Wireless Commun, Minist Educ, Beijing 100088, Peoples R China
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT III, PAKDD 2024 | 2024, Vol. 14647
Keywords
Meta-Reinforcement Learning; Variational Inference; Hidden Feature
DOI
10.1007/978-981-97-2259-4_17
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Meta-Reinforcement Learning aims to rapidly solve unseen tasks that share similar structures. However, the agent relies heavily on a large amount of experience during the meta-training phase, which makes high sample efficiency a formidable challenge. Current methods typically adapt to novel tasks within the Meta-Reinforcement Learning framework through task inference. Unfortunately, these approaches still exhibit limitations when faced with a high-complexity task space. In this paper, we propose a Meta-Reinforcement Learning method based on reward and dynamic inference. We introduce independent reward and dynamic inference encoders, which sample task-specific context information to capture the deep-level features of task goals and dynamics. By reducing the task inference space, the agent effectively learns the structures shared across tasks and acquires a deeper understanding of the differences between tasks. We illustrate the performance degradation caused by high task-inference complexity and demonstrate that our method outperforms previous algorithms in terms of sample efficiency.
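For illustration, below is a minimal PyTorch sketch of the two-encoder task-inference idea the abstract describes: separate permutation-invariant encoders embed reward-related context (s, a, r) and dynamics-related context (s, a, s') into independent Gaussian latent variables, in the style of variational task inference (e.g. PEARL-like encoders). All class names, dimensions, and the exact context factorization are assumptions made for this sketch, not the authors' implementation.

import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    # Hypothetical permutation-invariant encoder: maps a set of transition
    # tuples to the mean/log-variance of a Gaussian over a latent task variable.
    def __init__(self, in_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # outputs mean and log-variance
        )

    def forward(self, context):
        # context: (batch, n_transitions, in_dim)
        stats = self.net(context).mean(dim=1)  # average over the transition set
        mu, log_var = stats.chunk(2, dim=-1)
        return torch.distributions.Normal(mu, (0.5 * log_var).exp())

# Independent inference: a reward encoder reads (s, a, r) tuples and a
# dynamics encoder reads (s, a, s') tuples, so each latent only has to
# explain one factor of task variation (all dimensions are hypothetical).
obs_dim, act_dim, latent_dim = 8, 2, 5
reward_enc = ContextEncoder(obs_dim + act_dim + 1, latent_dim)
dynamics_enc = ContextEncoder(obs_dim + act_dim + obs_dim, latent_dim)

batch, n = 4, 16
reward_ctx = torch.randn(batch, n, obs_dim + act_dim + 1)          # (s, a, r)
dynamics_ctx = torch.randn(batch, n, obs_dim + act_dim + obs_dim)  # (s, a, s')

z_reward = reward_enc(reward_ctx).rsample()        # task-goal latent
z_dynamics = dynamics_enc(dynamics_ctx).rsample()  # dynamics latent
task_code = torch.cat([z_reward, z_dynamics], dim=-1)  # conditions the policy
print(task_code.shape)  # torch.Size([4, 10])

Keeping the two posteriors separate, rather than inferring one joint task embedding, is what shrinks each inference problem; the concatenated code can then condition the policy as in standard context-based meta-RL.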
Pages: 223-234
Number of pages: 12