PAC-Bayesian offline Meta-reinforcement learning

Cited by: 0
Authors
Zheng Sun
Chenheng Jing
Shangqi Guo
Lingling An
Affiliations
[1] Xidian University,Guangzhou Institute of Technology
[2] Tsinghua University,Department of Precision Instrument and Department of Automation
[3] Xidian University,School of Computer Science and Technology
Source
Applied Intelligence | 2023 / Volume 53
Keywords
Meta-reinforcement learning; PAC-Bayesian theory; Dependency graph; Generalization bounds
DOI
Not available
Abstract
Meta-reinforcement learning (Meta-RL) exploits structure shared among tasks to enable rapid adaptation to new tasks from only a small amount of experience. However, most existing Meta-RL algorithms either lack theoretical generalization guarantees or offer such guarantees only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper is the first to conduct a theoretical analysis that estimates the generalization performance of the Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because of dependencies in the training data, which render the independent and identically distributed (i.i.d.) assumption invalid. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets using offline sampling and graph decomposition. With the DGOD approach, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design PAC-Bayesian Offline Meta-Actor-Critic (PBOMAC), an algorithm with generalization guarantees that optimizes these bounds. Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm effectively avoids meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.
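The graph-decomposition step described above can be illustrated with a minimal sketch. The abstract does not specify the decomposition algorithm, so the following assumes a standard approach from the non-i.i.d. PAC-Bayes literature: greedy coloring of the dependency graph, so that samples within each color class share no dependency edge and can be treated as (conditionally) i.i.d. The function name and data layout are hypothetical.

```python
from collections import defaultdict

def decompose_by_dependency_graph(num_samples, edges):
    """Split sample indices into groups via greedy graph coloring.

    `edges` lists pairs of sample indices that are statistically dependent
    (e.g., consecutive transitions from the same trajectory). Samples that
    receive the same color share no edge, so each color class forms a
    dependency-free subset.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    color = {}
    for node in range(num_samples):
        # Smallest color not used by an already-colored neighbor.
        used = {color[n] for n in adj[node] if n in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c

    groups = defaultdict(list)
    for node, c in color.items():
        groups[c].append(node)
    return list(groups.values())

# Chain dependency 0-1-2-3, as in consecutive steps of one trajectory:
groups = decompose_by_dependency_graph(4, [(0, 1), (1, 2), (2, 3)])
print(groups)  # two groups of mutually non-adjacent samples
```

A standard PAC-Bayesian bound can then be applied to each group separately, and the per-group bounds combined, which is the general shape of the strategy the abstract describes.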
Pages: 27128–27147 (19 pages)