PAC-Bayesian offline Meta-reinforcement learning

Cited by: 0
Authors
Zheng Sun
Chenheng Jing
Shangqi Guo
Lingling An
Affiliations
[1] Xidian University, Guangzhou Institute of Technology
[2] Tsinghua University, Department of Precision Instrument and Department of Automation
[3] Xidian University, School of Computer Science and Technology
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Meta-reinforcement learning; PAC-Bayesian theory; Dependency graph; Generalization bounds
DOI
Not available
Abstract
Meta-reinforcement learning (Meta-RL) exploits structure shared among tasks to enable rapid adaptation to new tasks from little experience. However, most existing Meta-RL algorithms either lack theoretical generalization guarantees or offer them only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper presents, for the first time, a theoretical analysis that estimates the generalization performance of a Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because dependencies in the training data invalidate the independent and identically distributed (i.i.d.) assumption. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets using offline sampling and graph decomposition. With the DGOD approach, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design an algorithm with generalization guarantees that optimizes them, called PAC-Bayesian Offline Meta-Actor-Critic (PBOMAC). Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm performs well in avoiding meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.
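The abstract does not spell out how DGOD performs its graph decomposition, so the following is only a generic illustration of the underlying idea, not the paper's procedure. It assumes a dependency graph in which an edge marks two samples as possibly dependent and the absence of an edge means independence, and it uses greedy graph coloring to partition the samples into subsets with no internal edges. The function name `split_into_independent_sets` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def split_into_independent_sets(num_samples, dependencies):
    """Partition sample indices so that no two samples in the same
    subset share a dependency edge (greedy graph coloring).

    This is an assumed, generic decomposition, not the DGOD algorithm.
    """
    # Build an undirected adjacency structure from the edge list.
    neighbors = defaultdict(set)
    for a, b in dependencies:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Assign each sample the smallest color unused by its colored neighbors.
    color_of = {}
    for node in range(num_samples):
        used = {color_of[n] for n in neighbors[node] if n in color_of}
        color = 0
        while color in used:
            color += 1
        color_of[node] = color

    # Group samples by color; each group has no internal dependency edges.
    groups = defaultdict(list)
    for node, color in color_of.items():
        groups[color].append(node)
    return [groups[c] for c in sorted(groups)]


# Hypothetical toy data: six transitions from two trajectories, where
# consecutive transitions within a trajectory are treated as dependent.
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
subsets = split_into_independent_sets(6, edges)
print(subsets)
```

Greedy coloring does not guarantee the minimum number of subsets; it is used here only because it is short and easy to check by hand.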
Pages: 27128-27147
Page count: 19