PAC-Bayesian offline Meta-reinforcement learning

Cited by: 0
Authors
Zheng Sun
Chenheng Jing
Shangqi Guo
Lingling An
Affiliations
[1] Xidian University, Guangzhou Institute of Technology
[2] Tsinghua University, Department of Precision Instrument and Department of Automation
[3] Xidian University, School of Computer Science and Technology
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Meta-reinforcement learning; PAC-Bayesian theory; Dependency graph; Generalization bounds
DOI
Not available
Abstract
Meta-reinforcement learning (Meta-RL) exploits structure shared among tasks to enable rapid adaptation to new tasks from only a small amount of experience. However, most existing Meta-RL algorithms either lack theoretical generalization guarantees or offer such guarantees only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper presents the first theoretical analysis that estimates the generalization performance of a Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because of dependencies in the training data, which render the independent and identically distributed (i.i.d.) assumption invalid. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets using offline sampling and graph decomposition techniques. With the DGOD approach, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design an algorithm with generalization guarantees to optimize them, called PAC-Bayesian Offline Meta-Actor-Critic (PBOMAC). Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm effectively avoids meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.
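The core idea behind a dependency-graph decomposition can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: samples are nodes, dependency relations are edges, and a greedy graph coloring partitions the nodes into classes in which no two samples are dependent, so each class can be treated as an i.i.d.-like subset. The function name `decompose` and the chain-dependency example are hypothetical choices for illustration.

```python
def decompose(num_samples, edges):
    """Partition sample indices into classes such that no two samples
    joined by a dependency edge land in the same class.

    edges: iterable of (i, j) pairs marking dependent sample pairs.
    Returns: list of lists, each a mutually independent subset.
    """
    # Build an undirected adjacency structure over the samples.
    adj = {i: set() for i in range(num_samples)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    # Greedy coloring: give each node the smallest color unused
    # by its already-colored neighbors.
    color = {}
    for v in range(num_samples):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c

    # Group nodes by color; each color class is dependency-free.
    classes = [[] for _ in range(max(color.values()) + 1)]
    for v, c in color.items():
        classes[c].append(v)
    return classes

# Example: a chain dependency 0-1-2-3 (e.g., consecutive transitions
# along one trajectory) splits into two independent subsets.
print(decompose(4, [(0, 1), (1, 2), (2, 3)]))  # [[0, 2], [1, 3]]
```

Standard PAC-Bayesian bounds can then be applied to each independent subset and combined over the classes, which is the general route such decompositions take to restore the i.i.d. assumption.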
Pages: 27128-27147 (19 pages)