PAC-Bayesian offline Meta-reinforcement learning

Cited by: 0

Authors:

Zheng Sun

Chenheng Jing

Shangqi Guo

Lingling An

Affiliations:

[1] Xidian University, Guangzhou Institute of Technology

[2] Tsinghua University, Department of Precision Instrument and Department of Automation

[3] Xidian University, School of Computer Science and Technology

Source:

Applied Intelligence | 2023 / Vol. 53

Keywords:

Meta-reinforcement learning; PAC-Bayesian theory; Dependency graph; Generalization bounds

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Meta-reinforcement learning (Meta-RL) exploits structure shared among tasks to enable rapid adaptation to new tasks from little experience. However, most existing Meta-RL algorithms either lack theoretical generalization guarantees or offer them only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper presents the first theoretical analysis that estimates the generalization performance of a Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because the training data contain dependencies, which invalidates the independent and identically distributed (i.i.d.) assumption. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets using offline sampling and graph decomposition. With the DGOD approach, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design an algorithm with generalization guarantees that optimizes them, called PAC-Bayesian Offline Meta-Actor-Critic (PBOMAC). Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm avoids meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.
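
The two ingredients named in the abstract, graph decomposition of dependent data and a PAC-Bayesian bound, can be illustrated with a minimal Python sketch. It is not the DGOD approach or the PBOMAC bound from the paper, whose details are not given here. It assumes a hypothetical dependency graph in which consecutive transitions are dependent, uses greedy graph coloring to split the samples into mutually non-adjacent groups, and evaluates the complexity term of a standard McAllester-style bound for losses in [0, 1]. The function names `greedy_independent_sets` and `pac_bayes_gap` and the example KL value are illustrative assumptions.

```python
import math
from collections import defaultdict


def greedy_independent_sets(num_nodes, edges):
    """Split nodes 0..num_nodes-1 into independent sets by greedy coloring.

    An edge (u, v) means samples u and v are dependent. Nodes sharing a
    color are never adjacent, so each color class has no dependent pair.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    color = {}
    for node in range(num_nodes):
        # Pick the smallest color not already used by a colored neighbor.
        used = {color[nb] for nb in neighbors[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c

    groups = defaultdict(list)
    for node, c in color.items():
        groups[c].append(node)
    return [groups[c] for c in sorted(groups)]


def pac_bayes_gap(kl, n, delta):
    """Complexity term of a McAllester-style PAC-Bayes bound.

    With probability at least 1 - delta over n i.i.d. samples,
    expected loss <= empirical loss + this term (losses in [0, 1]).
    """
    return math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))


# Hypothetical chain of 6 transitions where neighbors are dependent.
edges = [(i, i + 1) for i in range(5)]
sets = greedy_independent_sets(6, edges)

# Assumed KL divergence of 1.0 between posterior and prior, for illustration.
gaps = [pac_bayes_gap(kl=1.0, n=len(s), delta=0.05) for s in sets]
```

In this toy setting each group is small, so the resulting gap is loose; the sketch only shows why decomposing into independent groups lets an i.i.d. bound be applied group by group.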


Pages: 27128-27147

Page count: 19

50 items in total

[21] A Meta-Reinforcement Learning-Based Poisoning Attack Framework Against Federated Learning [J].

Zhou, Wei ;

Zhang, Donglai ;

Wang, Hongjie ;

Li, Jinliang ;

Jiang, Mingjian .

IEEE ACCESS, 2025, 13 :28628-28644

[22] MetaABR: Environment-Adaptive Video Streaming System with Meta-Reinforcement Learning [J].

Choi, Wangyu ;

Yoon, Jongwon .

PROCEEDINGS OF THE INTERNATIONAL CONEXT STUDENT WORKSHOP 2022, CONEXT-SW 2022, 2022, :37-39

[23] Adaptable Image Quality Assessment Using Meta-Reinforcement Learning of Task Amenability [J].

Saeed, Shaheer U. ;

Fu, Yunguan ;

Stavrinides, Vasilis ;

Baum, Zachary M. C. ;

Yang, Qianye ;

Rusu, Mirabela ;

Fan, Richard E. ;

Sonn, Geoffrey A. ;

Noble, J. Alison ;

Barratt, Dean C. ;

Hu, Yipeng .

SIMPLIFYING MEDICAL ULTRASOUND, 2021, 12967 :191-201

[24] Autonomous Obstacle Avoidance and Target Tracking of UAV Based on Meta-Reinforcement Learning [J].

Jiang W. ;

Wu J. ;

Wang Y. .

Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences, 2022, 49 (06) :101-109

[25] Meta-Reinforcement Learning-Based Transferable Scheduling Strategy for Energy Management [J].

Xiong, Luolin ;

Tang, Yang ;

Liu, Chensheng ;

Mao, Shuai ;

Meng, Ke ;

Dong, Zhaoyang ;

Qian, Feng .

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70 (04) :1685-1695

[26] UAV Control Method Combining Reptile Meta-Reinforcement Learning and Generative Adversarial Imitation Learning [J].

Jiang, Shui ;

Ge, Yanning ;

Yang, Xu ;

Yang, Wencheng ;

Cui, Hui .

FUTURE INTERNET, 2024, 16 (03)

[27] Learning Ad Hoc Cooperation Policies from Limited Priors via Meta-Reinforcement Learning [J].

Fang, Qi ;

Zeng, Junjie ;

Xu, Haotian ;

Hu, Yue ;

Yin, Quanjun .

APPLIED SCIENCES-BASEL, 2024, 14 (08)

[28] AutoMTNAS: Automated meta-reinforcement learning on graph tokenization for graph neural architecture search [J].

Nie, Mingshuo ;

Chen, Dongming ;

Chen, Huilin ;

Wang, Dongqi .

KNOWLEDGE-BASED SYSTEMS, 2025, 310

[29] Robust interplanetary trajectory design under multiple uncertainties via meta-reinforcement learning [J].

Federici, Lorenzo ;

Zavoli, Alessandro .

ACTA ASTRONAUTICA, 2024, 214 :147-158

[30] Meta-Reinforcement Learning via Buffering Graph Signatures for Live Video Streaming Events [J].

Antaris, Stefanos ;

Rafailidis, Dimitrios ;

Girdzijauskas, Sarunas .

PROCEEDINGS OF THE 2021 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2021, 2021, :385-392
