PAC-Bayesian offline Meta-reinforcement learning

Cited by: 0
Authors
Zheng Sun
Chenheng Jing
Shangqi Guo
Lingling An
Affiliations
[1] Xidian University,Guangzhou Institute of Technology
[2] Tsinghua University,Department of Precision Instrument and Department of Automation
[3] Xidian University,School of Computer Science and Technology
Source
Applied Intelligence | 2023 / Volume 53
Keywords
Meta-reinforcement learning; PAC-Bayesian theory; Dependency graph; Generalization bounds
DOI
Not available
Abstract
Meta-reinforcement learning (Meta-RL) exploits structure shared among tasks to enable rapid adaptation to new tasks from only a small amount of experience. However, most existing Meta-RL algorithms either lack theoretical generalization guarantees or offer such guarantees only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper is the first to conduct a theoretical analysis that estimates the generalization performance of the Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because of dependencies in the training data, which render the independent and identically distributed (i.i.d.) assumption invalid. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets using offline sampling and graph decomposition. With the DGOD approach, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design PAC-Bayesian Offline Meta-Actor-Critic (PBOMAC), an algorithm with generalization guarantees that optimizes these bounds. Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm effectively avoids meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.
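The graph-decomposition step described above can be illustrated with a minimal sketch. The abstract does not specify the decomposition algorithm, so the following assumes a standard approach from the non-i.i.d. PAC-Bayes literature: greedy coloring of the dependency graph, so that samples within each color class share no dependency edge and can be treated as (conditionally) i.i.d. The function name and data layout are hypothetical.

```python
from collections import defaultdict

def decompose_by_dependency_graph(num_samples, edges):
    """Split sample indices into groups via greedy graph coloring.

    `edges` lists pairs of sample indices that are statistically dependent
    (e.g., consecutive transitions from the same trajectory). Samples that
    receive the same color share no edge, so each color class forms a
    dependency-free subset.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    color = {}
    for node in range(num_samples):
        # Smallest color not used by an already-colored neighbor.
        used = {color[n] for n in adj[node] if n in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c

    groups = defaultdict(list)
    for node, c in color.items():
        groups[c].append(node)
    return list(groups.values())

# Chain dependency 0-1-2-3, as in consecutive steps of one trajectory:
groups = decompose_by_dependency_graph(4, [(0, 1), (1, 2), (2, 3)])
print(groups)  # two groups of mutually non-adjacent samples
```

A standard PAC-Bayesian bound can then be applied to each group separately, and the per-group bounds combined, which is the general shape of the strategy the abstract describes.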
Pages: 27128–27147 (19 pages)