PAC-Bayesian offline Meta-reinforcement learning

Authors
Zheng Sun
Chenheng Jing
Shangqi Guo
Lingling An
Affiliations
[1] Xidian University, Guangzhou Institute of Technology
[2] Tsinghua University, Department of Precision Instrument and Department of Automation
[3] Xidian University, School of Computer Science and Technology
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Meta-reinforcement learning; PAC-Bayesian theory; Dependency graph; Generalization bounds
DOI
Not available
Abstract
Meta-reinforcement learning (Meta-RL) exploits structure shared among tasks to enable rapid adaptation to new tasks from little experience. However, most existing Meta-RL algorithms either lack theoretical generalization guarantees or offer them only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper presents, for the first time, a theoretical analysis that estimates the generalization performance of a Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because dependencies in the training data invalidate the independent and identically distributed (i.i.d.) assumption. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets by means of offline sampling and graph decomposition. With the DGOD approach, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design an algorithm with generalization guarantees to optimize them, called PAC-Bayesian Offline Meta-Actor-Critic (PBOMAC). Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm performs well in avoiding meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.
Pages: 27128-27147
Number of pages: 19