Fast Task Adaptation Based on the Combination of Model-Based and Gradient-Based Meta Learning

Cited by: 12

Authors:

Xu, Zhixiong [1]

Chen, Xiliang [1]

Cao, Lei [1]

Affiliations:

[1] Army Engn Univ, Inst Command & Control Engn, Nanjing 210000, Peoples R China

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2022 / Vol. 52 / Issue 06

Keywords:

Task analysis; Adaptation models; Reinforcement learning; Trajectory; Games; Data models; Training; Fast adaptation; gradient; metalearning; model-based; reinforcement learning;

DOI:

10.1109/TCYB.2020.3028378

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep reinforcement learning (DRL) has recently attained remarkable results in various domains, including games, robotics, and recommender systems. Nevertheless, an urgent problem in the practical application of DRL is fast adaptation. To this end, this article proposes a new and versatile metalearning approach called fast task adaptation via metalearning (FTAML), which leverages the strengths of model-based methods and gradient-based metalearning methods to train the initial parameters of the model, so that the model can efficiently master unseen tasks with a small amount of data from those tasks. The proposed algorithm makes it possible to separate task optimization from task identification. Specifically, the model-based learner helps to identify the pattern of a task, while the gradient-based metalearner consistently improves performance with only a few gradient update steps by making use of the task embedding produced by the model-based learner. In addition, the choice of network for the model-based learner is discussed, and the performance of networks with different depths is explored. Finally, simulation results on reinforcement learning problems demonstrate that the proposed approach outperforms the compared metalearning algorithms and delivers new state-of-the-art performance on a variety of challenging control tasks.
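
The split described in the abstract, where a model-based learner produces a task embedding and a gradient-based learner adapts in a few gradient steps using that embedding, can be illustrated with a toy sketch. This is not the authors' FTAML implementation: it uses supervised linear regression instead of reinforcement learning, the mean-pooling encoder `task_embedding`, the matrix `W_enc`, and all hyperparameters are hypothetical, and the outer metalearning loop that would train the initial parameters is omitted.

```python
import numpy as np

def task_embedding(xs, ys, W_enc):
    """Hypothetical stand-in for the model-based learner: mean-pool encoded (x, y) pairs."""
    pairs = np.concatenate([xs, ys[:, None]], axis=1)    # shape (n, d + 1)
    return np.tanh(pairs @ W_enc).mean(axis=0)           # shape (k,)

def features(xs, z):
    """Concatenate every input with the same task embedding z."""
    return np.concatenate([xs, np.tile(z, (len(xs), 1))], axis=1)

def adapt(theta, xs, ys, z, lr=0.05, steps=3):
    """A few gradient steps on mean-squared error, starting from the initial theta."""
    feats = features(xs, z)
    for _ in range(steps):
        grad = 2.0 * feats.T @ (feats @ theta - ys) / len(xs)
        theta = theta - lr * grad
    return theta

def mse(theta, xs, ys, z):
    return float(np.mean((features(xs, z) @ theta - ys) ** 2))

# Toy task (assumed for illustration): y = 1.5 * x0 - 0.5 * x1
rng = np.random.default_rng(0)
d, k = 2, 3
W_enc = rng.normal(size=(d + 1, k))    # untrained encoder weights
theta0 = np.zeros(d + k)               # stands in for metalearned initial parameters
xs = rng.normal(size=(20, d))
ys = xs @ np.array([1.5, -0.5])

z = task_embedding(xs, ys, W_enc)      # task identification
theta1 = adapt(theta0, xs, ys, z)      # task optimization
loss_before = mse(theta0, xs, ys, z)
loss_after = mse(theta1, xs, ys, z)
```

In this sketch the embedding `z` is computed once per task and held fixed while `adapt` updates only the predictor parameters, which mirrors the separation of task identification from task optimization in a simplified form.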


Pages: 5209 - 5218

Number of pages: 10

