Reinforcement Learning in Few-Shot Scenarios: A Survey

Cited by: 5
Authors
Wang, Zhechao [1 ,2 ]
Fu, Qiming [1 ,2 ]
Chen, Jianping [2 ,3 ]
Wang, Yunzhe [1 ,2 ]
Lu, You [1 ,2 ]
Wu, Hongjie [1 ,2 ]
Affiliations
[1] Suzhou Univ Sci & Technol, Sch Elect & Informat Engn, Suzhou 215009, Peoples R China
[2] Suzhou Univ Sci & Technol, Jiangsu Prov Key Lab Intelligent Bldg Energy Effic, Suzhou 215009, Peoples R China
[3] Suzhou Univ Sci & Technol, Sch Architecture & Urban Planning, Suzhou 215009, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Few-shot learning; Meta-learning; Transfer learning; FRAMEWORK; MODEL;
DOI
10.1007/s10723-023-09663-0
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Reinforcement learning requires massive amounts of data on complex problems, which makes it infeasible in real-world cases where sampling is difficult. The key to coping with these few-shot problems is knowledge generalization, and the related algorithms are often called few-shot reinforcement learning (FS-RL). However, few-shot scenarios and FS-RL algorithms still lack a formal definition and comprehensive analysis. Therefore, after giving a uniform definition, we categorize few-shot scenarios into two types: the first pursues more specialized performance, while the second pursues more general performance. During knowledge transfer, a few-shot scenario usually shows a clear preference for a particular type of knowledge. Based on this, we divide FS-RL algorithms into two types: the direct transfer case and the indirect transfer case. Existing algorithms are then discussed under this classification. Finally, we discuss future directions of FS-RL from the perspectives of both theory and application.
Pages: 22
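
To make the abstract's distinction concrete, the sketch below contrasts the two transfer routes in a minimal tabular Q-learning setting. It is an illustration only, not the authors' method: the chain environment, the q_learning helper, and the potential-based shaping scheme are all assumptions introduced for this example. Direct transfer reuses the learned parameters themselves (the Q-table), while indirect transfer reuses only a quantity derived from them (a value-based shaping signal). The "target task" here reuses the source environment purely to keep the code short; in a genuine few-shot setting the target task would differ from the source.

import numpy as np

# Tiny 1-D chain world used only for illustration (assumed, not from the paper):
# action 1 moves right, action 0 moves left; reward 1.0 on reaching the last state.
N_STATES, N_ACTIONS = 10, 2

def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def q_learning(episodes, q_init=None, shaping=None, alpha=0.5, gamma=0.95, eps=0.1):
    """Tabular Q-learning; q_init warm-starts the table, shaping adds a potential-based bonus."""
    q = np.zeros((N_STATES, N_ACTIONS)) if q_init is None else q_init.copy()
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = 0, False
        for _ in range(1000):                       # step cap keeps the sketch fast
            if rng.random() < eps:                  # epsilon-greedy exploration
                a = int(rng.integers(N_ACTIONS))
            else:                                   # greedy action with random tie-breaking
                a = int(rng.choice(np.flatnonzero(q[s] == q[s].max())))
            s2, r, done = step(s, a)
            if shaping is not None:                 # indirect transfer: reshape the reward
                r += gamma * shaping[s2] - shaping[s]
            q[s, a] += alpha * (r + gamma * (0.0 if done else q[s2].max()) - q[s, a])
            s = s2
            if done:
                break
    return q

# Source task: plenty of interaction data.
q_source = q_learning(episodes=500)

# Direct transfer: reuse the source parameters (the Q-table) to warm-start the target task.
q_direct = q_learning(episodes=10, q_init=q_source)

# Indirect transfer: reuse only a derived quantity (the source state values) as a
# potential-based shaping signal, not the parameters themselves.
v_source = q_source.max(axis=1)
q_indirect = q_learning(episodes=10, shaping=v_source)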