HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem

Cited by: 7
Authors
Hua, Yun [1 ]
Wang, Xiangfeng [1 ,2 ]
Jin, Bo [1 ,2 ]
Li, Wenhao [1 ]
Yan, Junchi [3 ]
He, Xiaofeng [1 ,2 ]
Zha, Hongyuan [4 ,5 ]
Affiliations
[1] East China Normal Univ, Shanghai, Peoples R China
[2] SRIAS, Shanghai, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen, Peoples R China
[5] Chinese Univ Hong Kong, AIRS, Shenzhen, Peoples R China
Source
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2021
Keywords
Meta Learning; Reinforcement Learning; Sparse Reward;
DOI
10.1145/3447548.3467242
CLC classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the success of existing meta reinforcement learning methods, they still have difficulty learning a meta policy effectively for RL problems with sparse reward. We therefore develop a novel meta reinforcement learning framework, Hyper-Meta RL (HMRL), for sparse-reward RL problems. It consists of three modules, including a cross-environment meta state embedding module, which constructs a common meta state space to adapt to different environments, and a meta-state-based environment-specific meta reward shaping module, which effectively extends the original sparse reward trajectory through cross-environmental knowledge complementarity; as a consequence, the meta policy achieves better generalization and efficiency with the shaped meta reward. Experiments in sparse-reward environments demonstrate the superiority of HMRL in both transferability and policy learning efficiency.
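The abstract describes three components: a shared cross-environment meta state embedding, environment-specific meta reward shaping defined over that embedding, and a meta policy trained on the shaped reward. The toy sketch below only illustrates how such pieces could fit together; it is not the authors' implementation, and every name (W_embed, make_shaper, reinforce_step), the linear/softmax forms, and the potential-based shaping choice are assumptions made for illustration.

```python
# Minimal, illustrative sketch of the three HMRL components named in the
# abstract. NOT the authors' implementation; all shapes, names, and the
# potential-based shaping form are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, META_DIM, N_ACTIONS = 6, 4, 3

# (1) Cross-environment meta state embedding: here just a fixed linear map
# followed by tanh, shared across environments.
W_embed = rng.normal(size=(META_DIM, STATE_DIM))

def meta_state(raw_state):
    """Map an environment-specific raw state into the common meta state space."""
    return np.tanh(W_embed @ raw_state)

# (2) Environment-specific meta reward shaping: a per-environment potential
# over the meta state, added to the sparse environment reward.
def make_shaper(env_seed, gamma=0.99):
    w = np.random.default_rng(env_seed).normal(size=META_DIM)
    potential = lambda z: float(w @ z)
    def shaped_reward(sparse_r, z, z_next):
        return sparse_r + gamma * potential(z_next) - potential(z)
    return shaped_reward

# (3) Meta policy: softmax over actions from the meta state, updated with a
# simple REINFORCE-style step on the shaped return.
theta = rng.normal(size=(N_ACTIONS, META_DIM)) * 0.1

def policy_probs(z):
    logits = theta @ z
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_step(z, action, shaped_return, lr=0.05):
    global theta
    probs = policy_probs(z)
    grad_log = -np.outer(probs, z)   # d/dtheta of log softmax, all rows
    grad_log[action] += z            # extra term for the chosen action
    theta += lr * shaped_return * grad_log

# Toy usage: one transition in a sparse-reward environment (sparse_r = 0 here,
# yet the shaped reward still provides a learning signal).
shaper = make_shaper(env_seed=7)
s, s_next, sparse_r = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM), 0.0
z, z_next = meta_state(s), meta_state(s_next)
a = int(rng.choice(N_ACTIONS, p=policy_probs(z)))
r_shaped = shaper(sparse_r, z, z_next)
reinforce_step(z, a, r_shaped)
print("shaped reward:", round(r_shaped, 3))
```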
Pages: 637-645
Number of pages: 9