HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem

Cited by: 7
Authors
Hua, Yun [1 ]
Wang, Xiangfeng [1 ,2 ]
Jin, Bo [1 ,2 ]
Li, Wenhao [1 ]
Yan, Junchi [3 ]
He, Xiaofeng [1 ,2 ]
Zha, Hongyuan [4 ,5 ]
Affiliations
[1] East China Normal Univ, Shanghai, Peoples R China
[2] SRIAS, Shanghai, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen, Peoples R China
[5] Chinese Univ Hong Kong, AIRS, Shenzhen, Peoples R China
Source
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2021
Keywords
Meta Learning; Reinforcement Learning; Sparse Reward;
DOI
10.1145/3447548.3467242
CLC classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the success of existing meta reinforcement learning methods, they still have difficulty learning a meta policy effectively for RL problems with sparse reward. We therefore develop a novel meta reinforcement learning framework, Hyper-Meta RL (HMRL), for sparse-reward RL problems. It consists of three modules, including a cross-environment meta state embedding module, which constructs a common meta state space to adapt to different environments, and a meta-state-based environment-specific meta reward shaping module, which effectively extends the original sparse reward trajectory through cross-environmental knowledge complementarity; as a consequence, the meta policy achieves better generalization and efficiency with the shaped meta reward. Experiments in sparse-reward environments demonstrate the superiority of HMRL in both transferability and policy learning efficiency.
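The abstract describes three components: a shared cross-environment meta state embedding, environment-specific meta reward shaping defined over that embedding, and a meta policy trained on the shaped reward. The toy sketch below only illustrates how such pieces could fit together; it is not the authors' implementation, and every name (W_embed, make_shaper, reinforce_step), the linear/softmax forms, and the potential-based shaping choice are assumptions made for illustration.

```python
# Minimal, illustrative sketch of the three HMRL components named in the
# abstract. NOT the authors' implementation; all shapes, names, and the
# potential-based shaping form are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, META_DIM, N_ACTIONS = 6, 4, 3

# (1) Cross-environment meta state embedding: here just a fixed linear map
# followed by tanh, shared across environments.
W_embed = rng.normal(size=(META_DIM, STATE_DIM))

def meta_state(raw_state):
    """Map an environment-specific raw state into the common meta state space."""
    return np.tanh(W_embed @ raw_state)

# (2) Environment-specific meta reward shaping: a per-environment potential
# over the meta state, added to the sparse environment reward.
def make_shaper(env_seed, gamma=0.99):
    w = np.random.default_rng(env_seed).normal(size=META_DIM)
    potential = lambda z: float(w @ z)
    def shaped_reward(sparse_r, z, z_next):
        return sparse_r + gamma * potential(z_next) - potential(z)
    return shaped_reward

# (3) Meta policy: softmax over actions from the meta state, updated with a
# simple REINFORCE-style step on the shaped return.
theta = rng.normal(size=(N_ACTIONS, META_DIM)) * 0.1

def policy_probs(z):
    logits = theta @ z
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_step(z, action, shaped_return, lr=0.05):
    global theta
    probs = policy_probs(z)
    grad_log = -np.outer(probs, z)   # d/dtheta of log softmax, all rows
    grad_log[action] += z            # extra term for the chosen action
    theta += lr * shaped_return * grad_log

# Toy usage: one transition in a sparse-reward environment (sparse_r = 0 here,
# yet the shaped reward still provides a learning signal).
shaper = make_shaper(env_seed=7)
s, s_next, sparse_r = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM), 0.0
z, z_next = meta_state(s), meta_state(s_next)
a = int(rng.choice(N_ACTIONS, p=policy_probs(z)))
r_shaped = shaper(sparse_r, z, z_next)
reinforce_step(z, a, r_shaped)
print("shaped reward:", round(r_shaped, 3))
```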
Pages: 637-645
Number of pages: 9