Efficient Deep Reinforcement Learning via Policy-Extended Successor Feature Approximator

Cited by: 0
Authors
Li, Yining [1]
Yang, Tianpei [1,2]
Hao, Jianye [1]
Zheng, Yan [1]
Tang, Hongyao [1]
Affiliations
[1] Tianjin University, College of Intelligence and Computing, Tianjin, China
[2] University of Alberta, Edmonton, AB, Canada
Source
Distributed Artificial Intelligence, DAI 2022 | 2023, Vol. 13824
Keywords
Reinforcement learning; Transfer learning; Successor features; Policy representation
DOI
10.1007/978-3-031-25549-6_3
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Successor Features (SFs) improve the generalization of reinforcement learning across unseen tasks by decoupling the dynamics of the environment from the rewards. However, the decomposition depends strongly on the policy learned for the source task, which may be suboptimal for other tasks. To improve the generalization of SFs, this paper proposes a novel SF learning paradigm, the Policy-extended Successor Feature Approximator (PeSFA), which decouples the SFs from any single policy by learning a policy representation module and feeding the resulting policy representation into the SF approximator. Once the SFs are fitted well over the policy representation space, better SFs for any given task can be obtained directly by searching that space. Experimental results show that PeSFA significantly improves the generalizability of SFs and accelerates learning in two representative environments.
Pages: 29-44
Page count: 16