SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning

Cited by: 0
Authors
Chester, Andrew [1]
Dann, Michael [1]
Zambetta, Fabio [1]
Thangarajah, John [1]
Affiliations
[1] RMIT Univ, Sch Comp Technol, Melbourne, Vic, Australia
Source
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II | 2024, Vol. 14472
Keywords
Reinforcement Learning; Deep Learning; SHOGI; CHESS; GO
DOI
10.1007/978-981-99-8391-9_22
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Model-based reinforcement learning algorithms are typically more sample efficient than their model-free counterparts, especially on sparse-reward problems. Unfortunately, many interesting domains are too complex to specify complete models for, and learning a model from scratch requires a large number of environment samples. If we could instead specify an incomplete model and let the agent learn how best to use it, we could exploit our partial understanding of many domains. In this work we propose SAGE, an algorithm that combines learning and planning to exploit a previously unusable class of incomplete models. SAGE combines the strengths of symbolic planning and neural learning in a novel way, and outperforms competing methods on variations of taxi world and Minecraft.
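The abstract only outlines the approach, but under one plausible reading of the title and abstract, the core control loop is: a symbolic planner over a hand-specified, incomplete model proposes subgoals, and a goal-conditioned learner is trained to reach them. The toy sketch below illustrates that loop; everything in it (ChainEnv, symbolic_plan, GoalQLearner, the intrinsic subgoal reward) is a hypothetical illustration of this pattern, not the paper's actual algorithm or interfaces.

```python
import random

class ChainEnv:
    """Toy sparse-reward environment: positions 0..9, reward only at 9."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action is -1 or +1
        self.pos = max(0, min(9, self.pos + action))
        done = self.pos == 9
        return self.pos, (1.0 if done else 0.0), done

def symbolic_plan(state):
    """Stand-in for a planner over an incomplete symbolic model: it only
    'knows' the goal lies to the right, so it proposes a nearby waypoint."""
    return min(9, state + 3)

class GoalQLearner:
    """Tabular goal-conditioned Q-learning (a deep network in the paper's
    setting); learns how best to reach whatever subgoal the planner emits."""
    def __init__(self, eps=0.1, alpha=0.5, gamma=0.9):
        self.q, self.eps, self.alpha, self.gamma = {}, eps, alpha, gamma

    def act(self, s, g):
        if random.random() < self.eps:
            return random.choice((-1, 1))
        return max((-1, 1), key=lambda a: self.q.get((s, g, a), 0.0))

    def update(self, s, g, a, r, s2):
        best = max(self.q.get((s2, g, b), 0.0) for b in (-1, 1))
        old = self.q.get((s, g, a), 0.0)
        self.q[(s, g, a)] = old + self.alpha * (r + self.gamma * best - old)

env, agent = ChainEnv(), GoalQLearner()
for _ in range(200):
    s, done = env.reset(), False
    while not done:
        g = symbolic_plan(s)            # planner proposes a symbolic subgoal
        while s != g and not done:
            a = agent.act(s, g)
            s2, _, done = env.step(a)
            # intrinsic reward for achieving the planner's subgoal
            agent.update(s, g, a, 1.0 if s2 == g else 0.0, s2)
            s = s2
```

The division of labor is the point of the sketch: the symbolic model can be incomplete (here it knows only the goal's direction, not the dynamics), because the learner fills the gap by learning to reach each proposed subgoal, and the outer loop replans once a subgoal is achieved.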
Pages: 274-285
Page count: 12