Temporally extended features in model-based reinforcement learning with partial observability

Cited by: 9

Authors:

Lieck, Robert [1]

Toussaint, Marc [1]

Affiliation:

[1] Univ Stuttgart, Machine Learning & Robot Lab, Univ Str 38, D-70569 Stuttgart, Germany

Source:

NEUROCOMPUTING | 2016, Vol. 192

Keywords:

Reinforcement learning; Model learning; Feature learning; Partial observability; Partially observable Markov decision process; Non-Markov decision process; REPRESENTATIONS; OPTIMIZATION;

DOI:

10.1016/j.neucom.2015.12.107

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Partial observability poses a major challenge for a reinforcement learning agent since the complete history of observations may be relevant for predicting and acting optimally. This is especially true in the general case where the underlying state space and dynamics are unknown. Existing approaches either try to learn a latent state representation or use decision trees based on the history of observations. In this paper we present a method for explicitly identifying relevant features of the observation history. These temporally extended features can be discovered using our Pulse algorithm and used to learn a compact model of the environment. Temporally extended features reveal the temporal structure of the environment while empirically outperforming other history-based approaches. (C) 2016 Elsevier B.V. All rights reserved.

Pages: 49-60

Page count: 12
