Regularized feature selection in reinforcement learning

Cited: 11
Authors
Wookey, Dean S. [1 ]
Konidaris, George D. [2 ,3 ]
Affiliations
[1] Univ Witwatersrand, Sch Comp Sci & Appl Math, Johannesburg, South Africa
[2] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
[3] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
Keywords
Feature selection; Reinforcement learning; Function approximation; Regularization; Linear function approximation; OMP-TD;
DOI
10.1007/s10994-015-5518-8
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce feature regularization during feature selection for value function approximation. Feature regularization introduces a prior into the selection process, improving function approximation accuracy and reducing overfitting. We show that the smoothness prior is effective in the incremental feature selection setting and present closed-form smoothness regularizers for the Fourier and RBF bases. We present two feature regularization methods that extend the temporal difference orthogonal matching pursuit (OMP-TD) algorithm, smooth Tikhonov OMP-TD and smoothness scaled OMP-TD, and use them to demonstrate the effectiveness of the smoothness prior. We compare these methods against OMP-TD, regularized OMP-TD, and least squares TD with random projections across six benchmark domains using two different types of basis functions.
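To illustrate the kind of smoothness prior the abstract describes, the sketch below fits a linear value approximation on a univariate Fourier basis with a Tikhonov penalty that grows quadratically with feature frequency, so high-frequency (less smooth) features are discouraged. This is a minimal hypothetical example, not the paper's STOMP-TD algorithm: the basis, penalty weights, and target function are all illustrative assumptions.

```python
import numpy as np

def fourier_basis(s, order):
    # Univariate Fourier cosine basis: phi_i(s) = cos(pi * i * s), s in [0, 1].
    return np.cos(np.pi * np.outer(s, np.arange(order + 1)))

def tikhonov_fit(Phi, y, lam, penalty):
    # Closed-form solution of min_w ||Phi w - y||^2 + lam * w^T diag(penalty) w.
    A = Phi.T @ Phi + lam * np.diag(penalty)
    return np.linalg.solve(A, Phi.T @ y)

rng = np.random.default_rng(0)
s = rng.uniform(0.0, 1.0, size=200)
# Noisy samples of a smooth stand-in for a value function (illustrative target).
y = np.cos(2 * np.pi * s) + 0.1 * rng.normal(size=s.size)

order = 10
Phi = fourier_basis(s, order)
# Assumed smoothness penalty: quadratic in frequency, so rougher features
# pay a larger price (a hypothetical choice, not the paper's exact regularizer).
penalty = (np.pi * np.arange(order + 1)) ** 2

w_plain = tikhonov_fit(Phi, y, 0.0, penalty)   # unregularized least squares
w_smooth = tikhonov_fit(Phi, y, 0.1, penalty)  # smoothness-regularized fit
```

Increasing the regularization weight shrinks the high-frequency coefficients toward zero while leaving the low-frequency structure of the target largely intact, which is the effect a smoothness prior is meant to have during feature selection.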
Pages: 655-676
Page count: 22