Efficient exploration through active learning for value function approximation in reinforcement learning

Cited by: 15
Authors
Akiyama, Takayuki [1 ]
Hachiya, Hirotaka [1 ]
Sugiyama, Masashi [1 ,2 ]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan
[2] Japan Sci & Technol Agcy, PRESTO, Tokyo, Japan
Keywords
Reinforcement learning; Markov decision process; Least-squares policy iteration; Active learning; Batting robot; Regression
DOI
10.1016/j.neunet.2009.12.010
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot. (C) 2010 Elsevier Ltd. All rights reserved.
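The abstract's key observation is that the policy-evaluation step of LSPI reduces to a linear least-squares fit, so regression-based active-learning criteria can be applied to it. The following is a minimal illustrative sketch (not the authors' code) of that least-squares step in the LSTD-Q style, with an assumed feature map `phi`, discount factor `gamma`, and evaluated `policy`; the names and defaults are placeholders for illustration only.

```python
import numpy as np

def lstd_q(transitions, phi, policy, gamma=0.95, reg=1e-6):
    """Sketch of LSPI's policy-evaluation step as linear regression.

    Fits weights w such that Q(s, a) is approximated by phi(s, a).dot(w).

    transitions: list of (s, a, r, s_next) tuples sampled from the MDP.
    phi:         callable (s, a) -> 1-D feature vector (assumed).
    policy:      callable s -> action, the policy being evaluated (assumed).
    """
    d = phi(*transitions[0][:2]).shape[0]
    A = reg * np.eye(d)          # small ridge term keeps A invertible
    b = np.zeros(d)
    for s, a, r, s_next in transitions:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)
```

Because the value-function estimate is linear in the weights, variance-based active-learning scores for linear regression can be computed from the features alone, which is what allows candidate sampling policies to be compared before any costly rewards are collected.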
Pages: 639-648
Page count: 10