On the effects of domain size and complexity in empirical distribution of reinforcement learning

Cited by: 0
Authors
Iwata, K. [1]
Ikeda, K. [1]
Sakai, H. [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Dept Syst Sci, Kyoto 6068501, Japan
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2005 / Vol. E88D / No. 01
Keywords
reinforcement learning; Markov decision process; Lempel-Ziv coding; domain size; stochastic complexity;
DOI
10.1093/ietisy/E88-D.1.135
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification
0812
Abstract
We regard the events of a Markov decision process as the outputs of a Markov information source, so that the randomness of an empirical sequence can be analyzed through the codeword length of that sequence. Randomness is an important viewpoint in reinforcement learning, since learning amounts to eliminating randomness in order to find an optimal policy, and the occurrence of an optimal empirical sequence also depends on it. We therefore introduce Lempel-Ziv coding to measure the randomness, which decomposes into two components: the domain size and the stochastic complexity. Experimental results confirm that both learning and the occurrence of an optimal empirical sequence depend on the randomness, and show that in early stages the randomness is characterized mainly by the domain size, whereas as the number of time steps increases it depends increasingly on the complexity of the Markov decision process.
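The measure described in the abstract is built from the Lempel-Ziv codeword length of the observed state-action sequence. As a rough illustration only (not the authors' exact coding scheme: the function names and the simple phrase-based length estimate below are our own assumptions), LZ78 incremental parsing and the resulting codeword-length estimate can be sketched as:

```python
import math

def lz78_phrases(seq):
    # LZ78 incremental parsing: split the sequence into the shortest
    # phrases not seen before, and count them.
    seen = set()
    phrase = ()
    count = 0
    for symbol in seq:
        phrase += (symbol,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:  # trailing phrase that repeats an earlier one
        count += 1
    return count

def lz_codeword_length(seq, alphabet_size):
    # Rough codeword-length estimate: each of the c phrases is coded by a
    # pointer to a previous phrase (~log2 c bits) plus one fresh symbol
    # (~log2 |alphabet| bits). This is a standard LZ78 bound, used here
    # only to illustrate the randomness measure.
    c = lz78_phrases(seq)
    return c * (math.log2(c) + math.log2(alphabet_size))
```

A highly regular empirical sequence parses into few, long phrases and hence gets a short codeword length (low randomness), while an irregular sequence of the same length over a larger state-action domain parses into many short phrases, reflecting the two components named in the abstract: the alphabet (domain) size and the complexity of the source.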
Pages: 135-142
Page count: 8