Mixed reinforcement learning for partially observable Markov decision process

Cited by: 0
Authors
Dung, Le Tien [1 ]
Komeda, Takashi [2 ]
Takagi, Motoki [1 ]
Affiliations
[1] Shibaura Inst Technol, Grad Sch Engn, Tokyo, Japan
[2] Shibaura Inst Technol, Fac Syst Engn, Tokyo, Japan
Source
2007 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION | 2007
Keywords
DOI
Not available
Chinese Library Classification
TP [automation technology; computer technology];
Discipline code
0812
Abstract
Reinforcement Learning has been widely used to solve problems with only limited feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values; however, the learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. The method uses both a Q-value table and an RNN: the Q-value table stores Q values for fully observable states, while the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment; if the observable degree is less than a threshold, the state is treated as a hidden state. Results of an experiment on the lighting grid-world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.
Pages: 436+
Page count: 2
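The abstract describes routing action-value lookups between a Q-value table (for fully observable states) and an RNN (for hidden states) based on an observable degree compared against a threshold. The sketch below illustrates only that routing idea; the observable-degree formula, the threshold value, the RNN stand-in, and all names are assumptions, since the record gives no implementation details.

```python
import numpy as np

# Minimal sketch of the mixed Q-value lookup described in the abstract.
# The RNN approximator and the observable-degree computation below are
# hypothetical placeholders; the record does not specify how either is built.

N_ACTIONS = 4
OBSERVABILITY_THRESHOLD = 0.5   # assumed value, not taken from the paper

q_table = {}                    # tabular Q values for fully observable states


class RecurrentQApproximator:
    """Stand-in for the RNN that estimates Q values for hidden states."""

    def predict(self, observation):
        # A real implementation would run the observation (and its history)
        # through a recurrent network; here we return a dummy estimate.
        return np.zeros(N_ACTIONS)


rnn = RecurrentQApproximator()


def observable_degree(observation, visit_stats):
    """Hypothetical observable degree: how consistently this observation
    has mapped to a single successor state during exploration."""
    successors = visit_stats.get(observation)
    if not successors:
        return 1.0                     # unseen observations assumed observable
    return max(successors.values()) / sum(successors.values())


def q_values(observation, visit_stats):
    """Route the lookup: Q table for observable states, RNN for hidden ones."""
    if observable_degree(observation, visit_stats) >= OBSERVABILITY_THRESHOLD:
        return q_table.setdefault(observation, np.zeros(N_ACTIONS))
    return rnn.predict(observation)


# Example: an observation seen three times, twice with the same successor,
# gives degree 2/3 >= 0.5, so the tabular Q values are used.
stats = {"cell_3_4": {"cell_3_5": 2, "cell_2_4": 1}}
action = int(np.argmax(q_values("cell_3_4", stats)))
```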