Mixed reinforcement learning for partially observable Markov decision process

Cited by: 0
Authors
Dung, Le Tien [1 ]
Komeda, Takashi [2 ]
Takagi, Motoki [1 ]
Affiliations
[1] Shibaura Inst Technol, Grad Sch Engn, Tokyo, Japan
[2] Shibaura Inst Technol, Fac Syst Engn, Tokyo, Japan
Source
2007 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION | 2007
Keywords
DOI
Not available
Chinese Library Classification
TP [automation technology; computer technology];
Discipline code
0812
Abstract
Reinforcement Learning has been widely used to solve problems with only limited feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values; however, the learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. The method uses both a Q-value table and an RNN: the Q-value table stores Q values for fully observable states, while the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment; if the observable degree is less than a threshold, the state is treated as a hidden state. Results of an experiment on the lighting grid-world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.
Pages: 436+
Page count: 2
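The abstract describes routing action-value lookups between a Q-value table (for fully observable states) and an RNN (for hidden states) based on an observable degree compared against a threshold. The sketch below illustrates only that routing idea; the observable-degree formula, the threshold value, the RNN stand-in, and all names are assumptions, since the record gives no implementation details.

```python
import numpy as np

# Minimal sketch of the mixed Q-value lookup described in the abstract.
# The RNN approximator and the observable-degree computation below are
# hypothetical placeholders; the record does not specify how either is built.

N_ACTIONS = 4
OBSERVABILITY_THRESHOLD = 0.5   # assumed value, not taken from the paper

q_table = {}                    # tabular Q values for fully observable states


class RecurrentQApproximator:
    """Stand-in for the RNN that estimates Q values for hidden states."""

    def predict(self, observation):
        # A real implementation would run the observation (and its history)
        # through a recurrent network; here we return a dummy estimate.
        return np.zeros(N_ACTIONS)


rnn = RecurrentQApproximator()


def observable_degree(observation, visit_stats):
    """Hypothetical observable degree: how consistently this observation
    has mapped to a single successor state during exploration."""
    successors = visit_stats.get(observation)
    if not successors:
        return 1.0                     # unseen observations assumed observable
    return max(successors.values()) / sum(successors.values())


def q_values(observation, visit_stats):
    """Route the lookup: Q table for observable states, RNN for hidden ones."""
    if observable_degree(observation, visit_stats) >= OBSERVABILITY_THRESHOLD:
        return q_table.setdefault(observation, np.zeros(N_ACTIONS))
    return rnn.predict(observation)


# Example: an observation seen three times, twice with the same successor,
# gives degree 2/3 >= 0.5, so the tabular Q values are used.
stats = {"cell_3_4": {"cell_3_5": 2, "cell_2_4": 1}}
action = int(np.argmax(q_values("cell_3_4", stats)))
```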