HQ-learning

Cited by: 88
Authors
Wiering, M [1 ]
Schmidhuber, J [1 ]
Institution
[1] IDSIA, CH-6900 Lugano, Switzerland
Keywords
reinforcement learning; hierarchical Q-learning; POMDPs; non-Markov; subgoal learning;
DOI
10.1177/105971239700600202
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
HQ-learning is a hierarchical extension of Q(lambda)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work.
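The decomposition described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the names (`Subagent`, `run_chain`, `env_step`) are invented, a one-step Q-learning backup stands in for the paper's Q(lambda) eligibility traces, and the learning rule for the HQ (subgoal-value) table is omitted for brevity.

```python
import random
from collections import defaultdict

class Subagent:
    """One reactive subagent: a memoryless Q-table over raw observations,
    plus an HQ-value per candidate subgoal (simplified sketch)."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)    # (observation, action) -> value
        self.hq = defaultdict(float)   # subgoal observation -> value (update not shown)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def pick_subgoal(self, candidates):
        # Epsilon-greedy over HQ-values; a simplification of the paper's
        # stochastic subgoal selection.
        if random.random() < self.epsilon:
            return random.choice(candidates)
        return max(candidates, key=lambda g: self.hq[g])

    def act(self, obs):
        # Memoryless (reactive) policy: action depends only on the current observation.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs):
        # One-step Q-learning backup (the paper uses Q(lambda) traces instead).
        best_next = max(self.q[(next_obs, a)] for a in self.actions)
        td = reward + self.gamma * best_next - self.q[(obs, action)]
        self.q[(obs, action)] += self.alpha * td

def run_chain(subagents, env_step, obs, subgoal_candidates, max_steps=100):
    """Run subagents in sequence: control transfers to the next subagent
    as soon as the current one observes its chosen subgoal."""
    for agent in subagents:
        goal = agent.pick_subgoal(subgoal_candidates)
        for _ in range(max_steps):
            if obs == goal:
                break                     # subgoal reached: hand off control
            a = agent.act(obs)
            next_obs, r = env_step(obs, a)
            agent.update(obs, a, r, next_obs)
            obs = next_obs
    return obs
```

The key point the sketch illustrates is that each subagent's policy is purely reactive, so partial observability is handled by the sequencing of subgoals rather than by memory inside any single policy.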
Pages: 219-246 (28 pages)