42 entries in total
[1]
Auer P (2002) Finite-time analysis of the multiarmed bandit problem Mach Learn 47 235-256
[2]
Barto AG (2003) Recent advances in hierarchical reinforcement learning Discrete Event Dynamic Systems 13 341-379
[3]
Baxter J (2000) Learning to play chess using temporal differences Mach Learn 40 243-263
[4]
Cuayáhuitl H (2014) Nonstrict hierarchical reinforcement learning for interactive systems and robots TiiS 4 15:1-15:30
[5]
Dietterich TG (2000) Hierarchical reinforcement learning with the MAXQ value function decomposition J Artif Intell Res (JAIR) 13 227-303
[6]
Hong J (2004) Distributed reinforcement learning control for batch sequencing and sizing in just-in-time manufacturing systems Appl Intell 20 71-87
[7]
Iglesias A (2009) Learning teaching strategies in an adaptive and intelligent educational system through reinforcement learning Appl Intell 31 89-106
[8]
Lang T (2010) Planning with noisy probabilistic relational rules J Artif Intell Res (JAIR) 39 1-49
[9]
Samuel AL (1959) Some studies in machine learning using the game of checkers IBM J Res Dev 3 210-229
[10]
Sutton RS (1999) Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning Artif Intell 112 181-211