- [1] Baum LE(1972)An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes Inequalities 3 1-8
- [2] Dietterich TG(2000)Hierarchical reinforcement learning with the MAXQ value function decomposition Journal of Artificial Intelligence Research (JAIR) 13 227-303
- [3] Lagoudakis M(2003)Least-squares policy iteration Journal of Machine Learning Research 4 1107-1149
- [4] Morimoto J(2001)Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning Robotics and Autonomous Systems 36 37-51
- [5] Sutton RS(1999)Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning Artificial Intelligence 112 181-211
- [6] Theodorou E(2010)A generalized path integral control approach to reinforcement learning Journal of Machine Learning Research 11 3137-3181
- [7] Watkins CJCH(1992)Q-learning Machine Learning 8 279-292