[21]
Dayan P., Hinton G.E., Feudal reinforcement learning, Proc. Adv. Neural Inf. Process. Syst., pp. 271-278, (1993)
[22]
Barto A.G., Mahadevan S., Recent advances in hierarchical reinforcement learning, Discrete Event Dyn. Syst., 13, 1-2, pp. 41-77, (2003)
[23]
Schmidhuber J., Learning to generate subgoals for action sequences, Proc. Seattle Int. Joint Conf. Neural Netw., 2, pp. 967-972, (1991)
[24]
Sutton R.S., Precup D., Singh S., Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artif. Intell., 112, 1-2, pp. 181-211, (1999)
[25]
Botvinick M.M., Hierarchical reinforcement learning and decision making, Curr. Opin. Neurobiol., 22, 6, pp. 956-962, (2012)
[26]
Comanici G., Precup D., Optimal policy switching algorithms for reinforcement learning, Proc. 9th Int. Conf. Auton. Agents Multiagent Syst., pp. 709-714, (2010)
[27]
Schaul T., Horgan D., Gregor K., Silver D., Universal value function approximators, Proc. 32nd Int. Conf. Mach. Learn., pp. 1312-1320, (2015)
[28]
Bacon P.-L., Harb J., Precup D., The option-critic architecture, Proc. 31st AAAI Conf. Artif. Intell., pp. 1726-1734, (2017)
[29]
Vezhnevets A.S., et al., Feudal networks for hierarchical reinforcement learning, Proc. 34th Int. Conf. Mach. Learn., pp. 3540-3549, (2017)
[30]
Nachum O., Gu S., Lee H., Levine S., Data-efficient hierarchical reinforcement learning, Proc. Adv. Neural Inf. Process. Syst., pp. 3303-3313, (2018)