28 entries in total
[1]
Beyond the one step greedy approach in reinforcement learning. Efroni Y, Dalal G, Scherrer B, et al. https://arxiv.org/abs/1802.03654. 2018
[2]
Reinforcement Learning: An Introduction. Sutton RS, Barto AG. 1998
[3]
On the Theory of Dynamic Programming. Bellman R. Proceedings of the National Academy of Sciences of the United States of America. 1952
[4]
Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. Bertsekas DP. Optimization for Machine Learning. 2012
[5]
Online least-squares policy iteration for reinforcement learning control. Busoniu L, Ernst D, De Schutter B, Babuska R. Proceedings of the 2010 American Control Conference. 2010
[6]
Least-squares λ policy iteration: bias-variance trade-off in control problems. Thiery C, Scherrer B. Proc 27th Int Conf on Machine Learning. 2010
[7]
Unmanned surface vehicles: an overview of developments and challenges. Liu Z, Zhang Y, Yu X, et al. Annual Reviews in Control. 2016
[8]
Disturbance observer-based adaptive fault-tolerant control for a quadrotor helicopter subject to parametric uncertainties and external disturbances. Wang B, Yu X, Mu L, Zhang Y. Mechanical Systems and Signal Processing. 2019
[9]
A Case Study on Air Combat Decision Using Approximated Dynamic Programming. Ma Y, Ma X, Song X, Fei M. Mathematical Problems in Engineering. 2014
[10]
Lambda-policy iteration: a review and a new implementation. Bertsekas DP. https://arxiv.org/abs/1507.01029. 2015