Revisiting the ODE Method for Recursive Algorithms: Fast Convergence Using Quasi Stochastic Approximation

Citations: 0
Authors
Shuhang Chen
Adithya Devraj
Andrey Bernstein
Sean Meyn
Affiliations
[1] University of Florida, Department of Mathematics
[2] Stanford University
[3] NREL
[4] University of Florida
Source
Journal of Systems Science and Complexity | 2021, Vol. 34
Keywords
Learning and adaptive systems in artificial intelligence; reinforcement learning; stochastic approximation
DOI
Not available
Abstract
Several decades ago, Profs. Sean Meyn and Lei Guo were postdoctoral fellows at ANU, where they shared an interest in recursive algorithms. It seems fitting to celebrate Lei Guo's 60th birthday with a review of the ODE method and its recent evolution, with focus on the following themes:
- The method has been regarded as a technique for algorithm analysis. It is argued that this viewpoint is backwards: the original stochastic approximation method was surely motivated by an ODE, and tools for analysis came much later (based on establishing robustness of Euler approximations). The paper presents a brief survey of recent research in machine learning that shows the power of algorithm design in continuous time, followed by careful approximation to obtain a practical recursive algorithm.
- While these methods are usually presented in a stochastic setting, this is not a prerequisite. In fact, recent theory shows that rates of convergence can be dramatically accelerated by applying techniques inspired by quasi Monte-Carlo.
- Subject to conditions, the optimal rate of convergence can be obtained by applying the averaging technique of Polyak and Ruppert. The conditions are not universal, but theory suggests alternatives to achieve acceleration.
- The theory is illustrated with applications to gradient-free optimization, and policy gradient algorithms for reinforcement learning.
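The two main ingredients named in the abstract can be illustrated with a toy sketch (a hypothetical example, not code from the paper): a scalar root-finding recursion driven by a deterministic sinusoidal probing signal in place of random noise (the quasi Monte-Carlo idea behind quasi stochastic approximation), followed by Polyak-Ruppert averaging of the iterates. The target function, frequency, and step-size exponent below are illustrative choices.

```python
import numpy as np

def qsa_estimate(theta0, n_iters, theta_star=1.0, omega=0.37):
    """Toy quasi stochastic approximation for the root of
    f(theta) = -(theta - theta_star).

    The "noise" xi_n is a deterministic sinusoid rather than an
    i.i.d. random sequence, in the spirit of quasi Monte-Carlo.
    """
    theta = theta0
    iterates = []
    for n in range(n_iters):
        a_n = (n + 1) ** -0.85                  # slowly decaying step size
        xi_n = np.cos(2 * np.pi * omega * n)    # deterministic probing signal
        theta = theta + a_n * (-(theta - theta_star) + xi_n)
        iterates.append(theta)
    # Polyak-Ruppert averaging over the trailing half of the run
    return float(np.mean(iterates[n_iters // 2:]))

est = qsa_estimate(theta0=0.0, n_iters=5000)
```

Because the probing signal averages to zero deterministically, the averaged iterate settles near the root far faster than with i.i.d. noise of comparable magnitude; this is the acceleration phenomenon the paper analyzes rigorously.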
Pages: 1681-1702
Page count: 21