Stochastic Variance-Reduced Policy Gradient

Authors
Papini, Matteo [1 ]
Binaghi, Damiano [1 ]
Canonaco, Giuseppe [1 ]
Pirotta, Matteo [2 ]
Restelli, Marcello [1 ]
Affiliations
[1] Politecnico di Milano, Milan, Italy
[2] INRIA, Lille, France
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel reinforcement-learning algorithm: a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and must account for (i) a non-concave objective function, (ii) approximations in the full-gradient computation, and (iii) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG and evaluate them empirically on continuous MDPs.
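The update rule the abstract describes can be made concrete with a short sketch. The Python below is a minimal illustration, not the authors' implementation: the toy 1-D MDP, the linear-Gaussian policy, and all hyperparameter values (N, B, m, lr) are assumptions made for the example, and a plain REINFORCE estimator stands in for whatever per-trajectory gradient estimator the paper uses. Only the SVRG-style structure follows the abstract: a "full" gradient at a snapshot policy, plus inner mini-batch corrections whose snapshot term is importance-weighted to stay unbiased under the non-stationary sampling distribution.

```python
# Minimal SVRPG-style sketch (assumed toy setup, not the paper's experiments).
import numpy as np

rng = np.random.default_rng(0)
H, SIGMA = 10, 0.5          # horizon and fixed policy standard deviation

def rollout(theta):
    """Sample one trajectory; reward favors actions that track the state."""
    s, states, actions, rewards = rng.normal(), [], [], []
    for _ in range(H):
        a = theta * s + SIGMA * rng.normal()      # linear-Gaussian policy
        states.append(s); actions.append(a); rewards.append(-(a - s) ** 2)
        s = 0.9 * s + 0.1 * a + 0.1 * rng.normal()
    return np.array(states), np.array(actions), np.array(rewards)

def traj_grad(theta, traj):
    """REINFORCE gradient estimate for one trajectory."""
    s, a, r = traj
    score = (a - theta * s) * s / SIGMA ** 2      # per-step policy score
    return score.sum() * r.sum()

def imp_weight(theta_snap, theta, traj):
    """Weight of a trajectory sampled under theta, re-targeted to the
    snapshot policy theta_snap (product of per-step likelihood ratios)."""
    s, a, _ = traj
    log_p = lambda th: (-(a - th * s) ** 2 / (2 * SIGMA ** 2)).sum()
    return np.exp(log_p(theta_snap) - log_p(theta))

def svrpg(theta, epochs=20, N=100, B=10, m=5, lr=1e-3):
    for _ in range(epochs):
        theta_snap = theta
        # "Full" gradient at the snapshot, approximated on a large batch N.
        mu = np.mean([traj_grad(theta_snap, rollout(theta_snap))
                      for _ in range(N)])
        for _ in range(m):
            # Mini-batch drawn from the *current* policy; the importance
            # weight keeps the snapshot correction term unbiased.
            batch = [rollout(theta) for _ in range(B)]
            v = np.mean([traj_grad(theta, t)
                         - imp_weight(theta_snap, theta, t)
                           * traj_grad(theta_snap, t)
                         for t in batch]) + mu
            theta = theta + lr * v   # ascent step: maximize expected return
    return theta

print("theta after SVRPG:", svrpg(theta=0.0))
```

The importance weight multiplies only the snapshot term: under trajectories drawn from the current policy, its expectation then matches the gradient at the snapshot, so the correction cancels in expectation and only reduces variance, which is the unbiasedness property the abstract refers to.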
Pages: 10