Stochastic Variance-Reduced Policy Gradient

Cited by: 0
Authors
Papini, Matteo [1]
Binaghi, Damiano [1]
Canonaco, Giuseppe [1]
Pirotta, Matteo [2]
Restelli, Marcello [1]
Affiliations
[1] Politecnico di Milano, Milan, Italy
[2] INRIA, Lille, France
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for (i) a non-concave objective function; (ii) approximations in the full gradient computation; and (iii) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
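The role of the importance weights mentioned in the abstract can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and does not come from this record: a one-step problem, a unit-variance Gaussian policy with mean theta, a quadratic reward peaking at a hypothetical TARGET, and all function names. The sketch combines a snapshot gradient with a correction term in which the snapshot part is reweighted by an importance weight, because the correction samples are drawn from the current policy rather than the snapshot policy.

```python
import math
import random

TARGET = 2.0  # hypothetical location of the reward peak (illustrative only)


def reward(a):
    # Hypothetical quadratic reward, maximal at a == TARGET.
    return -(a - TARGET) ** 2


def grad_sample(a, theta):
    # Single-sample policy gradient for a ~ N(theta, 1):
    # d/dtheta log p(a | theta) * reward(a) = (a - theta) * reward(a).
    return (a - theta) * reward(a)


def importance_weight(a, theta_snap, theta):
    # p(a | theta_snap) / p(a | theta) for two unit-variance Gaussians.
    return math.exp(-0.5 * (a - theta_snap) ** 2 + 0.5 * (a - theta) ** 2)


def snapshot_gradient(theta_snap, n, rng):
    # Large-batch gradient estimate at the snapshot parameter.
    total = sum(grad_sample(rng.gauss(theta_snap, 1.0), theta_snap)
                for _ in range(n))
    return total / n


def svrpg_estimate(theta, theta_snap, mu_snap, batch, rng):
    # Variance-reduced estimate: snapshot gradient plus a correction
    # computed on actions sampled from the CURRENT policy. The snapshot
    # term is importance-weighted so its expectation matches the
    # gradient at theta_snap.
    total = 0.0
    for _ in range(batch):
        a = rng.gauss(theta, 1.0)
        w = importance_weight(a, theta_snap, theta)
        total += grad_sample(a, theta) - w * grad_sample(a, theta_snap)
    return mu_snap + total / batch


def true_gradient(theta):
    # Closed form for this toy problem: J(theta) = -((theta - TARGET)^2 + 1).
    return -2.0 * (theta - TARGET)
```

In this toy setting, when theta equals theta_snap the weight is exactly 1 and the correction vanishes, so the estimate reduces to the snapshot gradient. As theta moves away from the snapshot, the weighted correction accounts for the change in sampling distribution.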
Pages: 10