Stochastic Variance-Reduced Policy Gradient

Authors
Papini, Matteo [1]
Binaghi, Damiano [1]
Canonaco, Giuseppe [1]
Pirotta, Matteo [2]
Restelli, Marcello [1]
Affiliations
[1] Politecnico di Milano, Milan, Italy
[2] INRIA, Lille, France
Keywords
None listed
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
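The idea summarized in the abstract can be illustrated with a minimal sketch. It assumes a toy one-step problem (a unit-variance Gaussian policy with mean theta and a quadratic reward), which is not taken from the paper's experiments, and it uses a generic SVRG-style estimator: a large-batch gradient at a snapshot parameter plus a small-batch correction, where the snapshot term is reweighted by an importance weight because the samples come from the current policy. All function names and hyperparameters are hypothetical, and the exact estimator in the paper may differ.

```python
import math
import random

def reward(a):
    # Assumed toy reward, maximized at a = 3.
    return -(a - 3.0) ** 2

def g(a, theta):
    # Single-sample REINFORCE term: score function times reward,
    # for a unit-variance Gaussian policy with mean theta.
    return (a - theta) * reward(a)

def importance_weight(a, theta_snap, theta_cur):
    # p(a | theta_snap) / p(a | theta_cur) for unit-variance Gaussians.
    return math.exp(-0.5 * (a - theta_snap) ** 2 + 0.5 * (a - theta_cur) ** 2)

def snapshot_gradient(theta_snap, n, rng):
    # Large-batch gradient estimate at the snapshot parameter.
    actions = [rng.gauss(theta_snap, 1.0) for _ in range(n)]
    return sum(g(a, theta_snap) for a in actions) / n

def svrpg_estimate(theta_cur, theta_snap, mu_snap, b, rng):
    # Small batch sampled from the CURRENT policy; the snapshot term is
    # importance-weighted so the correction has zero mean.
    actions = [rng.gauss(theta_cur, 1.0) for _ in range(b)]
    correction = sum(
        g(a, theta_cur)
        - importance_weight(a, theta_snap, theta_cur) * g(a, theta_snap)
        for a in actions
    ) / b
    return mu_snap + correction

def run(epochs=20, inner_steps=5, n=200, b=20, step=0.05, seed=0):
    # Outer loop refreshes the snapshot; inner loop takes ascent steps.
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(epochs):
        theta_snap = theta
        mu = snapshot_gradient(theta_snap, n, rng)
        for _ in range(inner_steps):
            theta += step * svrpg_estimate(theta, theta_snap, mu, b, rng)
    return theta
```

When the current parameter equals the snapshot, the importance weight is 1 and the correction vanishes, so the estimate reduces to the snapshot gradient. As the parameter drifts away from the snapshot, the weights become more variable, which is one reason to refresh the snapshot periodically.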
Pages: 10