Stochastic Variance-Reduced Policy Gradient

Authors
Papini, Matteo [1 ]
Binaghi, Damiano [1 ]
Canonaco, Giuseppe [1 ]
Pirotta, Matteo [2 ]
Restelli, Marcello [1 ]
Affiliations
[1] Politecnico di Milano, Milan, Italy
[2] INRIA, Lille, France
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel reinforcement-learning algorithm: a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and must account for (i) a non-concave objective function, (ii) approximations in the full-gradient computation, and (iii) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG and evaluate them empirically on continuous MDPs.
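The update rule the abstract describes can be made concrete with a short sketch. The Python below is a minimal illustration, not the authors' implementation: the toy 1-D MDP, the linear-Gaussian policy, and all hyperparameter values (N, B, m, lr) are assumptions made for the example, and a plain REINFORCE estimator stands in for whatever per-trajectory gradient estimator the paper uses. Only the SVRG-style structure follows the abstract: a "full" gradient at a snapshot policy, plus inner mini-batch corrections whose snapshot term is importance-weighted to stay unbiased under the non-stationary sampling distribution.

```python
# Minimal SVRPG-style sketch (assumed toy setup, not the paper's experiments).
import numpy as np

rng = np.random.default_rng(0)
H, SIGMA = 10, 0.5          # horizon and fixed policy standard deviation

def rollout(theta):
    """Sample one trajectory; reward favors actions that track the state."""
    s, states, actions, rewards = rng.normal(), [], [], []
    for _ in range(H):
        a = theta * s + SIGMA * rng.normal()      # linear-Gaussian policy
        states.append(s); actions.append(a); rewards.append(-(a - s) ** 2)
        s = 0.9 * s + 0.1 * a + 0.1 * rng.normal()
    return np.array(states), np.array(actions), np.array(rewards)

def traj_grad(theta, traj):
    """REINFORCE gradient estimate for one trajectory."""
    s, a, r = traj
    score = (a - theta * s) * s / SIGMA ** 2      # per-step policy score
    return score.sum() * r.sum()

def imp_weight(theta_snap, theta, traj):
    """Weight of a trajectory sampled under theta, re-targeted to the
    snapshot policy theta_snap (product of per-step likelihood ratios)."""
    s, a, _ = traj
    log_p = lambda th: (-(a - th * s) ** 2 / (2 * SIGMA ** 2)).sum()
    return np.exp(log_p(theta_snap) - log_p(theta))

def svrpg(theta, epochs=20, N=100, B=10, m=5, lr=1e-3):
    for _ in range(epochs):
        theta_snap = theta
        # "Full" gradient at the snapshot, approximated on a large batch N.
        mu = np.mean([traj_grad(theta_snap, rollout(theta_snap))
                      for _ in range(N)])
        for _ in range(m):
            # Mini-batch drawn from the *current* policy; the importance
            # weight keeps the snapshot correction term unbiased.
            batch = [rollout(theta) for _ in range(B)]
            v = np.mean([traj_grad(theta, t)
                         - imp_weight(theta_snap, theta, t)
                           * traj_grad(theta_snap, t)
                         for t in batch]) + mu
            theta = theta + lr * v   # ascent step: maximize expected return
    return theta

print("theta after SVRPG:", svrpg(theta=0.0))
```

The importance weight multiplies only the snapshot term: under trajectories drawn from the current policy, its expectation then matches the gradient at the snapshot, so the correction cancels in expectation and only reduces variance, which is the unbiasedness property the abstract refers to.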
Pages: 10