On the Global Optimum Convergence of Momentum-based Policy Gradient

Cited: 0
Authors
Ding, Yuhao [1 ]
Zhang, Junzi [2 ]
Lavaei, Javad [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Amazon Advertising, San Francisco, CA USA
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by establishing the first set of global convergence results for stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the softmax and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum term improves the global optimality sample complexities of vanilla PG methods by $\tilde{O}(\epsilon^{-1.5})$ and $\tilde{O}(\epsilon^{-1})$, respectively, where $\epsilon > 0$ is the target tolerance. Our results for the generic Fisher-non-degenerate policy parametrizations also provide the first single-loop and finite-batch PG algorithm achieving an $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity. Finally, as a byproduct, our analyses provide general tools for deriving the global convergence rates of stochastic PG methods, which can be readily applied and extended to other PG estimators under the two parametrizations.
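To make the role of the momentum term concrete, below is a minimal sketch of a STORM-style momentum policy-gradient update with importance weighting, the mechanism behind estimators such as IS-MBPG in related paper [1] below, applied to a toy bandit with a softmax policy. The environment, reward noise, step size eta, and momentum weight beta are all illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-state MDP (a 3-armed bandit) with a softmax policy. The reward
# means below are illustrative assumptions.
true_means = np.array([1.0, 0.5, 0.2])
n_actions = len(true_means)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    # For the softmax parametrization: grad_theta log pi_theta(a) = e_a - pi_theta
    g = -softmax(theta)
    g[a] += 1.0
    return g

eta, beta = 0.1, 0.4                  # step size and momentum weight (assumed)
theta = np.zeros(n_actions)           # policy parameters
theta_prev = theta.copy()
d = np.zeros(n_actions)               # momentum gradient estimate

for t in range(2000):
    # Sample one action/reward from the *current* policy (one "trajectory").
    pi = softmax(theta)
    a = rng.choice(n_actions, p=pi)
    r = true_means[a] + 0.1 * rng.standard_normal()

    g_curr = r * grad_log_pi(theta, a)           # REINFORCE estimate at theta_t
    if t == 0:
        d = g_curr
    else:
        # Re-evaluate the estimator at theta_{t-1} on the same sample; the
        # importance weight corrects for the sample being drawn from pi_{theta_t}.
        w = softmax(theta_prev)[a] / pi[a]
        g_prev = w * r * grad_log_pi(theta_prev, a)
        d = g_curr + (1.0 - beta) * (d - g_prev)  # STORM-style momentum update

    theta_prev = theta.copy()
    theta = theta + eta * d                       # gradient ascent on the return

print("learned policy:", np.round(softmax(theta), 3))  # should concentrate on action 0
```

Note that with beta = 1 the update collapses to vanilla REINFORCE (d = g_curr); smaller beta retains more of the variance-reducing correction term, which is what drives the improved sample complexities discussed in the abstract.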
Pages: 25
Related Papers
50 records in total
  • [1] Momentum-Based Policy Gradient Methods
    Huang, Feihu
    Gao, Shangqian
    Pei, Jian
    Huang, Heng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020
  • [2] Convergence of Momentum-Based Stochastic Gradient Descent
    Jin, Ruinan
    He, Xingkang
    2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2020: 779 - 784
  • [3] MDPGT: Momentum-Based Decentralized Policy Gradient Tracking
    Jiang, Zhanhong
    Lee, Xian Yeow
    Tan, Sin Yong
    Tan, Kai Liang
    Balu, Aditya
    Lee, Young M.
    Hegde, Chinmay
    Sarkar, Soumik
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 9377 - 9385
  • [4] Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration
    Gao, Xuefeng
    Gürbüzbalaban, Mert
    Zhu, Lingjiong
    OPERATIONS RESEARCH, 2022, 70 (05): 2931 - 2947
  • [5] Unbiased quasi-hyperbolic nesterov-gradient momentum-based optimizers for accelerating convergence
    Cheng, Weiwei
    Yang, Xiaochun
    Wang, Bin
    Wang, Wei
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2023, 26 (04): 1323 - 1344
  • [6] Convergence of Momentum-based Distributed Stochastic Approximation with RL Applications
    Naskar, Ankur
    Thoppe, Gugan
    2023 NINTH INDIAN CONTROL CONFERENCE, ICC, 2023: 178 - 179
  • [7] Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction
    Feng, Jie
    Wei, Ke
    Chen, Jinchi
    JOURNAL OF SCIENTIFIC COMPUTING, 2024, 101 (02)
  • [8] Convergence of Momentum-Based Heavy Ball Method with Batch Updating and/or Approximate Gradients
    Reddy, Tadipatri Uday Kiran
    Vidyasagar, M.
    2023 NINTH INDIAN CONTROL CONFERENCE, ICC, 2023: 182 - 187