On the linear convergence of policy gradient under Hadamard parameterization

被引:0
作者
Liu, Jiacai [1 ]
Chen, Jinchi [2 ]
Wei, Ke [1 ]
机构
[1] Fudan Univ, Sch Data Sci, Shanghai 200433, Peoples R China
[2] East China Univ Sci & Technol, Sch Math, Shanghai 200433, Peoples R China
关键词
policy gradient; Hadamard parameterization; linear convergence; sub-optimal probability;
D O I
10.1093/imaiai/iaaf003
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
The convergence of deterministic policy gradient under the Hadamard parameterization is studied in the tabular setting and the linear convergence of the algorithm is established. To this end, we first show that the error decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based on this result, we further show that the algorithm has a faster local linear convergence rate after $k_{0}$ iterations, where $k_{0}$ is a constant that only depends on the MDP problem and the initialization. To show the local linear convergence of the algorithm, we have indeed established the contraction of the sub-optimal probability $b_{s}<^>{k}$ (i.e. the probability of the output policy $\pi <^>{k}$ on non-optimal actions) when $k\ge k_{0}$.
引用
收藏
页数:38
相关论文
共 50 条
  • [31] Level-Set Subdifferential Error Bounds and Linear Convergence of Bregman Proximal Gradient Method
    Daoli Zhu
    Sien Deng
    Minghua Li
    Lei Zhao
    Journal of Optimization Theory and Applications, 2021, 189 : 889 - 918
  • [32] Linear convergence of the relaxed gradient projection algorithm for solving the split equality problems in Hilbert spaces
    Tian, Tingting
    Shi, Luoyi
    Chen, Rudong
    JOURNAL OF INEQUALITIES AND APPLICATIONS, 2019, 2019 (1)
  • [33] Linear convergence of the relaxed gradient projection algorithm for solving the split equality problems in Hilbert spaces
    Tingting Tian
    Luoyi Shi
    Rudong Chen
    Journal of Inequalities and Applications, 2019
  • [34] New analysis of linear convergence of gradient-type methods via unifying error bound conditions
    Zhang, Hui
    MATHEMATICAL PROGRAMMING, 2020, 180 (1-2) : 371 - 416
  • [35] New analysis of linear convergence of gradient-type methods via unifying error bound conditions
    Hui Zhang
    Mathematical Programming, 2020, 180 : 371 - 416
  • [36] Linear convergence of inexact descent method and inexact proximal gradient algorithms for lower-order regularization problems
    Yaohua Hu
    Chong Li
    Kaiwen Meng
    Xiaoqi Yang
    Journal of Global Optimization, 2021, 79 : 853 - 883
  • [37] Linear convergence of inexact descent method and inexact proximal gradient algorithms for lower-order regularization problems
    Hu, Yaohua
    Li, Chong
    Meng, Kaiwen
    Yang, Xiaoqi
    JOURNAL OF GLOBAL OPTIMIZATION, 2021, 79 (04) : 853 - 883
  • [38] Linear Convergence of Projection Algorithms
    Dao, Minh N.
    Phan, Hung M.
    MATHEMATICS OF OPERATIONS RESEARCH, 2019, 44 (02) : 715 - 738
  • [39] Linear convergence of cyclic SAGA
    Youngsuk Park
    Ernest K. Ryu
    Optimization Letters, 2020, 14 : 1583 - 1598
  • [40] Linear convergence of cyclic SAGA
    Park, Youngsuk
    Ryu, Ernest K.
    OPTIMIZATION LETTERS, 2020, 14 (06) : 1583 - 1598