CONVERGENCE OF ENTROPY-REGULARIZED NATURAL POLICY GRADIENT WITH LINEAR FUNCTION APPROXIMATION

Cited by: 0
Authors
Cayci, Semih [1 ]
He, Niao [2 ]
Srikant, R. [3 ]
Affiliations
[1] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[2] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[3] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation;
Keywords
reinforcement learning; policy gradient; nonconvex optimization;
DOI
10.1137/22M1540156
Chinese Library Classification
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition, and achieves a fast convergence rate of Õ(1/T) up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization.
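To make the method described in the abstract concrete, the following is a minimal Python sketch of entropy-regularized NPG with softmax policies whose logits are linear in features, run on a small synthetic MDP. It is an illustration under simplifying assumptions (exact soft policy evaluation, uniform state weighting, toy MDP sizes, illustrative step size and entropy coefficient), not the paper's exact algorithm, which additionally covers averaging, sample-based evaluation, and the stated finite-time guarantees.

import numpy as np

# Minimal sketch (assumptions, not the paper's exact algorithm): entropy-regularized
# NPG with softmax policies parameterized by linear logits phi(s, a)^T theta,
# on a small random MDP with exact soft policy evaluation.

rng = np.random.default_rng(0)
S, A, d = 5, 3, 4                    # states, actions, feature dimension (toy sizes)
gamma, tau, eta = 0.9, 0.1, 0.5      # discount, entropy coefficient, step size

P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(S, A))       # reward r(s, a)
phi = rng.normal(size=(S, A, d))             # feature map phi(s, a)

def policy(theta):
    """Softmax policy with linear logits phi(s, a)^T theta."""
    logits = phi @ theta                                # shape (S, A)
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def soft_q(pi):
    """Exact soft (entropy-regularized) Q-function of pi by value iteration."""
    Q = np.zeros((S, A))
    for _ in range(5000):
        V = (pi * Q).sum(axis=1) - tau * (pi * np.log(pi + 1e-12)).sum(axis=1)
        Q_next = r + gamma * (P @ V)
        if np.max(np.abs(Q_next - Q)) < 1e-10:
            return Q_next
        Q = Q_next
    return Q

theta = np.zeros(d)
for t in range(200):
    pi = policy(theta)
    Q = soft_q(pi)
    # Compatible function approximation step: least-squares fit of the soft
    # Q-function in the span of the features, weighted by the current policy
    # (state weighting taken uniform here for simplicity).
    weights = (pi / S).reshape(-1)
    X = phi.reshape(S * A, d) * np.sqrt(weights)[:, None]
    y = Q.reshape(S * A) * np.sqrt(weights)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    # One common form of the entropy-regularized NPG update: the new logits are
    # proportional to (1 - eta * tau) * (old logits) + eta * (fitted soft Q),
    # which in the linear-logit parameterization becomes
    theta = (1.0 - eta * tau) * theta + eta * w

pi = policy(theta)
V_soft = (pi * soft_q(pi)).sum(axis=1) - tau * (pi * np.log(pi + 1e-12)).sum(axis=1)
print("average soft value after training:", V_soft.mean())

Running the loop drives the average soft value upward over iterations; in this exactly-evaluated, well-conditioned toy setting the updates stabilize quickly, loosely mirroring the fast convergence (up to function approximation error) established in the paper.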
Pages: 2729-2755
Page count: 27
Related Papers
20 records in total
  • [1] Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
    Ged, Francois G.
    Veiga, Maria Han
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [2] Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization
    Sun, Youbang
    Liu, Tao
    Kumar, P. R.
    Shahrampour, Shahin
    IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 1217 - 1222
  • [3] On linear and super-linear convergence of Natural Policy Gradient algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
    SYSTEMS & CONTROL LETTERS, 2022, 164
  • [4] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
    Cen, Shicong
    Cheng, Chen
    Chen, Yuxin
    Wei, Yuting
    Chi, Yuejie
    OPERATIONS RESEARCH, 2021, 70 (04) : 2563 - 2578
  • [5] On the linear convergence of policy gradient under Hadamard parameterization
    Liu, Jiacai
    Chen, Jinchi
    Wei, Ke
    INFORMATION AND INFERENCE-A JOURNAL OF THE IMA, 2025, 14 (01)
  • [6] Generalized Compatible Function Approximation for Policy Gradient Search
    Peng, Yiming
    Chen, Gang
    Zhang, Mengjie
    Pang, Shaoning
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT I, 2016, 9947 : 615 - 622
  • [7] Convergence of policy gradient for stochastic linear quadratic optimal control problems in infinite horizon
    Zhang, Xinpei
    Jia, Guangyan
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2025, 547 (01)
  • [8] On the convergence of temporal-difference learning with linear function approximation
    Tadic, V
    MACHINE LEARNING, 2001, 42 (03) : 241 - 267
  • [9] A policy gradient reinforcement learning algorithm with fuzzy function approximation
    Gu, DB
    Yang, EF
    IEEE ROBIO 2004: Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2004 : 936 - 940