Proximal Policy Optimization with Entropy Regularization

Cited by: 0
Authors
Shen, Yuqing [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024 | 2024
Keywords
reinforcement learning; policy gradient; entropy regularization
DOI
10.1109/ICCCR61138.2024.10585473
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This study presents a revision of the Proximal Policy Optimization (PPO) algorithm aimed at improving training stability while maintaining a balance between exploration and exploitation. Recognizing the inherent difficulty of achieving this balance in complex environments, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The main purpose of this design is to encourage exploration in the early stages of training, preventing the agent from prematurely converging to a sub-optimal policy. A detailed theoretical explanation of how the entropy term improves the robustness of the learning trajectory is provided. Experimental results demonstrate that the revised PPO not only retains the original strengths of the PPO algorithm but also shows a significant improvement in training stability. This work contributes to ongoing research in reinforcement learning and offers a promising direction for future work on applying PPO in environments with complicated dynamics.
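For orientation, the approach the abstract describes, A3C-style entropy regularization added to PPO's clipped surrogate objective, corresponds to the standard formulation below; the paper's exact objective and coefficient schedule are not given in this record, so this is an assumption based on the usual PPO and A3C losses:

L(\theta) = \hat{\mathbb{E}}_t\big[ \min\big( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t \big) \big] + \beta\,\hat{\mathbb{E}}_t\big[ \mathcal{H}\big(\pi_\theta(\cdot \mid s_t)\big) \big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},

where \hat{A}_t is the advantage estimate, \epsilon the clipping range, and \beta > 0 weights the entropy term \mathcal{H}. A minimal PyTorch-style sketch of such a loss follows; the function and argument names are illustrative, not taken from the paper:

import torch

def ppo_clip_entropy_loss(log_probs_new, log_probs_old, advantages, entropy,
                          clip_eps=0.2, entropy_coef=0.01):
    # Probability ratio r_t(theta) between the current and old policies.
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Clipped surrogate objective (L^CLIP) from standard PPO.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    surrogate = torch.min(unclipped, clipped).mean()
    # Entropy bonus (A3C-style) encourages exploration; negate to obtain a loss to minimize.
    return -(surrogate + entropy_coef * entropy.mean())

Annealing entropy_coef toward zero over training would match the abstract's emphasis on early-stage exploration, but whether the paper uses a fixed or scheduled coefficient is not stated in this record.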
Pages: 380-383
Number of pages: 4
Related Papers
50 records in total
  • [1] Entropy adjustment by interpolation for exploration in Proximal Policy Optimization (PPO)
    Boudlal, Ayoub
    Khafaji, Abderahim
    Elabbadi, Jamal
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [2] Trust region policy optimization via entropy regularization for Kullback-Leibler divergence constraint
    Xu, Haotian
    Xuan, Junyu
    Zhang, Guangquan
    Lu, Jie
    NEUROCOMPUTING, 2024, 589
  • [3] Relative Entropy of Correct Proximal Policy Optimization Algorithms with Modified Penalty Factor in Complex Environment
    Chen, Weimin
    Wong, Kelvin Kian Loong
    Long, Sifan
    Sun, Zhili
    ENTROPY, 2022, 24 (04)
  • [4] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
    Cen, Shicong
    Cheng, Chen
    Chen, Yuxin
    Wei, Yuting
    Chi, Yuejie
    OPERATIONS RESEARCH, 2021, 70 (04) : 2563 - 2578
  • [5] Authentic Boundary Proximal Policy Optimization
    Cheng, Yuhu
    Huang, Longyang
    Wang, Xuesong
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 9428 - 9438
  • [6] Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
    Cen, Shicong
    Wei, Yuting
    Chi, Yuejie
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 48
  • [7] PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation
    Hamalainen, Perttu
    Babadi, Amin
    Ma, Xiaoxiao
    Lehtinen, Jaakko
    PROCEEDINGS OF THE 2020 IEEE 30TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2020
  • [8] Image captioning via proximal policy optimization
    Zhang, Le
    Zhang, Yanshuo
    Zhao, Xin
    Zou, Zexiao
    IMAGE AND VISION COMPUTING, 2021, 108
  • [9] Proximal Policy Optimization with Relative Pearson Divergence
    Kobayashi, Taisuke
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 8416 - 8421
  • [10] Improving proximal policy optimization with alpha divergence
    Xu, Haotian
    Yan, Zheng
    Xuan, Junyu
    Zhang, Guangquan
    Lu, Jie
    NEUROCOMPUTING, 2023, 534 : 94 - 105