Proximal Policy Optimization with Entropy Regularization

Cited by: 0
Authors
Shen, Yuqing [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024 | 2024
Keywords
reinforcement learning; policy gradient; entropy regularization;
DOI
10.1109/ICCCR61138.2024.10585473
CLC number
TP [Automation and Computer Technology];
Discipline code
0812 ;
Abstract
This study presents a revision of the Proximal Policy Optimization (PPO) algorithm aimed at improving training stability while maintaining a balance between exploration and exploitation. Recognizing the inherent difficulty of achieving this balance in complex environments, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The entropy term encourages exploration in the early stages of training, preventing the agent from prematurely converging to a sub-optimal policy. Detailed theoretical explanations of how the entropy term improves the robustness of the learning trajectory are provided. Experimental results demonstrate that the revised PPO not only retains the original strengths of the PPO algorithm but also shows a significant improvement in training stability. This work contributes to ongoing research in reinforcement learning and offers a promising direction for future work on applying PPO to environments with complicated dynamics.
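The combination the abstract describes, PPO's clipped surrogate objective plus an A3C-style entropy bonus, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the coefficient values, and the discrete-action setup are assumptions made for the example.

```python
import numpy as np

def ppo_entropy_loss(logp_new, logp_old, advantages, probs_new,
                     clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate loss with an added entropy bonus.

    logp_new / logp_old: log-probabilities of the taken actions under the
    new and old policies; advantages: estimated advantages; probs_new:
    full action distributions under the new policy, shape (batch, n_actions).
    clip_eps and ent_coef are illustrative hyperparameter choices.
    """
    ratio = np.exp(logp_new - logp_old)              # importance ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Standard PPO pessimistic (clipped) surrogate, negated for minimization.
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    # Entropy of the categorical policy; subtracting it rewards exploration,
    # which discourages premature convergence to a sub-optimal policy.
    entropy = -np.sum(probs_new * np.log(probs_new + 1e-8), axis=1).mean()
    return policy_loss - ent_coef * entropy
```

In practice the entropy coefficient is often annealed toward zero so that exploration dominates early in training and exploitation dominates later.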
Pages: 380-383 (4 pages)
Related papers
50 items in total
  • [41] Centralized Cooperation for Connected and Automated Vehicles at Intersections by Proximal Policy Optimization
    Guan, Yang
    Ren, Yangang
    Li, Shengbo Eben
    Sun, Qi
    Luo, Laiquan
    Li, Keqiang
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2020, 69 (11) : 12597 - 12608
  • [42] Model-Based Reinforcement Learning via Proximal Policy Optimization
    Sun, Yuewen
    Yuan, Xin
    Liu, Wenzhang
    Sun, Changyin
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 4736 - 4740
  • [43] PPOAccel: A High-Throughput Acceleration Framework for Proximal Policy Optimization
    Meng, Yuan
    Kuppannagari, Sanmukh
    Kannan, Rajgopal
    Prasanna, Viktor
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (09) : 2066 - 2078
  • [44] Policy regularization for legible behavior
    Persiani, Michele
    Hellstrom, Thomas
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (23) : 16781 - 16790
  • [46] Cautious policy programming: exploiting KL regularization for monotonic policy improvement in reinforcement learning
    Lingwei Zhu
    Takamitsu Matsubara
    Machine Learning, 2023, 112 : 4527 - 4562
  • [47] Proximal Policy Optimization for Energy Management of Electric Vehicles and PV Storage Units
    Alonso, Monica
    Amaris, Hortensia
    Martin, David
    de la Escalera, Arturo
    ENERGIES, 2023, 16 (15)
  • [48] Tuning Proximal Policy Optimization Algorithm in Maze Solving with ML-Agents
    Hung, Phan Thanh
    Truong, Mac Duy Dan
    Hung, Phan Duy
    ADVANCES IN COMPUTING AND DATA SCIENCES (ICACDS 2022), PT II, 2022, 1614 : 248 - 262
  • [49] An Improved Proximal Policy Optimization Method for Low-Level Control of a Quadrotor
    Xue, Wentao
    Wu, Hangxing
    Ye, Hui
    Shao, Shuyi
    ACTUATORS, 2022, 11 (04)
  • [50] Federated proximal policy optimization with action masking: Application in collective heating systems
    Ghane, Sara
    Jacobs, Stef
    Elmaz, Furkan
    Huybrechts, Thomas
    Verhaert, Ivan
    Mercelis, Siegfried
    ENERGY AND AI, 2025, 20