Proximal Policy Optimization with Entropy Regularization

Cited by: 0
Authors
Shen, Yuqing [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024 | 2024
Keywords
reinforcement learning; policy gradient; entropy regularization;
DOI
10.1109/ICCCR61138.2024.10585473
CLC number
TP [Automation and Computer Technology];
Discipline code
0812 ;
Abstract
This study presents a revision of the Proximal Policy Optimization (PPO) algorithm aimed at improving training stability while maintaining a balance between exploration and exploitation. Recognizing the inherent difficulty of achieving this balance in complex environments, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The entropy term encourages exploration in the early stages of training, preventing the agent from prematurely converging to a sub-optimal policy. Detailed theoretical explanations of how the entropy term improves the robustness of the learning trajectory are provided. Experimental results demonstrate that the revised PPO not only retains the original strengths of the PPO algorithm but also shows a significant improvement in training stability. This work contributes to ongoing research in reinforcement learning and offers a promising direction for future work on applying PPO to environments with complicated dynamics.
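The combination the abstract describes, PPO's clipped surrogate objective plus an A3C-style entropy bonus, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the coefficient values, and the discrete-action setup are assumptions made for the example.

```python
import numpy as np

def ppo_entropy_loss(logp_new, logp_old, advantages, probs_new,
                     clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate loss with an added entropy bonus.

    logp_new / logp_old: log-probabilities of the taken actions under the
    new and old policies; advantages: estimated advantages; probs_new:
    full action distributions under the new policy, shape (batch, n_actions).
    clip_eps and ent_coef are illustrative hyperparameter choices.
    """
    ratio = np.exp(logp_new - logp_old)              # importance ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Standard PPO pessimistic (clipped) surrogate, negated for minimization.
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    # Entropy of the categorical policy; subtracting it rewards exploration,
    # which discourages premature convergence to a sub-optimal policy.
    entropy = -np.sum(probs_new * np.log(probs_new + 1e-8), axis=1).mean()
    return policy_loss - ent_coef * entropy
```

In practice the entropy coefficient is often annealed toward zero so that exploration dominates early in training and exploitation dominates later.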
Pages: 380-383 (4 pages)
Related papers
50 items in total
  • [41] Centralized Cooperation for Connected and Automated Vehicles at Intersections by Proximal Policy Optimization
    Guan, Yang
    Ren, Yangang
    Li, Shengbo Eben
    Sun, Qi
    Luo, Laiquan
    Li, Keqiang
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2020, 69 (11) : 12597 - 12608
  • [42] Model-Based Reinforcement Learning via Proximal Policy Optimization
    Sun, Yuewen
    Yuan, Xin
    Liu, Wenzhang
    Sun, Changyin
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 4736 - 4740
  • [43] PPOAccel: A High-Throughput Acceleration Framework for Proximal Policy Optimization
    Meng, Yuan
    Kuppannagari, Sanmukh
    Kannan, Rajgopal
    Prasanna, Viktor
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (09) : 2066 - 2078
  • [44] Policy regularization for legible behavior
    Persiani, Michele
    Hellstrom, Thomas
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (23) : 16781 - 16790
  • [46] Cautious policy programming: exploiting KL regularization for monotonic policy improvement in reinforcement learning
    Lingwei Zhu
    Takamitsu Matsubara
    Machine Learning, 2023, 112 : 4527 - 4562
  • [47] Proximal Policy Optimization for Energy Management of Electric Vehicles and PV Storage Units
    Alonso, Monica
    Amaris, Hortensia
    Martin, David
    de la Escalera, Arturo
    ENERGIES, 2023, 16 (15)
  • [48] Tuning Proximal Policy Optimization Algorithm in Maze Solving with ML-Agents
    Hung, Phan Thanh
    Truong, Mac Duy Dan
    Hung, Phan Duy
    ADVANCES IN COMPUTING AND DATA SCIENCES (ICACDS 2022), PT II, 2022, 1614 : 248 - 262
  • [49] An Improved Proximal Policy Optimization Method for Low-Level Control of a Quadrotor
    Xue, Wentao
    Wu, Hangxing
    Ye, Hui
    Shao, Shuyi
    ACTUATORS, 2022, 11 (04)
  • [50] Federated proximal policy optimization with action masking: Application in collective heating systems
    Ghane, Sara
    Jacobs, Stef
    Elmaz, Furkan
    Huybrechts, Thomas
    Verhaert, Ivan
    Mercelis, Siegfried
    ENERGY AND AI, 2025, 20