Proximal Policy Optimization with Entropy Regularization

Cited: 0
Authors
Shen, Yuqing [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024 | 2024
Keywords
reinforcement learning; policy gradient; entropy regularization
DOI
10.1109/ICCCR61138.2024.10585473
CLC number
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
This study provides a revision to the Proximal Policy Optimization (PPO) algorithm, primarily aimed at improving the stability of PPO during the training process while maintaining a balance between exploration and exploitation. Recognizing the inherent challenge of achieving this balance in a complex environment, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The main purpose of this design is to encourage exploration in the early stages, preventing the agent from prematurely converging to a sub-optimal policy. Detailed theoretical explanations of how the entropy term improves the robustness of the learning trajectory will be provided. Experimental results demonstrate that the revised PPO not only maintains the original strengths of the PPO algorithm, but also shows significant improvement in the stability of the training process. This work contributes to the ongoing research in reinforcement learning and offers a promising direction for future research on the adoption of PPO in environments with complicated dynamics.
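The revision described in the abstract, adding an A3C-style entropy bonus to PPO's clipped surrogate objective, can be sketched as follows. This is a minimal illustration of the standard technique, not the paper's exact implementation; the function name, the entropy coefficient `ent_coef`, and the clip range `clip_eps` are illustrative choices.

```python
import numpy as np

def ppo_entropy_objective(new_logp, old_logp, advantages, probs,
                          clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate plus an entropy bonus (to be maximized).

    new_logp, old_logp: log-probabilities of the taken actions under the
      new and old policies, shape (batch,).
    advantages: advantage estimates, shape (batch,).
    probs: full action distributions under the new policy,
      shape (batch, n_actions).
    """
    # Importance ratio r_t = pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (lower) bound, as in the original PPO objective
    surrogate = np.minimum(unclipped, clipped).mean()
    # Mean policy entropy H(pi) = -sum_a pi(a|s) log pi(a|s);
    # a larger ent_coef pushes the policy toward more exploration early on
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=1).mean()
    return surrogate + ent_coef * entropy
```

Decaying `ent_coef` over the course of training recovers the behavior the abstract motivates: strong exploration pressure early, with the bonus fading as the policy commits.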
Pages: 380–383 (4 pages)