Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1 ]
Yan, Zheng [1 ]
Xuan, Junyu [1 ]
Zhang, Guangquan [1 ]
Lu, Jie [1 ]
Affiliations
[1] University of Technology Sydney, Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, formulated as an unconstrained optimization problem with two terms: the accumulated discounted return and a Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used PPO algorithm is the clipping version, in which the KL divergence is replaced by a clipping function that measures the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it as a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
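
A rough LaTeX sketch of the formulation the abstract describes (the trade-off weight \beta and this particular alpha-divergence form are illustrative assumptions, not the paper's exact notation):

    % Linearly combined objective: surrogate return minus a divergence penalty
    \max_{\theta} \; \mathbb{E}_t\left[ \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} \, \hat{A}_t \right] - \beta \, D_\alpha\left( \pi_{\theta_{\mathrm{old}}} \,\|\, \pi_\theta \right)
    % One common alpha-divergence form (an assumption here, not taken from the paper)
    D_\alpha(p \,\|\, q) = \frac{1}{\alpha(\alpha - 1)} \left( \int p(x)^{\alpha} \, q(x)^{1 - \alpha} \, dx - 1 \right)

Under this form, D_\alpha recovers the KL divergence as \alpha \to 1, so the linearly combined objective generalizes the KL-penalized primary PPO while \alpha tunes how the difference between policies is measured.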
Pages: 94-105
Number of pages: 12
Related papers
50 items in total
  • [41] On Explainability of Reinforcement Learning-Based Machine Learning Agents Trained with Proximal Policy Optimization That Utilizes Visual Sensor Data
    Hachaj, Tomasz
    Piekarczyk, Marcin
    APPLIED SCIENCES-BASEL, 2025, 15 (02)
  • [42] Reactive Power Optimization Based on Proximal Policy Optimization of Deep Reinforcement Learning
    Zhang P.
    Zhu Z.
    Xie H.
    Dianwang Jishu/Power System Technology, 2023, 47 (02): 562-570
  • [43] A proximal policy optimization with curiosity algorithm for virtual drone navigation
    Das, Rupayan
    Khan, Angshuman
    Paul, Gunjan
    ENGINEERING RESEARCH EXPRESS, 2024, 6 (01)
  • [44] A proximal policy optimization based intelligent home solar management
    Creer, Kode
    Parvez, Imtiaz
    2024 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY, EIT 2024, 2024: 463-467
  • [45] Proximal policy optimization-based controller for chaotic systems
    Yau, Her-Terng
    Kuo, Ping-Huan
    Luan, Po-Chien
    Tseng, Yung-Ruen
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (01): 586-601
  • [46] A Proximal Policy Optimization method in UAV swarm formation control
    Yu, Ning
    Juan, Feng
    Zhao, Hongwei
    ALEXANDRIA ENGINEERING JOURNAL, 2024, 100: 268-276
  • [47] Decision Planning for Autonomous Driving Based on Proximal Policy Optimization
    Li, Shuang
    Liu, Chunsheng
    Nie, Zhaoying
    PROCEEDINGS OF THE 2024 3RD INTERNATIONAL SYMPOSIUM ON INTELLIGENT UNMANNED SYSTEMS AND ARTIFICIAL INTELLIGENCE, SIUSAI 2024, 2024: 145-148
  • [48] Implementing action mask in proximal policy optimization (PPO) algorithm
    Tang, Cheng-Yen
    Liu, Chien-Hung
    Chen, Woei-Kae
    You, Shingchern D.
    ICT EXPRESS, 2020, 6 (03): 200-203
  • [49] Control of conventional continuous thickeners via proximal policy optimization
    Silva, Jonathan R.
    Euzebio, Thiago A. M.
    Braga, Marcio F.
    MINERALS ENGINEERING, 2024, 214
  • [50] Automated cloud resources provisioning with the use of the proximal policy optimization
    Funika, Włodzimierz
    Koperek, Paweł
    Kitowski, Jacek
    THE JOURNAL OF SUPERCOMPUTING, 2023, 79: 6674-6704