Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1 ]
Yan, Zheng [1 ]
Xuan, Junyu [1 ]
Zhang, Guangquan [1 ]
Lu, Jie [1 ]
Affiliations
[1] Univ Technol Sydney, Australia Artificial Intelligence Inst, Fac Engn & Informat Technol, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Proximal policy optimization (PPO) is a recent advance in reinforcement learning that is formulated as an unconstrained optimization problem with two terms: the accumulated discounted return and a Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used is the clipping version, in which the KL divergence is replaced by a clipping function that measures the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two respects. One is to reformulate it in a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
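The abstract describes the two changes, the linearly combined objective and the alpha-divergence penalty, only in prose. The sketch below is a minimal illustration in Python/NumPy, not the authors' implementation: it assumes Amari's alpha divergence, and the trade-off coefficient beta, the value of alpha, and the toy discrete policies are purely illustrative.

# Minimal sketch of the linearly combined PPO surrogate with an
# alpha-divergence penalty (illustrative; not the paper's code).
import numpy as np

def alpha_divergence(p, q, alpha):
    """Amari alpha divergence D_alpha(p || q) for discrete distributions.
    Recovers KL(p || q) as alpha -> 1 and KL(q || p) as alpha -> 0,
    so it generalizes the KL term of the primary PPO objective."""
    if np.isclose(alpha, 1.0):
        return np.sum(p * np.log(p / q))      # forward-KL limit
    if np.isclose(alpha, 0.0):
        return np.sum(q * np.log(q / p))      # reverse-KL limit
    return (1.0 - np.sum(p**alpha * q**(1.0 - alpha))) / (alpha * (1.0 - alpha))

def combined_surrogate(pi_new, pi_old, actions, advantages, alpha=0.5, beta=1.0):
    """Linearly combined objective: importance-weighted advantage term
    minus a beta-scaled alpha divergence between old and new policies."""
    idx = np.arange(len(actions))
    ratio = pi_new[idx, actions] / pi_old[idx, actions]
    policy_term = np.mean(ratio * advantages)
    div_term = np.mean([alpha_divergence(po, pn, alpha)
                        for po, pn in zip(pi_old, pi_new)])
    return policy_term - beta * div_term

# Toy usage: two states, three discrete actions.
pi_old = np.array([[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]])
pi_new = np.array([[0.25, 0.45, 0.3], [0.55, 0.35, 0.1]])
print(combined_surrogate(pi_new, pi_old,
                         actions=np.array([1, 0]),
                         advantages=np.array([0.8, -0.2])))

Because the divergence enters the objective as an explicit penalty rather than through clipping, beta directly controls the trade-off between the return term and the policy-difference term that the abstract refers to.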
Pages: 94-105
Page count: 12
Related Papers
50 items in total
  • [1] Proximal Policy Optimization with Relative Pearson Divergence
    Kobayashi, Taisuke
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 8416 - 8421
  • [2] Image captioning via proximal policy optimization
    Zhang, Le
    Zhang, Yanshuo
    Zhao, Xin
    Zou, Zexiao
    IMAGE AND VISION COMPUTING, 2021, 108
  • [3] Decaying Clipping Range in Proximal Policy Optimization
    Farsang, Monika
    Szegletes, Luca
    IEEE 15TH INTERNATIONAL SYMPOSIUM ON APPLIED COMPUTATIONAL INTELLIGENCE AND INFORMATICS (SACI 2021), 2021, : 521 - 525
  • [4] Fast Proximal Policy Optimization
    Zhao, Weiqi
    Jiang, Haobo
    Xie, Jin
    PATTERN RECOGNITION, ACPR 2021, PT II, 2022, 13189 : 73 - 86
  • [5] A novel guidance law based on proximal policy optimization
    Jiang, Yang
    Yu, Jianglong
    Li, Qingdong
    Ren, Zhang
    Dong, Xiwang
    Hua, Yongzhao
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 3364 - 3369
  • [6] Proximal policy optimization with an integral compensator for quadrotor control
    Huan Hu
    Qing-ling Wang
    Frontiers of Information Technology & Electronic Engineering, 2020, 21 : 777 - 795
  • [7] Proximal policy optimization with an integral compensator for quadrotor control
    Hu, Huan
    Wang, Qing-ling
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2020, 21 (05) : 777 - 795
  • [8] Proximal Policy Optimization with Entropy Regularization
    Shen, Yuqing
    2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024, : 380 - 383
  • [9] Authentic Boundary Proximal Policy Optimization
    Cheng, Yuhu
    Huang, Longyang
    Wang, Xuesong
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 9428 - 9438
  • [10] Robust solar sail trajectories using proximal policy optimization
    Bianchi, Christian
    Niccolai, Lorenzo
    Mengali, Giovanni
    ACTA ASTRONAUTICA, 2025, 226 : 702 - 715