Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1]
Yan, Zheng [1]
Xuan, Junyu [1]
Zhang, Guangquan [1]
Lu, Jie [1]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, Fac Engn & Informat Technol, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
CLC Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, which is formulated as an unconstrained optimization problem including two terms: accumulative discounted return and Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used PPO algorithm is the clipping version, in which the KL divergence is replaced by a clipping function to measure the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it as a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
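The two modifications described in the abstract can be made concrete with a short sketch. The following minimal PyTorch-style code is our illustration, not the authors' implementation: the function names, the beta coefficient, and the default alpha value are assumptions. It shows a linearly combined PPO objective in which a Monte-Carlo estimate of the alpha divergence replaces the KL penalty.

import torch

def alpha_divergence(log_p_old, log_p_new, alpha):
    # Monte-Carlo estimate of D_alpha(pi_old || pi_new) from actions
    # sampled under pi_old, using the standard form
    # D_alpha(p || q) = (E_p[(q(x)/p(x))^(1-alpha)] - 1) / (alpha * (alpha - 1)),
    # valid for alpha not in {0, 1} (those limits recover the KL divergences).
    ratio = torch.exp((1.0 - alpha) * (log_p_new - log_p_old))
    return (ratio.mean() - 1.0) / (alpha * (alpha - 1.0))

def alpha_ppo_loss(log_p_old, log_p_new, advantages, alpha=1.5, beta=1.0):
    # First term: importance-weighted surrogate for the discounted return.
    ratio = torch.exp(log_p_new - log_p_old)
    surrogate = (ratio * advantages).mean()
    # Linearly combined form: beta controls the trade-off between the
    # surrogate return and the alpha-divergence penalty.
    return -(surrogate - beta * alpha_divergence(log_p_old, log_p_new, alpha))

In the limit alpha -> 1 this estimate recovers KL(pi_old || pi_new), the penalty of the combined PPO, so alpha acts as an additional knob on how the difference between policies is measured.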
Pages: 94-105
Number of pages: 12
Related Papers (50 total)
  • [31] Proximal policy optimization with model-based methods. Li, Shuailong; Zhang, Wei; Zhang, Huiwen; Zhang, Xin; Leng, Yuquan. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (06): 5399-5410.
  • [32] HiPPO: Enhancing proximal policy optimization with highlight replay. Zhang, Shutong; Chen, Xing; Liu, Zhaogeng; Chen, Hechang; Chang, Yi. PATTERN RECOGNITION, 2025, 162.
  • [33] Proximal policy optimization via enhanced exploration efficiency. Zhang, Junwei; Zhang, Zhenghao; Han, Shuai; Lue, Shuai. INFORMATION SCIENCES, 2022, 609: 750-765.
  • [34] Use of Proximal Policy Optimization for the Joint Replenishment Problem. Vanvuchelen, Nathalie; Gijsbrechts, Joren; Boute, Robert. COMPUTERS IN INDUSTRY, 2020, 119.
  • [35] An Object Recognition Grasping Approach Using Proximal Policy Optimization With YOLOv5. Zheng, Qingchun; Peng, Zhi; Zhu, Peihao; Zhao, Yangyang; Zhai, Ran; Ma, Wenpeng. IEEE ACCESS, 2023, 11: 87330-87343.
  • [36] A proximal policy optimization approach for food delivery problem with reassignment due to order cancellation. Deng, Yang; Yan, Yimo; Chow, Andy H. F.; Zhou, Zhili; Ying, Cheng-shuo; Kuo, Yong-Hong. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 258.
  • [37] An Efficient Load Balancing Scheme for Gaming Server Using Proximal Policy Optimization Algorithm. Kim, Hye-Young. JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2021, 17 (02): 297-305.
  • [38] Intelligent Design of Hairpin Filters Based on Artificial Neural Network and Proximal Policy Optimization. Ye, Yunong; Wu, Yifan; Chen, Jiayu; Su, Guodong; Wang, Junchao; Liu, Jun. APPLIED SCIENCES-BASEL, 2023, 13 (16).
  • [39] Mapless Navigation with Deep Reinforcement Learning based on The Convolutional Proximal Policy Optimization Network. Toan, Nguyen Duc; Kim, Gon-Woo. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2021), 2021: 298-301.
  • [40] Intelligent Control for a Non-holonomic Constrained Mobile Robot with Proximal Policy Optimization. Xie, Junran; Wang, Qingling. 2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022: 2913-2918.