Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1]
Yan, Zheng [1]
Xuan, Junyu [1]
Zhang, Guangquan [1]
Lu, Jie [1]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, Fac Engn & Informat Technol, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
CLC Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, which is formulated as an unconstrained optimization problem including two terms: accumulative discounted return and Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used PPO algorithm is the clipping version, in which the KL divergence is replaced by a clipping function to measure the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it as a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
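The two modifications described in the abstract can be made concrete with a short sketch. The following minimal PyTorch-style code is our illustration, not the authors' implementation: the function names, the beta coefficient, and the default alpha value are assumptions. It shows a linearly combined PPO objective in which a Monte-Carlo estimate of the alpha divergence replaces the KL penalty.

import torch

def alpha_divergence(log_p_old, log_p_new, alpha):
    # Monte-Carlo estimate of D_alpha(pi_old || pi_new) from actions
    # sampled under pi_old, using the standard form
    # D_alpha(p || q) = (E_p[(q(x)/p(x))^(1-alpha)] - 1) / (alpha * (alpha - 1)),
    # valid for alpha not in {0, 1} (those limits recover the KL divergences).
    ratio = torch.exp((1.0 - alpha) * (log_p_new - log_p_old))
    return (ratio.mean() - 1.0) / (alpha * (alpha - 1.0))

def alpha_ppo_loss(log_p_old, log_p_new, advantages, alpha=1.5, beta=1.0):
    # First term: importance-weighted surrogate for the discounted return.
    ratio = torch.exp(log_p_new - log_p_old)
    surrogate = (ratio * advantages).mean()
    # Linearly combined form: beta controls the trade-off between the
    # surrogate return and the alpha-divergence penalty.
    return -(surrogate - beta * alpha_divergence(log_p_old, log_p_new, alpha))

In the limit alpha -> 1 this estimate recovers KL(pi_old || pi_new), the penalty of the combined PPO, so alpha acts as an additional knob on how the difference between policies is measured.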
Pages: 94-105
Number of pages: 12
Related Papers (50 total)
  • [31] Proximal policy optimization with model-based methods. Li, Shuailong; Zhang, Wei; Zhang, Huiwen; Zhang, Xin; Leng, Yuquan. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (06): 5399-5410.
  • [32] HiPPO: Enhancing proximal policy optimization with highlight replay. Zhang, Shutong; Chen, Xing; Liu, Zhaogeng; Chen, Hechang; Chang, Yi. PATTERN RECOGNITION, 2025, 162.
  • [33] Proximal policy optimization via enhanced exploration efficiency. Zhang, Junwei; Zhang, Zhenghao; Han, Shuai; Lue, Shuai. INFORMATION SCIENCES, 2022, 609: 750-765.
  • [34] Use of Proximal Policy Optimization for the Joint Replenishment Problem. Vanvuchelen, Nathalie; Gijsbrechts, Joren; Boute, Robert. COMPUTERS IN INDUSTRY, 2020, 119.
  • [35] An Object Recognition Grasping Approach Using Proximal Policy Optimization With YOLOv5. Zheng, Qingchun; Peng, Zhi; Zhu, Peihao; Zhao, Yangyang; Zhai, Ran; Ma, Wenpeng. IEEE ACCESS, 2023, 11: 87330-87343.
  • [36] A proximal policy optimization approach for food delivery problem with reassignment due to order cancellation. Deng, Yang; Yan, Yimo; Chow, Andy H. F.; Zhou, Zhili; Ying, Cheng-shuo; Kuo, Yong-Hong. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 258.
  • [37] An Efficient Load Balancing Scheme for Gaming Server Using Proximal Policy Optimization Algorithm. Kim, Hye-Young. JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2021, 17 (02): 297-305.
  • [38] Intelligent Design of Hairpin Filters Based on Artificial Neural Network and Proximal Policy Optimization. Ye, Yunong; Wu, Yifan; Chen, Jiayu; Su, Guodong; Wang, Junchao; Liu, Jun. APPLIED SCIENCES-BASEL, 2023, 13 (16).
  • [39] Mapless Navigation with Deep Reinforcement Learning based on The Convolutional Proximal Policy Optimization Network. Toan, Nguyen Duc; Kim, Gon-Woo. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2021), 2021: 298-301.
  • [40] Intelligent Control for a Non-holonomic Constrained Mobile Robot with Proximal Policy Optimization. Xie, Junran; Wang, Qingling. 2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022: 2913-2918.