Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1 ]
Yan, Zheng [1 ]
Xuan, Junyu [1 ]
Zhang, Guangquan [1 ]
Lu, Jie [1 ]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, Fac Engn & Informat Technol, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, formulated as an unconstrained optimization problem with two terms: the accumulated discounted return and a Kullback-Leibler (KL) divergence. Three PPO versions currently exist: primary, adaptive, and clipping. The most widely used is the clipping version, in which the KL divergence is replaced by a clipping function that measures the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it in a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
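
As a rough illustration of the formulation described in the abstract (not code from the paper), the sketch below shows a PyTorch-style surrogate loss that linearly combines an importance-weighted advantage term with an alpha-divergence penalty. The Amari-style parameterization of the alpha divergence, the penalty weight beta, the default alpha value, and the function names are all illustrative assumptions.

import torch

def alpha_divergence(logp_old, logp_new, alpha=1.5):
    # Sample-based estimate of an Amari-style alpha divergence D_alpha(pi_old || pi_new),
    # using actions drawn under the old policy:
    #   D_alpha = (E_old[(pi_new/pi_old)^(1 - alpha)] - 1) / (alpha * (alpha - 1)),
    # which recovers KL(pi_old || pi_new) in the limit alpha -> 1. (Assumed parameterization.)
    ratio = torch.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s) per sampled action
    return (ratio.pow(1.0 - alpha).mean() - 1.0) / (alpha * (alpha - 1.0))

def combined_surrogate_loss(logp_old, logp_new, advantages, beta=1.0, alpha=1.5):
    # Linearly combined objective: importance-weighted advantage minus beta times the
    # divergence penalty, negated so it can be minimized by gradient descent.
    ratio = torch.exp(logp_new - logp_old)
    surrogate = (ratio * advantages).mean()
    penalty = alpha_divergence(logp_old, logp_new, alpha)
    return -(surrogate - beta * penalty)

In this sketch, beta controls the trade-off between the return term and the divergence term, while alpha parameterizes the divergence used to measure the difference between the old and new policies.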
Pages: 94-105
Page count: 12
Related papers
50 records in total
  • [21] Proximal policy optimization for UAV autonomous guidance, tracking and obstacle avoidance
    Hu D.
    Dong W.
    Xie W.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2023, 49 (01): 195 - 205
  • [22] Proximal Policy Optimization with Mixed Distributed Training
    Zhang, Zhenyu
    Luo, Xiangfeng
    Liu, Tong
    Xie, Shaorong
    Wang, Jianshu
    Wang, Wei
    Li, Yang
    Peng, Yan
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019: 1452 - 1456
  • [23] Anti-Martingale Proximal Policy Optimization
    Gu, Yang
    Cheng, Yuhu
    Yu, Kun
    Wang, Xuesong
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (10) : 6421 - 6432
  • [24] Tuning Proximal Policy Optimization Algorithm in Maze Solving with ML-Agents
    Hung, Phan Thanh
    Truong, Mac Duy Dan
    Hung, Phan Duy
    ADVANCES IN COMPUTING AND DATA SCIENCES (ICACDS 2022), PT II, 2022, 1614 : 248 - 262
  • [25] Learning Dialogue Policy Efficiently Through Dyna Proximal Policy Optimization
    Huang, Chenping
    Cao, Bin
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT I, 2022, 460 : 396 - 414
  • [26] An Enhanced Proximal Policy Optimization-Based Reinforcement Learning Method with Random Forest for Hyperparameter Optimization
    Ma, Zhixin
    Cui, Shengmin
    Joe, Inwhee
    APPLIED SCIENCES-BASEL, 2022, 12 (14):
  • [27] Proximal policy optimization based hybrid recommender systems for large scale recommendations
    Vaibhav Padhye
    Kailasam Lakshmanan
    Amrita Chaturvedi
    Multimedia Tools and Applications, 2023, 82 : 20079 - 20100
  • [28] The Temperature Prediction of Permanent Magnet Synchronous Machines Based on Proximal Policy Optimization
    Cen, Yuefeng
    Zhang, Chenguang
    Cen, Gang
    Zhang, Yulai
    Zhao, Cheng
    INFORMATION, 2020, 11 (11) : 1 - 13
  • [29] Proximal policy optimization based hybrid recommender systems for large scale recommendations
    Padhye, Vaibhav
    Lakshmanan, Kailasam
    Chaturvedi, Amrita
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (13) : 20079 - 20100
  • [30] Pairs Trading Strategy Optimization Using Proximal Policy Optimization Algorithms
    Chen, Yi-Feng
    Shih, Wen-Yueh
    Lai, Hsu-Chao
    Chang, Hao-Chun
    Huang, Jiun-Long
    2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023: 40 - 47