Improving proximal policy optimization with alpha divergence

Cited by: 3
Authors
Xu, Haotian [1 ]
Yan, Zheng [1 ]
Xuan, Junyu [1 ]
Zhang, Guangquan [1 ]
Lu, Jie [1 ]
Affiliations
[1] University of Technology Sydney, Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, Ultimo, Australia
Funding
Australian Research Council;
Keywords
Reinforcement learning; Deep neural networks; Proximal policy optimization; KL divergence; Alpha divergence; Markov decision process;
DOI
10.1016/j.neucom.2023.02.008
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning, formulated as an unconstrained optimization problem with two terms: the accumulated discounted return and a Kullback-Leibler (KL) divergence. Currently, there are three PPO versions: primary, adaptive, and clipping. The most widely used PPO algorithm is the clipping version, in which the KL divergence is replaced by a clipping function that measures the difference between two policies indirectly. In this paper, we revisit the primary PPO and improve it in two aspects. One is to reformulate it as a linearly combined form to control the trade-off between the two terms. The other is to substitute a parametric alpha divergence for the KL divergence to measure the difference between two policies more effectively. This novel PPO variant is referred to as alphaPPO in this paper. Experiments on six benchmark environments verify the effectiveness of our alphaPPO compared with the clipping and combined PPOs. © 2023 Published by Elsevier B.V.
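
A rough LaTeX sketch of the formulation the abstract describes (the trade-off weight \beta and this particular alpha-divergence form are illustrative assumptions, not the paper's exact notation):

    % Linearly combined objective: surrogate return minus a divergence penalty
    \max_{\theta} \; \mathbb{E}_t\left[ \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} \, \hat{A}_t \right] - \beta \, D_\alpha\left( \pi_{\theta_{\mathrm{old}}} \,\|\, \pi_\theta \right)
    % One common alpha-divergence form (an assumption here, not taken from the paper)
    D_\alpha(p \,\|\, q) = \frac{1}{\alpha(\alpha - 1)} \left( \int p(x)^{\alpha} \, q(x)^{1 - \alpha} \, dx - 1 \right)

Under this form, D_\alpha recovers the KL divergence as \alpha \to 1, so the linearly combined objective generalizes the KL-penalized primary PPO while \alpha tunes how the difference between policies is measured.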
Pages: 94-105
Number of pages: 12
Related papers
50 items in total
  • [41] On Explainability of Reinforcement Learning-Based Machine Learning Agents Trained with Proximal Policy Optimization That Utilizes Visual Sensor Data
    Hachaj, Tomasz
    Piekarczyk, Marcin
    APPLIED SCIENCES-BASEL, 2025, 15 (02)
  • [42] Reactive Power Optimization Based on Proximal Policy Optimization of Deep Reinforcement Learning
    Zhang P.
    Zhu Z.
    Xie H.
    Dianwang Jishu/Power System Technology, 2023, 47 (02): 562-570
  • [43] A proximal policy optimization with curiosity algorithm for virtual drone navigation
    Das, Rupayan
    Khan, Angshuman
    Paul, Gunjan
    ENGINEERING RESEARCH EXPRESS, 2024, 6 (01)
  • [44] A proximal policy optimization based intelligent home solar management
    Creer, Kode
    Parvez, Imtiaz
    2024 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY, EIT 2024, 2024: 463-467
  • [45] Proximal policy optimization-based controller for chaotic systems
    Yau, Her-Terng
    Kuo, Ping-Huan
    Luan, Po-Chien
    Tseng, Yung-Ruen
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (01): 586-601
  • [46] A Proximal Policy Optimization method in UAV swarm formation control
    Yu, Ning
    Juan, Feng
    Zhao, Hongwei
    ALEXANDRIA ENGINEERING JOURNAL, 2024, 100: 268-276
  • [47] Decision Planning for Autonomous Driving Based on Proximal Policy Optimization
    Li, Shuang
    Liu, Chunsheng
    Nie, Zhaoying
    PROCEEDINGS OF THE 2024 3RD INTERNATIONAL SYMPOSIUM ON INTELLIGENT UNMANNED SYSTEMS AND ARTIFICIAL INTELLIGENCE, SIUSAI 2024, 2024: 145-148
  • [48] Implementing action mask in proximal policy optimization (PPO) algorithm
    Tang, Cheng-Yen
    Liu, Chien-Hung
    Chen, Woei-Kae
    You, Shingchern D.
    ICT EXPRESS, 2020, 6 (03): 200-203
  • [49] Control of conventional continuous thickeners via proximal policy optimization
    Silva, Jonathan R.
    Euzebio, Thiago A. M.
    Braga, Marcio F.
    MINERALS ENGINEERING, 2024, 214
  • [50] Automated cloud resources provisioning with the use of the proximal policy optimization
    Funika, Włodzimierz
    Koperek, Paweł
    Kitowski, Jacek
    THE JOURNAL OF SUPERCOMPUTING, 2023, 79: 6674-6704