Authentic Boundary Proximal Policy Optimization

Cited by: 28
Authors
Cheng, Yuhu [1 ,2 ]
Huang, Longyang [1 ,2 ]
Wang, Xuesong [1 ,2 ]
Affiliations
[1] China Univ Min & Technol, Engn Res Ctr Intelligent Control Underground Spac, Minist Educ, Xuzhou 221116, Jiangsu, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Linear programming; Optimization; Robots; Games; Reinforcement learning; Neural networks; Authentic boundary; penalized point policy difference; proximal policy optimization (PPO); reinforcement learning (RL); rollback clipping; REINFORCEMENT; SYSTEMS;
DOI
10.1109/TCYB.2021.3051456
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In recent years, the proximal policy optimization (PPO) algorithm has received considerable attention because of its excellent performance on many challenging tasks. However, the mechanism of PPO's clipping operation, a key means of improving its performance, still lacks a thorough theoretical explanation. In addition, although PPO is inspired by the learning theory of trust region policy optimization (TRPO), the theoretical connection between PPO's clipping operation and TRPO's trust-region constraint has not been well studied. In this article, we first analyze the effect of PPO's clipping operation on the objective function of conservative policy iteration and rigorously establish the theoretical relationship between PPO and TRPO. We then propose a novel first-order policy gradient algorithm, authentic boundary PPO (ABPPO), based on an authentic-boundary setting rule. To better keep the difference between the new and old policies within the clipping range, we build on ABPPO to propose two improved PPO algorithms: rollback mechanism-based ABPPO (RMABPPO) and penalized point policy difference-based ABPPO (P3DABPPO), which rely on rollback clipping and a penalized point policy difference, respectively. Experiments on continuous robotic control tasks in MuJoCo show that, compared with the original PPO, the proposed algorithms improve learning stability and accelerate learning.
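For reference, the clipping operation discussed in the abstract is the clipped surrogate objective of the original PPO (Schulman et al., 2017); the boundary and rollback rules proposed in this paper modify how this objective behaves once the probability ratio leaves the clipping range:

```latex
% Clipped surrogate objective of the original PPO (Schulman et al., 2017).
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\Bigl(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\bigl(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\,\hat{A}_t
      \Bigr)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.
```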
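The sketch below is a minimal PyTorch illustration of the two mechanisms named in the abstract, not the authors' implementation: `ppo_clip_loss` is the standard clipped loss, whose gradient vanishes outside the clipping range, while `rollback_clip_loss` is a hypothetical rollback-style variant (in the spirit of RMABPPO's rollback clipping) in which the surrogate slopes back with an assumed coefficient `alpha`, so the gradient keeps pulling the ratio toward [1-eps, 1+eps]; the exact form used by RMABPPO may differ.

```python
import torch

def ppo_clip_loss(ratio: torch.Tensor, adv: torch.Tensor,
                  eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate loss (Schulman et al., 2017)."""
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv
    # Pessimistic bound: the gradient is zero once the ratio leaves [1-eps, 1+eps].
    return -torch.min(unclipped, clipped).mean()

def rollback_clip_loss(ratio: torch.Tensor, adv: torch.Tensor,
                       eps: float = 0.2, alpha: float = 0.3) -> torch.Tensor:
    """Hypothetical rollback-style clipping (illustrative, not the paper's exact form).

    Inside [1-eps, 1+eps] this equals the unclipped objective; outside, the
    surrogate decreases with slope -alpha * adv, so the gradient actively
    pushes the ratio back into the clipping range instead of going flat.
    """
    rollback = (-alpha * ratio
                + (1.0 + alpha) * torch.clamp(ratio, 1.0 - eps, 1.0 + eps)) * adv
    return -torch.min(ratio * adv, rollback).mean()

if __name__ == "__main__":
    # Toy check: with a positive advantage and a ratio above 1+eps, the
    # rollback loss still produces a restoring gradient, whereas the
    # standard clipped loss yields a zero gradient in the same regime.
    ratio = torch.tensor([1.5], requires_grad=True)
    adv = torch.tensor([1.0])
    rollback_clip_loss(ratio, adv).backward()
    print(ratio.grad)  # positive, i.e. gradient descent pushes the ratio back down
```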
Pages: 9428-9438
Page count: 11