Authentic Boundary Proximal Policy Optimization

Cited by: 28
Authors
Cheng, Yuhu [1 ,2 ]
Huang, Longyang [1 ,2 ]
Wang, Xuesong [1 ,2 ]
Affiliations
[1] China University of Mining and Technology, Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou 221116, Jiangsu, People's Republic of China
[2] China University of Mining and Technology, School of Information and Control Engineering, Xuzhou 221116, Jiangsu, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Linear programming; Optimization; Robots; Games; Reinforcement learning; Neural networks; Authentic boundary; penalized point policy difference; proximal policy optimization (PPO); reinforcement learning (RL); rollback clipping; REINFORCEMENT; SYSTEMS;
DOI
10.1109/TCYB.2021.3051456
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In recent years, the proximal policy optimization (PPO) algorithm has received considerable attention for its excellent performance on many challenging tasks. However, the mechanism of PPO's horizontal clipping operation, a key means of improving its performance, still lacks a thorough theoretical explanation. In addition, although PPO is inspired by the learning theory of trust region policy optimization (TRPO), the theoretical connection between PPO's clipping operation and TRPO's trust region constraint has not been well studied. In this article, we first analyze the effect of PPO's clipping operation on the objective function of conservative policy iteration and rigorously establish the theoretical relationship between PPO and TRPO. We then propose a novel first-order policy gradient algorithm, authentic boundary PPO (ABPPO), which is based on the authentic boundary setting rule. To better keep the difference between the new and old policies within the clipping range, we build on ABPPO to propose two further improved algorithms: rollback mechanism-based ABPPO (RMABPPO) and penalized point policy difference-based ABPPO (P3DABPPO), based on the ideas of rollback clipping and a penalized point policy difference, respectively. Experiments on continuous robotic control tasks in MuJoCo show that the proposed algorithms improve learning stability and accelerate learning compared with the original PPO.
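For a quick intuition of the clipping mechanisms the abstract refers to, the sketch below contrasts the standard PPO clipped surrogate with a rollback-style surrogate in which the flat region beyond the clipping range is replaced by a negatively sloped one. This is a minimal illustration of the general rollback-clipping idea, not the paper's exact RMABPPO objective; the function names, the slope parameter alpha, and the default values are illustrative assumptions.

```python
import torch

def ppo_clip_surrogate(ratio, adv, eps=0.2):
    # Standard PPO clipped surrogate (Schulman et al., 2017):
    # maximize E[min(r*A, clip(r, 1-eps, 1+eps)*A)]; returned negated as a loss.
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -torch.min(unclipped, clipped).mean()

def rollback_surrogate(ratio, adv, eps=0.2, alpha=0.3):
    # Rollback-style variant (illustrative): outside [1-eps, 1+eps] the
    # surrogate has slope -alpha instead of zero, so its gradient keeps
    # pulling the probability ratio back toward the clipping range.
    # Both branches are continuous with the linear segment at the boundary.
    upper = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)  # branch for ratio > 1+eps
    lower = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)  # branch for ratio < 1-eps
    f = torch.where(ratio > 1.0 + eps, upper,
                    torch.where(ratio < 1.0 - eps, lower, ratio))
    return -torch.min(ratio * adv, f * adv).mean()
```

In a typical PPO loop, `ratio = torch.exp(logp_new - logp_old)` and `adv` comes from an advantage estimator such as GAE; either surrogate then serves as a drop-in policy loss.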
Pages: 9428-9438
Number of pages: 11
Related Papers
50 records in total
  • [1] Proximal Policy Optimization With Policy Feedback
    Gu, Yang
    Cheng, Yuhu
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52(7): 4600-4610
  • [2] Partial Advantage Estimator for Proximal Policy Optimization
    Jin, Yizhao
    Song, Xiulei
    Slabaugh, Gregory
    Lucas, Simon
    IEEE TRANSACTIONS ON GAMES, 2025, 17(1): 158-166
  • [3] Anti-Martingale Proximal Policy Optimization
    Gu, Yang
    Cheng, Yuhu
    Yu, Kun
    Wang, Xuesong
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53(10): 6421-6432
  • [4] Automatic Management of Cloud Applications with Use of Proximal Policy Optimization
    Funika, Wlodzimierz
    Koperek, Pawel
    Kitowski, Jacek
    COMPUTATIONAL SCIENCE - ICCS 2020, PT I, 2020, 12137: 73-87
  • [5] Automated cloud resources provisioning with the use of the proximal policy optimization
    Funika, Wlodzimierz
    Koperek, Pawel
    Kitowski, Jacek
    JOURNAL OF SUPERCOMPUTING, 2023, 79(6): 6674-6704
  • [6] Optimal Policy Characterization Enhanced Proximal Policy Optimization for Multitask Scheduling in Cloud Computing
    Jin, Jiangliang
    Xu, Yunjian
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9(9): 6418-6433
  • [7] Centralized Cooperation for Connected and Automated Vehicles at Intersections by Proximal Policy Optimization
    Guan, Yang
    Ren, Yangang
    Li, Shengbo Eben
    Sun, Qi
    Luo, Laiquan
    Li, Keqiang
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2020, 69(11): 12597-12608
  • [8] Proximal Parameter Distribution Optimization
    Wang, Xuesong
    Li, Tianyi
    Cheng, Yuhu
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51(6): 3771-3780
  • [9] Hierarchical Landmark Policy Optimization for Visual Indoor Navigation
    Staroverov, Aleksei
    Panov, Aleksandr I.
    IEEE ACCESS, 2022, 10: 70447-70455
  • [10] Proximal Policy Optimization with Entropy Regularization
    Shen, Yuqing
    2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024: 380-383