Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning

Times Cited: 0
Authors
Gurumurthy, Swaminathan [1 ]
Manchester, Zachary [1 ]
Kolter, J. Zico [1 ,2 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Bosch Ctr AI, Sunnyvale, CA USA
Source
LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, 2023, VOL 211
Keywords
Reinforcement Learning; Actor Critic; Continuous Control; Highly Parallel Environments;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
On-policy reinforcement learning algorithms have been shown to be remarkably efficient at learning policies for continuous control robotics tasks. They are highly parallelizable and hence have benefited tremendously from the recent rise of GPU-based parallel simulators. The most widely used on-policy reinforcement learning algorithm is proximal policy optimization (PPO), which was introduced in 2017 and designed for a somewhat different setting with CPU-based serial or less parallelizable simulators. Surprisingly, however, it has maintained its dominance even on tasks built on today's highly parallelizable simulators. In this paper, we show that a different class of on-policy algorithms, which estimate the policy gradient using the critic's action gradients, is better suited to highly parallelizable simulators. The primary issues for these algorithms arise from the lack of diversity in the on-policy experiences used for updates and from instabilities caused by the interaction between the biased critic gradients and the rapidly changing policy distribution. We address the former by simply increasing the number of parallel simulation runs (thanks to the GPU-based simulators) along with an appropriate schedule on the policy entropy to ensure sample diversity. We address the latter by adding a policy averaging step and a value averaging step (as in off-policy methods). With these modifications, we observe that the critic-gradient-based on-policy method (CGAC) consistently achieves higher episode returns than existing baselines. Furthermore, in environments with high-dimensional action spaces, CGAC also trains much faster (in wall-clock time) than the corresponding baselines.
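The abstract outlines the key ingredients of CGAC: a policy gradient estimated through the critic's action gradients, many parallel simulation runs with an entropy schedule for sample diversity, and averaging steps for the policy and value networks. The snippet below is a minimal PyTorch-style sketch of those ideas under stated assumptions, not the authors' implementation: the network architectures, the update function, and hyperparameters such as tau and ent_coef (a fixed stand-in for the scheduled entropy coefficient) are illustrative placeholders.

    # A minimal PyTorch-style sketch of a critic-gradient actor update with
    # averaged (target) networks and an entropy bonus. All names and values
    # here are illustrative assumptions, not the paper's implementation.
    import copy
    import torch
    import torch.nn as nn
    from torch.distributions import Normal

    obs_dim, act_dim = 8, 2  # placeholder dimensions for a small continuous-control task

    actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
    critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
    log_std = nn.Parameter(-0.5 * torch.ones(act_dim))  # exploration noise (entropy-schedule stand-in)
    target_actor = copy.deepcopy(actor)                  # policy-averaging network
    target_critic = copy.deepcopy(critic)                # value-averaging network

    actor_opt = torch.optim.Adam(list(actor.parameters()) + [log_std], lr=3e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

    def update(obs, act, rew, next_obs, done, gamma=0.99, tau=0.005, ent_coef=0.01):
        # Critic update: one-step TD target computed with the averaged networks.
        with torch.no_grad():
            next_act = target_actor(next_obs)  # mean action of the averaged policy
            next_q = target_critic(torch.cat([next_obs, next_act], dim=-1)).squeeze(-1)
            target_q = rew + gamma * (1.0 - done) * next_q
        q = critic(torch.cat([obs, act], dim=-1)).squeeze(-1)
        critic_loss = (q - target_q).pow(2).mean()
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

        # Actor update: the critic's action gradient flows into the policy through
        # a reparameterized sample; ent_coef stands in for the entropy schedule.
        dist = Normal(actor(obs), log_std.exp())
        a = dist.rsample()
        actor_loss = -(critic(torch.cat([obs, a], dim=-1)).mean()
                       + ent_coef * dist.entropy().sum(-1).mean())
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

        # Policy-averaging and value-averaging (Polyak) step.
        with torch.no_grad():
            for net, target in ((actor, target_actor), (critic, target_critic)):
                for p, tp in zip(net.parameters(), target.parameters()):
                    tp.mul_(1.0 - tau).add_(tau * p)

In practice, each such update would consume a fresh batch gathered from thousands of parallel simulated environments, which is what makes the averaging steps and the entropy schedule described in the abstract matter.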
Pages: 14