Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning

Cited by: 0
Authors
Gurumurthy, Swaminathan [1]
Manchester, Zachary [1]
Kolter, J. Zico [1,2]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Bosch Ctr AI, Sunnyvale, CA USA
Source
LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, 2023, Vol. 211
Keywords
Reinforcement Learning; Actor Critic; Continuous Control; Highly Parallel Environments
DOI
None available
CLC classification
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
On-policy reinforcement learning algorithms have been shown to be remarkably efficient at learning policies for continuous-control robotics tasks. They are highly parallelizable and hence have benefited tremendously from the recent rise of GPU-based parallel simulators. The most widely used on-policy reinforcement learning algorithm is proximal policy optimization (PPO), which was introduced in 2017 and designed for a somewhat different setting with CPU-based serial or less parallelizable simulators. Surprisingly, however, it has maintained dominance even on tasks based on today's highly parallelizable simulators. In this paper, we show that a different class of on-policy algorithms, which estimate the policy gradient using critic-action gradients, is better suited to highly parallelizable simulators. The primary issues for these algorithms arise from the lack of diversity in the on-policy experiences used for the updates, and from instabilities arising from the interaction between the biased critic gradients and the rapidly changing policy distribution. We address the former by simply increasing the number of parallel simulation runs (thanks to the GPU-based simulators), along with an appropriate schedule on the policy entropy to ensure sample diversity. We address the latter by adding a policy-averaging step and a value-averaging step (as in off-policy methods). With these modifications, we observe that the critic-gradient-based on-policy method (CGAC) consistently achieves higher episode returns than existing baselines. Furthermore, in environments with high-dimensional action spaces, CGAC also trains much faster (in wall-clock time) than the corresponding baselines.
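The two ingredients named in the abstract — an actor update driven by the critic's gradient with respect to the action, and a Polyak-style averaging step for stability — can be illustrated with a minimal numpy sketch. This is a hypothetical toy, not the authors' CGAC implementation: the linear critic, deterministic linear policy, and all names (`w_a`, `K`, `tau`, etc.) are illustrative assumptions chosen so the action gradient has a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 3, 2

# Toy linear critic Q(s, a) = w_s . s + w_a . a, so dQ/da = w_a exactly.
w_s = rng.normal(size=state_dim)
w_a = rng.normal(size=action_dim)

# Toy deterministic linear policy a = K s.
K = np.zeros((action_dim, state_dim))

def mean_q(K, states):
    """Average critic value of the policy's actions over a batch of states."""
    return np.mean([w_s @ s + w_a @ (K @ s) for s in states])

def actor_update(K, states, lr=0.1):
    """Critic-gradient actor step: ascend mean Q(s, K s) over the batch.
    Chain rule: d/dK Q(s, K s) = outer(dQ/da, s) = outer(w_a, s)."""
    grad = np.mean([np.outer(w_a, s) for s in states], axis=0)
    return K + lr * grad

def polyak_average(target, online, tau=0.005):
    """Averaging step (as in off-policy methods): slow-moving target params."""
    return (1.0 - tau) * target + tau * online

# A batch of parallel-simulator states (diversity comes from many runs).
states = rng.normal(size=(8, state_dim))
q_before = mean_q(K, states)
K = actor_update(K, states)
q_after = mean_q(K, states)   # the step ascends the critic

# Target-parameter averaging toward the updated online policy.
K_target = polyak_average(np.zeros_like(K), K)
```

In a real deep-RL setting the analytic `outer(w_a, s)` would be replaced by backpropagating the critic's action gradient through the policy network, and the same `polyak_average` recipe would be applied to both policy and value parameters.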
Pages: 14