Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning

Cited by: 0
Authors
Gurumurthy, Swaminathan [1 ]
Manchester, Zachary [1 ]
Kolter, J. Zico [1 ,2 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Bosch Ctr AI, Sunnyvale, CA USA
Source
LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, 2023, VOL 211
Keywords
Reinforcement Learning; Actor Critic; Continuous control; Highly parallel Environments;
DOI
Not available
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
On-policy reinforcement learning algorithms have been shown to be remarkably efficient at learning policies for continuous-control robotics tasks. They are highly parallelizable and have therefore benefited tremendously from the recent rise of GPU-based parallel simulators. The most widely used on-policy reinforcement learning algorithm is proximal policy optimization (PPO), which was introduced in 2017 and designed for a somewhat different setting: CPU-based serial or less parallelizable simulators. Surprisingly, however, it has maintained its dominance even on tasks built on today's highly parallelizable simulators. In this paper, we show that a different class of on-policy algorithms, which estimate the policy gradient using critic-action gradients, is better suited to highly parallelizable simulators. The primary issues for these algorithms arise from the lack of diversity in the on-policy experiences used for updates and from instabilities caused by the interaction between biased critic gradients and the rapidly changing policy distribution. We address the former by simply increasing the number of parallel simulation runs (thanks to GPU-based simulators), together with an appropriate schedule on the policy entropy to ensure sample diversity. We address the latter by adding a policy-averaging step and a value-averaging step (as in off-policy methods). With these modifications, we observe that the critic-gradient-based on-policy method (CGAC) consistently achieves higher episode returns than existing baselines. Furthermore, in environments with high-dimensional action spaces, CGAC also trains much faster (in wall-clock time) than the corresponding baselines.
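The two ingredients the abstract describes, the critic-action-gradient policy update and the Polyak-style averaging step, can be sketched in a minimal toy form. The linear policy, the quadratic critic, and all parameter names below are illustrative assumptions for exposition, not the paper's actual networks or environments:

```python
# Toy sketch (illustrative assumptions, not the paper's actual method):
# a deterministic linear policy a = theta * s and a quadratic critic
# Q(s, a) = -(a - s)^2, whose optimum is theta = 1.

def polyak_update(target, online, tau):
    """Policy/value averaging: exponential moving average of parameters,
    used to damp the interaction between biased critic gradients and a
    rapidly changing policy distribution."""
    return (1.0 - tau) * target + tau * online

s = 2.0          # a fixed toy state
theta = 0.0      # online policy parameter
theta_avg = 0.0  # averaged ("target") policy parameter
lr, tau = 0.05, 0.5

for _ in range(50):
    a = theta * s
    dQ_da = -2.0 * (a - s)   # gradient of the critic w.r.t. the action
    theta += lr * dQ_da * s  # chain rule: dQ/dtheta = (dQ/da) * (da/dtheta)
    theta_avg = polyak_update(theta_avg, theta, tau)

# theta converges to the optimum 1.0; theta_avg trails it smoothly
```

The actor is updated by backpropagating the critic's action gradient through the policy, rather than via a likelihood-ratio estimator as in PPO; the averaging step then smooths the resulting fast-moving parameters.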
Pages: 14