MULTI-STEP ACTOR-CRITIC FRAMEWORK FOR REINFORCEMENT LEARNING IN CONTINUOUS CONTROL

Cited by: 0
Authors
Huang T. [1 ]
Chen G. [2 ]
Institutions
[1] Institute of Advanced Technology, Westlake Institute for Advanced Study, Hangzhou
[2] School of Engineering, Westlake University, Hangzhou
Source
Journal of Applied and Numerical Optimization | 2023, Vol. 5, No. 2
Keywords
Continuous control; Convolutional deterministic policy; Multi-step actor-critic; Reinforcement learning; Temporal difference learning
DOI
10.23952/jano.5.2023.2.01
Abstract
Continuous control is an important problem in control theory: an agent takes actions in a continuous action space, transitioning from one state to another until it reaches a desired goal. Reinforcement learning is a useful tool for this problem, learning an optimal policy for the agent by maximizing the cumulative reward of the state transitions. However, most existing reinforcement learning methods consider only a one-step transition and a one-step reward in each state, making it hard to recognize the information hidden in the sequence of previous states and to estimate the cumulative reward accurately. As a result, these methods cannot learn the optimal policy both quickly and effectively for continuous control. To address this problem, we propose a new reinforcement learning framework, the Multi-step Actor-critic Framework (MAF). In MAF, a convolutional deterministic policy uses convolutional neural networks to learn the information hidden in the sequence of previous states, and n-step temporal difference learning estimates the cumulative reward accurately by considering the rewards of n successive states. Building on TD3, an effective reinforcement learning method, we implement MAF as nTD3. Theoretical analysis and experiments show that nTD3 learns the policy both better and faster than existing RL methods for continuous control. © 2023 Journal of Applied and Numerical Optimization.
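The abstract's key ingredient is the n-step temporal difference target, which replaces the usual one-step bootstrap with the discounted sum of n successive rewards plus a bootstrapped tail value. As a minimal sketch (not the paper's implementation; the function name, arguments, and default discount are illustrative assumptions), the target G_t = r_t + γ·r_{t+1} + … + γ^(n-1)·r_{t+n-1} + γ^n·V(s_{t+n}) can be computed by folding backwards over the reward sequence:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step TD target: discounted sum of the given rewards plus a
    gamma^n-discounted bootstrap value for the state reached after n steps.

    rewards:         list [r_t, r_{t+1}, ..., r_{t+n-1}] of length n
    bootstrap_value: estimate of the value at state s_{t+n} (e.g. from a critic)
    """
    g = bootstrap_value
    # Folding from the last reward backwards applies gamma once per step,
    # so the bootstrap value ends up weighted by gamma**len(rewards).
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With n = 1 this reduces to the standard one-step TD target used by methods such as TD3; larger n propagates reward information over more of the state sequence at the cost of higher variance in the target.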
Pages: 189-200
Page count: 11