Reinforcement learning algorithms: A brief survey

Cited by: 171
Authors
Shakya, Ashish Kumar [1]
Pillai, Gopinatha [1]
Chakrabarty, Sohom [1]
Affiliations
[1] Indian Inst Technol, Dept Elect Engn, Roorkee 247667, Uttaranchal, India
Keywords
Reinforcement learning; Stochastic optimal control; Function approximation; Deep Reinforcement Learning (DRL); Particle swarm optimization; Graph neural-network; Dialogue management; Robot navigation; Level; Game; Go; Environment; Model; Facilities
DOI
10.1016/j.eswa.2023.120495
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential decision-making in complex problems. RL is inspired by trial-and-error based human/animal learning. It can learn an optimal policy autonomously with knowledge obtained by continuous interaction with a stochastic dynamical environment. Problems considered virtually impossible to solve, such as learning to play video games just from pixel information, are now successfully solved using deep reinforcement learning. Without human intervention, RL agents can surpass human performance in challenging tasks. This review gives a broad overview of RL, covering its fundamental principles, essential methods, and illustrative applications. The authors aim to develop an initial reference point for researchers commencing their research work in RL. In this review, the authors cover some fundamental model-free RL algorithms and pathbreaking function approximation-based deep RL (DRL) algorithms for complex uncertain tasks with continuous action and state spaces, making RL useful in various interdisciplinary fields. This article also provides a brief review of model-based and multi-agent RL approaches. Finally, some promising research directions for RL are briefly presented.
Pages: 32