Algorithmic Foundations of Reinforcement Learning

Times Cited: 0
Authors
Pareigis, Stephan [1]
Affiliations
[1] Hamburg Univ Appl Sci, Dept Informat, Berliner Tor 7, D-20099 Hamburg, Germany
Source
ADVANCES IN REAL-TIME AND AUTONOMOUS SYSTEMS, 2023 | 2024, Vol. 1009
Keywords
reinforcement learning; MDP; Markov decision process; dynamic programming; deep reinforcement learning; SARSA; Q-learning; DQN; REINFORCE; A2C; PPO; DDPG; SAC; policy gradient methods; exploration vs. exploitation; sparse rewards; robotics; offline reinforcement learning
DOI
10.1007/978-3-031-61418-7_1
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
A comprehensive algorithmic introduction to reinforcement learning is given, laying out its foundational concepts and methodologies. The fundamentals of Markov Decision Processes (MDPs) and dynamic programming are covered, describing the principles and techniques for solving model-based problems within the MDP framework. The most significant model-free reinforcement learning algorithms, including Q-learning and actor-critic methods, are explained in detail. A comprehensive overview of each algorithm's mechanism is provided, building a solid algorithmic and mathematical understanding of current practice in reinforcement learning.
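As a minimal illustration of the model-free methods the abstract highlights, the following sketch implements tabular Q-learning with ε-greedy exploration. The corridor environment, state/action layout, and hyperparameters are hypothetical toy choices for demonstration, not taken from the chapter itself.

```python
import random

# Toy 1-D corridor MDP (a hypothetical example): states 0..4,
# actions 0 = left, 1 = right; reward 1.0 on reaching terminal state 4.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    """Deterministic transition along the corridor."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular action-value estimates
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap with the greedy value of s2
            target = r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q

Q = train()
# The learned greedy policy should move right from every non-terminal state.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Because the update bootstraps on `max(Q[s2])` rather than the action actually taken, this is the off-policy Q-learning rule; replacing that term with the value of the next sampled action would give SARSA, the on-policy counterpart also listed in the keywords.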
Pages: 1-27
Page Count: 27
References
9 total
[1] Graesser L., 2019, Foundations of Deep Reinforcement Learning: Theory and Practice in Python
[2] Haarnoja T., 2018, Proceedings of Machine Learning Research, Vol. 80
[3] Lapan M., 2020, Deep Reinforcement Learning Hands-On, 2nd ed.
[4] Lillicrap T. P., 2016, 4th International Conference on Learning Representations
[5] Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis. Human-level control through deep reinforcement learning. NATURE, 2015, 518(7540): 529-533
[6] Palmas A., 2020, The Reinforcement Learning Workshop
[7] Schulman J., 2018, arXiv, DOI: 10.48550/arXiv.1506.02438
[8] Schulman J., 2017, arXiv, DOI: arXiv:1707.06347
[9] Sutton R. S., 2018, Adaptive Computation and Machine Learning, p. 1