A tutorial introduction to reinforcement learning

Cited by: 3
Authors
Vidyasagar, Mathukumalli [1 ]
Affiliations
[1] Indian Inst Technol Hyderabad, SERB Natl Sci Chair, Kandi 502284, Telangana, India
Keywords
Markov decision processes; Bellman optimality; value iteration; Q-iteration; stochastic approximation; convergence; criterion
DOI
10.1080/18824889.2023.2196033
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline code
0812
Abstract
In this paper, we present a brief survey of reinforcement learning, with particular emphasis on stochastic approximation (SA) as a unifying theme. The scope of the paper includes Markov reward processes, Markov decision processes, SA algorithms, and widely used algorithms such as temporal difference learning and Q-learning.
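The abstract names Q-learning as one of the SA-based algorithms the survey covers. As a minimal illustration of how a Q-learning update is itself a stochastic-approximation iteration, here is a sketch of tabular Q-learning on a toy two-state, two-action MDP; the environment, step size, and all numerical values are assumptions made up for this example, not taken from the paper:

```python
import random

# Toy deterministic MDP (assumed for illustration):
# P[state][action] = (next_state, reward). Action 1 always pays reward 1.
P = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 1.0)},
}

def q_learning(episodes=500, steps=20, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(steps):
            # Epsilon-greedy exploration.
            a = rng.choice((0, 1)) if rng.random() < eps else max(Q[s], key=Q[s].get)
            s2, r = P[s][a]
            # Robbins-Monro-style SA update toward the Bellman target
            # r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

With this reward structure, the learned values should rank action 1 above action 0 in both states, matching the optimal policy of the toy MDP; the constant step size alpha is a common practical simplification of the decreasing SA gains used in convergence proofs.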
Pages: 172-191 (20 pages)
References
42 in total
[1] [Anonymous], 1997, Application of Mathematics.
[2] Arapostathis A, Borkar VS, Fernandez-Gaucherand E, Ghosh MK, Marcus SI. Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM Journal on Control and Optimization, 1993, 31(2): 282-344.
[3] Benaim M, 1999, Dynamics of Stochastic Approximation Algorithms.
[4] Benveniste A, 1990, Adaptive Algorithms.
[5] Bertsekas DP, 1996, Neuro-Dynamic Programming, 1st ed.
[6] Blum JR. Multidimensional stochastic approximation methods. Annals of Mathematical Statistics, 1954, 25(4): 737-744.
[7] Borkar V, 2022, Stochastic Approximation: A Dynamical Systems Viewpoint, 2nd ed.
[8] Borkar VS, 2009, Stochastic Approximation: A Dynamical Systems Viewpoint, Vol. 48.
[9] Borkar VS. Asynchronous stochastic approximations. SIAM Journal on Control and Optimization, 1998, 36(3): 840-851.
[10] Borkar VS. Stochastic approximation with two time scales. Systems & Control Letters, 1997, 29(5): 291-294.