A tutorial introduction to reinforcement learning

Cited by: 3
Authors
Vidyasagar, Mathukumalli [1 ]
Affiliations
[1] Indian Inst Technol Hyderabad, SERB Natl Sci Chair, Kandi 502284, Telangana, India
Keywords
Markov decision processes; Bellman optimality; value iteration; Q-iteration; stochastic approximation; convergence; criterion
DOI
10.1080/18824889.2023.2196033
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline code
0812
Abstract
In this paper, we present a brief survey of reinforcement learning, with particular emphasis on stochastic approximation (SA) as a unifying theme. The scope of the paper includes Markov reward processes, Markov decision processes, SA algorithms, and widely used algorithms such as temporal difference learning and Q-learning.
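The abstract names Q-learning as one of the SA-based algorithms the survey covers. As a minimal illustration of how a Q-learning update is itself a stochastic-approximation iteration, here is a sketch of tabular Q-learning on a toy two-state, two-action MDP; the environment, step size, and all numerical values are assumptions made up for this example, not taken from the paper:

```python
import random

# Toy deterministic MDP (assumed for illustration):
# P[state][action] = (next_state, reward). Action 1 always pays reward 1.
P = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 1.0)},
}

def q_learning(episodes=500, steps=20, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(steps):
            # Epsilon-greedy exploration.
            a = rng.choice((0, 1)) if rng.random() < eps else max(Q[s], key=Q[s].get)
            s2, r = P[s][a]
            # Robbins-Monro-style SA update toward the Bellman target
            # r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

With this reward structure, the learned values should rank action 1 above action 0 in both states, matching the optimal policy of the toy MDP; the constant step size alpha is a common practical simplification of the decreasing SA gains used in convergence proofs.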
Pages: 172-191 (20 pages)
References
42 in total
[1] [Anonymous], 1997, Application of Mathematics.
[2] Arapostathis A, Borkar VS, Fernandez-Gaucherand E, Ghosh MK, Marcus SI. Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM Journal on Control and Optimization, 1993, 31(2): 282-344.
[3] Benaim M, 1999, Dynamics of Stochastic Approximation Algorithms.
[4] Benveniste A, 1990, Adaptive Algorithms.
[5] Bertsekas DP, 1996, Neuro-Dynamic Programming, 1st ed.
[6] Blum JR. Multidimensional stochastic approximation methods. Annals of Mathematical Statistics, 1954, 25(4): 737-744.
[7] Borkar V, 2022, Stochastic Approximation: A Dynamical Systems Viewpoint, 2nd ed.
[8] Borkar VS, 2009, Stochastic Approximation: A Dynamical Systems Viewpoint, Vol. 48.
[9] Borkar VS. Asynchronous stochastic approximations. SIAM Journal on Control and Optimization, 1998, 36(3): 840-851.
[10] Borkar VS. Stochastic approximation with two time scales. Systems & Control Letters, 1997, 29(5): 291-294.