A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Cited by: 685
Authors
Grondman, Ivo [1]
Busoniu, Lucian [2,3,4]
Lopes, Gabriel A. D. [1]
Babuska, Robert [1]
Affiliations
[1] Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands
[2] Univ Lorraine, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[3] CNRS, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[4] Tech Univ Cluj Napoca, Dept Automat, Cluj Napoca 400020, Romania
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS | 2012, Vol. 42, No. 6
Keywords
Actor-critic; natural gradient; policy gradient; reinforcement learning (RL); ALGORITHM; COST; APPROXIMATION;
DOI
10.1109/TSMCC.2012.2218595
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, none is specifically dedicated to actor-critic algorithms. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion on open issues.
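To illustrate the kind of method the abstract describes, the following is a minimal sketch of an online actor-critic with linear function approximation, not any specific algorithm reviewed in the paper. It pairs a TD(0) critic with a Gaussian policy whose mean is updated along a standard policy gradient, using the TD error as a low-variance advantage estimate. The toy environment, feature map, and step sizes are illustrative assumptions.

```python
# Minimal illustrative actor-critic sketch (hypothetical, not the paper's method):
# linear critic V(s) ~ w . phi(s) learned by TD(0), Gaussian policy with mean
# mu(s) = theta . phi(s), actor updated along grad log pi weighted by the TD error.
import numpy as np

rng = np.random.default_rng(0)

def features(s, n=8):
    """Radial-basis features over a scalar state (assumed for illustration)."""
    centers = np.linspace(-1.0, 1.0, n)
    return np.exp(-((s - centers) ** 2) / 0.1)

n = 8
w = np.zeros(n)        # critic weights
theta = np.zeros(n)    # actor weights (policy mean)
sigma = 0.3            # fixed exploration std of the Gaussian policy
alpha_w, alpha_theta, gamma = 0.1, 0.01, 0.95

def env_step(s, a):
    """Toy dynamics: reward for driving the state toward zero (stand-in environment)."""
    s_next = np.clip(s + 0.1 * a + 0.01 * rng.standard_normal(), -1.0, 1.0)
    reward = -s_next ** 2 - 0.01 * a ** 2
    return s_next, reward

s = rng.uniform(-1.0, 1.0)
for t in range(5000):
    phi = features(s)
    mu = theta @ phi
    a = mu + sigma * rng.standard_normal()                   # sample action from pi(a|s)

    s_next, r = env_step(s, a)
    delta = r + gamma * (w @ features(s_next)) - w @ phi     # TD error

    w += alpha_w * delta * phi                               # critic: TD(0) update
    # actor: policy-gradient step, grad log pi = (a - mu) / sigma^2 * phi,
    # with the TD error serving as the advantage estimate
    theta += alpha_theta * delta * (a - mu) / sigma ** 2 * phi

    s = s_next
```

A natural actor-critic variant, as discussed in the survey, would additionally precondition the actor step with the inverse Fisher information matrix of the policy, so that the update follows the natural rather than the vanilla gradient.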
Pages: 1291-1307
Number of pages: 17