Reinforcement Learning with Particle Swarm Optimization Policy (PSO-P) in Continuous State and Action Spaces

Cited by: 17

Authors:

Hein, Daniel [1]

Hentschel, Alexander [2]

Runkler, Thomas A. [2]

Udluft, Steffen [2]

Affiliations:

[1] Tech Univ Munich, Informat, Munich, Germany

[2] Siemens AG, Munich, Germany

Source:

INTERNATIONAL JOURNAL OF SWARM INTELLIGENCE RESEARCH | 2016 / Vol. 7 / Issue 3

Keywords:

Benchmark; Cart Pole; Continuous Action Space; Continuous State Space; High-dimensional; Model-based; Mountain Car; Particle Swarm Optimization; Reinforcement Learning;

DOI:

10.4018/IJSIR.2016070102

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods try to find closed-form policies, the approach taken here employs numerical on-line optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world inspired RL benchmarks is made available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on the two standard benchmarks: mountain car and cart pole.
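The idea of treating control as on-line optimization of an action sequence can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy point-mass model `model_step`, its reward, the discount factor, the global-best swarm topology, and the PSO coefficients (commonly used inertia and acceleration values). A particle is one candidate action sequence, and its fitness is the discounted return obtained by simulating that sequence on the model.

```python
import random

def model_step(state, action, dt=0.1):
    """Hypothetical toy model (assumption): a unit point mass on a line."""
    x, v = state
    v = v + action * dt
    x = x + v * dt
    reward = -(x * x)  # being closer to the origin is better
    return (x, v), reward

def rollout_return(state, actions, gamma=0.95):
    """Discounted return of one action sequence simulated on the model."""
    total, discount = 0.0, 1.0
    for a in actions:
        state, r = model_step(state, a)
        total += discount * r
        discount *= gamma
    return total

def pso_plan(state, horizon=10, n_particles=20, iters=50,
             a_min=-1.0, a_max=1.0,
             w=0.72984, c1=1.49618, c2=1.49618, rng=None):
    """Global-best PSO over action sequences (illustrative settings only)."""
    if rng is None:
        rng = random.Random()
    # Each particle position is a full action sequence of length `horizon`.
    pos = [[rng.uniform(a_min, a_max) for _ in range(horizon)]
           for _ in range(n_particles)]
    vel = [[0.0] * horizon for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [rollout_return(state, p) for p in pos]
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(horizon):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Keep every action inside the admissible interval.
                pos[i][d] = min(a_max, max(a_min, pos[i][d] + vel[i][d]))
            val = rollout_return(state, pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def pso_policy(state, **kwargs):
    """Return only the first action of the best sequence found."""
    plan, _ = pso_plan(state, **kwargs)
    return plan[0]
```

In a closed loop, `pso_policy` would be called again at every time step from the newly observed state, so only the first action of each optimized sequence is ever executed. No policy representation has to be chosen in advance, at the cost of running an optimization at each step.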


Pages: 23-42

Page count: 20
