Reinforcement Learning with Particle Swarm Optimization Policy (PSO-P) in Continuous State and Action Spaces

Cited by: 17
Authors
Hein, Daniel [1]
Hentschel, Alexander [2]
Runkler, Thomas A. [2]
Udluft, Steffen [2]
Affiliations
[1] Tech Univ Munich, Informat, Munich, Germany
[2] Siemens AG, Munich, Germany
Keywords
Benchmark; Cart Pole; Continuous Action Space; Continuous State Space; High-dimensional; Model-based; Mountain Car; Particle Swarm Optimization; Reinforcement Learning
DOI
10.4018/IJSIR.2016070102
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods try to find closed-form policies, the approach taken here employs numerical on-line optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high-dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world-inspired RL benchmarks is made available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on two standard benchmarks: mountain car and cart pole.
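The idea summarized in the abstract, treating the choice of an action sequence as a numerical optimization problem and searching it with PSO, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy point-mass `model`, the discount factor, the horizon, the swarm size, the global-best swarm topology, the coefficient values, and the receding-horizon wrapper `pso_policy` are all hypothetical choices.

```python
import random

def model(state, action):
    """Hypothetical toy dynamics (not from the article): a 1-D point mass."""
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    reward = -(pos ** 2)  # reward is highest at the origin
    return (pos, vel), reward

def rollout_return(state, actions, gamma=0.95):
    """Discounted return of an action sequence simulated on the model."""
    total, discount = 0.0, 1.0
    for a in actions:
        state, r = model(state, a)
        total += discount * r
        discount *= gamma
    return total

def pso_plan(state, horizon=10, particles=20, iters=30,
             a_min=-1.0, a_max=1.0, seed=0):
    """Search for a high-return action sequence with a basic global-best PSO."""
    rng = random.Random(seed)
    w, c1, c2 = 0.72, 1.49, 1.49  # commonly used coefficients (assumed here)
    pos = [[rng.uniform(a_min, a_max) for _ in range(horizon)]
           for _ in range(particles)]
    vel = [[0.0] * horizon for _ in range(particles)]
    best_pos = [p[:] for p in pos]                       # personal bests
    best_val = [rollout_return(state, p) for p in pos]
    g = max(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]           # swarm best
    for _ in range(iters):
        for i in range(particles):
            for d in range(horizon):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                # Keep each action inside the allowed range.
                pos[i][d] = min(a_max, max(a_min, pos[i][d] + vel[i][d]))
            val = rollout_return(state, pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def pso_policy(state):
    """Receding-horizon use: apply only the first action of the best sequence."""
    actions, _ = pso_plan(state)
    return actions[0]
```

In this sketch no policy function is fitted; each call to `pso_policy` reruns the search from the current state, which is one way to read the abstract's "on-line optimization of control action sequences".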
Pages: 23-42
Number of pages: 20