Predictive feature selection for genetic policy search

Cited by: 3
Authors
Loscalzo, Steven [1 ]
Wright, Robert [2 ]
Yu, Lei [2 ]
Affiliations
[1] AFRL Information Directorate, Rome, NY 13441 USA
[2] SUNY Binghamton, Binghamton, NY 13902 USA
Keywords
Genetic policy search; Feature selection; Dimensionality reduction; Reinforcement learning; REINFORCEMENT; CLASSIFICATION; EXPLORATION; STATE
DOI
10.1007/s10458-014-9268-y
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Automatic learning of control policies is becoming increasingly important to allow autonomous agents to operate alongside, or in place of, humans in dangerous and fast-paced situations. Reinforcement learning (RL), including genetic policy search algorithms, is a promising technology for learning such control policies. Unfortunately, RL techniques can take prohibitively long to learn a sufficiently good control policy in environments described by many sensors (features). We argue that in many cases only a subset of the available features is needed to learn the task at hand, since the others may carry irrelevant or redundant information. In this work, we propose a predictive feature selection framework that analyzes data obtained during the execution of a genetic policy search algorithm to identify relevant features on-line. By embedding feature selection into the process of learning a control policy, the framework constrains the policy search space and reduces the time needed to locate a sufficiently good policy. We explore this framework through an instantiation called predictive feature selection embedded in NeuroEvolution of Augmenting Topologies (NEAT), or PFS-NEAT. In an empirical study, we demonstrate that PFS-NEAT enables NEAT to successfully find good control policies in two benchmark environments, and show that it can outperform three competing feature selection algorithms, FS-NEAT, FD-NEAT, and SAFS-NEAT, in several variants of these environments.
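The general shape of the framework described in the abstract can be illustrated compactly: state/return samples logged while evaluating candidate policies are scored for relevance, and only the top-ranked features are fed to policies in later generations. The Python sketch below is a minimal illustration of that idea, not the authors' PFS-NEAT algorithm; the correlation-based relevance score and the names relevance and select_features are hypothetical stand-ins for the paper's predictive criterion.

import random
import statistics

def relevance(feature_values, returns):
    # Hypothetical stand-in for the paper's predictive criterion:
    # score one feature by |Pearson correlation| with episode return.
    sf, sr = statistics.pstdev(feature_values), statistics.pstdev(returns)
    if sf == 0 or sr == 0:
        return 0.0
    mf, mr = statistics.fmean(feature_values), statistics.fmean(returns)
    cov = statistics.fmean((f - mf) * (r - mr)
                           for f, r in zip(feature_values, returns))
    return abs(cov / (sf * sr))

def select_features(states, returns, k):
    # Rank every state feature by relevance and keep the top k; the
    # selected subset would mask the policy's inputs, shrinking the
    # search space for the next generation of the genetic search.
    n = len(states[0])
    scores = [relevance([s[i] for s in states], returns) for i in range(n)]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]

# Toy data standing in for (state, episode-return) samples gathered while
# evaluating one generation of candidate policies: feature 2 drives reward.
states = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]
returns = [s[2] * 2.0 + random.gauss(0.0, 0.1) for s in states]
print(select_features(states, returns, k=3))  # feature 2 should rank first

In this sketch the relevance test runs once per generation on data the search produces anyway, which mirrors the abstract's point that feature selection is embedded in, rather than run before, policy learning.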
Pages: 754-786 (33 pages)