Sparse Latent Space Policy Search

Cited by: 0

Authors:

Luck, Kevin Sebastian [1]

Pajarinen, Joni [2]

Berger, Erik [3]

Kyrki, Ville [2]

Ben Amor, Heni [1]

Affiliations:

[1] Arizona State Univ, Interact Robot Lab, Tempe, AZ 85281 USA

[2] Aalto Univ, Intelligent Robot Grp, Espoo 02150, Finland

[3] Tech Univ Bergakad Freiberg, Inst Comp Sci, D-09599 Freiberg, Germany

Source:

THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2016

Funding:

Academy of Finland;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Computational agents often need to learn policies that involve many control variables, e.g., a robot needs to control several joints simultaneously. Learning a policy with a high number of parameters, however, usually requires a large number of training samples. We introduce a reinforcement learning method for sample-efficient policy search that exploits correlations between control variables. Such correlations are particularly frequent in motor skill learning tasks. The introduced method uses Variational Inference to estimate policy parameters, while at the same time uncovering a low-dimensional latent space of controls. Prior knowledge about the task and the structure of the learning agent can be provided by specifying groups of potentially correlated parameters. This information is then used to impose sparsity constraints on the mapping between the high-dimensional space of controls and a lower-dimensional latent space. In experiments with a simulated bi-manual manipulator, the new approach effectively identifies synergies between joints, performs efficient low-dimensional policy search, and outperforms state-of-the-art policy search methods.
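
The idea of a sparse mapping from a low-dimensional latent space to many correlated controls can be illustrated with a small numerical sketch. The abstract gives no model equations, so everything below is an assumption made for illustration: the linear-Gaussian form `a = W z + mean + noise`, the helper names `group_sparsity_mask` and `sample_controls`, the two groups of three joints, and the fixed weights. In particular, the hard-coded block mask stands in for sparsity that the described method would infer from data; this is not the authors' algorithm.

```python
import numpy as np

def group_sparsity_mask(groups, n_controls):
    """Block mask: latent dimension j may only drive the controls listed in groups[j]."""
    mask = np.zeros((n_controls, len(groups)))
    for j, idx in enumerate(groups):
        mask[idx, j] = 1.0
    return mask

def sample_controls(W, mean, n_samples, noise_std, rng):
    """Draw controls a = W z + mean + eps, with z ~ N(0, I) in the latent space."""
    latent_dim = W.shape[1]
    z = rng.standard_normal((n_samples, latent_dim))
    eps = noise_std * rng.standard_normal((n_samples, W.shape[0]))
    return z @ W.T + mean + eps, z

rng = np.random.default_rng(0)

# Hypothetical prior knowledge: two arms with three joints each.
groups = [[0, 1, 2], [3, 4, 5]]
mask = group_sparsity_mask(groups, n_controls=6)

# Illustrative dense weights; the mask zeroes every cross-group entry.
dense = np.array([[1.0, 0.3], [0.5, 0.2], [-0.8, 0.9],
                  [0.4, 0.7], [0.1, 1.2], [0.6, -0.4]])
W = mask * dense

controls, z = sample_controls(W, np.zeros(6), n_samples=1000, noise_std=0.01, rng=rng)

# Joints within a group move together; joints in different groups do not.
corr = np.corrcoef(controls, rowvar=False)
print(np.round(corr, 2))
```

In this sketch six controls are generated from only two latent variables, so a search over the latent space has far fewer dimensions to explore than a search over the controls themselves.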

Citation:

Pages: 1911-1918

Page count: 8
