Alleviating Credit Assignment Problem Using Deep Representation Learning with Application to Push Recovery Learning

Cited by: 0

Authors:

Davari, Mohammadjavad [1]

Alipour, Khalil [1]

Hadi, Alireza [1]

Affiliation:

[1] Univ Tehran, Dept Mechatron Engn, Fac New Sci & Technol, Tehran, Iran

Source:

2017 ARTIFICIAL INTELLIGENCE AND ROBOTICS (IRANOPEN) | 2017

Keywords:

Deep learning; push recovery; credit assignment problem; latent variable; rewarding system;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose two new methods to accelerate the learning of a task with the Q-learning algorithm. We focus specifically on tasks that suffer from the credit assignment (CA) problem, performed by a reinforcement learning (RL) agent in a high-dimensional state space. The main idea of this paper is to use the latent variables provided by deep autoencoders to build a better reward system. We show that these new rewards speed up learning of the task under similar conditions. The task chosen for the algorithm is push recovery (PR) in a simulated environment.
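The abstract does not spell out how the latent variables enter the reward, so the following is only an illustrative sketch and not the authors' method. It assumes one plausible reading: a dense reward is derived from distances in latent space and added to the sparse task reward through potential-based reward shaping, then fed to a standard tabular Q-learning update. The `encode` function is a hypothetical stand-in for a trained autoencoder's encoder (here a fixed linear map with `tanh`), and the names `potential`, `shaped_reward`, `q_update`, the weights, and the 4-dimensional state are all assumptions made for the example.

```python
import math

def encode(state, weights):
    """Hypothetical stand-in for a trained autoencoder's encoder:
    a fixed linear map followed by tanh (assumption, not from the paper)."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]

def potential(state, goal_latent, weights):
    """Assumed potential: negative Euclidean distance to the goal in latent space."""
    z = encode(state, weights)
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(z, goal_latent)))

def shaped_reward(reward, state, next_state, goal_latent, weights, gamma=0.99):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    phi_next = potential(next_state, goal_latent, weights)
    phi_now = potential(state, goal_latent, weights)
    return reward + gamma * phi_next - phi_now

def q_update(q, s_key, action, reward, next_key, n_actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update; q maps (state_key, action) to a value."""
    best_next = max(q.get((next_key, a), 0.0) for a in range(n_actions))
    old = q.get((s_key, action), 0.0)
    q[(s_key, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(s_key, action)]

# Usage with made-up numbers: a 4-D state, a 2-D latent code, two actions.
weights = [[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, -0.3, 0.2]]
goal_state = [0.0, 0.0, 0.0, 0.0]      # e.g. standing upright (assumed)
pushed_state = [1.0, 0.0, 0.0, 0.0]    # e.g. just after a push (assumed)
goal_latent = encode(goal_state, weights)

# The sparse task reward is 0 here, yet the shaped reward is informative.
r = shaped_reward(0.0, pushed_state, goal_state, goal_latent, weights)
q = {}
q_update(q, "pushed", 0, r, "upright", n_actions=2)
print(r, q)
```

With this form of shaping, a step that moves the latent code toward the goal code earns a positive reward immediately instead of only at the end of an episode, which is one way a denser signal can ease credit assignment. Potential-based shaping is used here because it is known to leave the optimal policy unchanged; whether the paper uses this or a different reward construction is not stated in the abstract.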

Citation

Pages: 109-114

Page count: 6

References (12 in total):

[1] Abbeel P., 2004, P 21 INT C MACH LEAR, P1, DOI 10.1145/1015330.1015430

[2] Abbeel, Pieter; Dolgov, Dmitri; Ng, Andrew Y.; Thrun, Sebastian. Apprenticeship Learning for Motion Planning with Application to Parking Lot Navigation. 2008 IEEE/RSJ INTERNATIONAL CONFERENCE ON ROBOTS AND INTELLIGENT SYSTEMS, VOLS 1-3, CONFERENCE PROCEEDINGS, 2008: 1083-1090

[3] Coates, Adam; Abbeel, Pieter; Ng, Andrew Y. Apprenticeship Learning for Helicopter Control. COMMUNICATIONS OF THE ACM, 2009, 52 (07): 97-105

[4] Kingma D.P., 2013, ARXIV13126114

[5] Kolter J. Zico, 2007, NIPS

[6] Lillicrap T. P., 2015, 4 INT C LEARN REPR I, DOI 10.48550/arXiv.1509.02971

[7] Ng A. Y., 2000, ICML

[8] Peters, Jan; Schaal, Stefan. Reinforcement learning of motor skills with policy gradients. NEURAL NETWORKS, 2008, 21 (04): 682-697

[9] Silver D, 2014, PR MACH LEARN RES, V32

[10] Stephens, Benjamin. Humanoid Push Recovery. HUMANOIDS: 2007 7TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, 2007: 589-595
