Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning

Cited by: 0
Authors
Yuan, Weihao [1 ,2 ,3 ]
Stork, Johannes A. [7 ]
Kragic, Danica [7 ]
Wang, Michael Y. [1 ,2 ,3 ,4 ]
Hang, Kaiyu [1 ,2 ,5 ,6 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] HKUST Robot Inst, Hong Kong, Peoples R China
[3] Dept Elect & Comp Engn, Hong Kong, Peoples R China
[4] Dept Mech & Aerosp Engn, Hong Kong, Peoples R China
[5] Dept Comp Sci & Engn, Hong Kong, Peoples R China
[6] HKUST Inst Adv Study, Hong Kong, Peoples R China
[7] KTH Royal Inst Technol, Ctr Autonomous Syst, Robot Percept & Learning Lab, Stockholm, Sweden
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2018
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Rearranging objects on a tabletop surface by means of nonprehensile manipulation is a task that requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling the physical properties of the objects, robot, and environment for explicit planning. In contrast, as explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based only on visual feedback. For this, we model the task with rewards and train a deep Q-network. Our potential field-based heuristic exploration strategy reduces the number of collisions that lead to suboptimal outcomes, and we actively balance the training set to avoid bias towards poor examples. Our training process leads to quicker learning and better performance on the task compared to uniform exploration and standard experience replay. We present empirical evidence from simulation that our method achieves a success rate of 85%, show that our system can cope with sudden changes in the environment, and compare its performance with that of humans.
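To make the exploration heuristic concrete, the following is a minimal Python sketch of potential field-based action selection of the kind the abstract describes: an attractive term pulls the pusher toward the goal, repulsive terms push it away from nearby objects, and the discrete action best aligned with the resulting field is preferred over a uniformly random one. The action set, gain values, and function names are illustrative assumptions, not the authors' implementation.

import numpy as np

# Hypothetical discrete action set: eight unit pushing directions in the plane.
ACTIONS = [np.array([np.cos(a), np.sin(a)])
           for a in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)]

def potential_field(pusher, goal, obstacles, k_att=1.0, k_rep=0.05, radius=0.2):
    # Attractive force pulling the pusher toward the goal position.
    force = k_att * (goal - pusher)
    # Repulsive force from each obstacle within the influence radius;
    # this is what discourages exploratory pushes that end in collisions.
    for obs in obstacles:
        diff = pusher - obs
        dist = np.linalg.norm(diff)
        if 1e-6 < dist < radius:
            force += k_rep * (1.0 / dist - 1.0 / radius) * diff / dist**3
    return force

def explore_action(pusher, goal, obstacles, epsilon=0.1, rng=None):
    # With probability epsilon, explore uniformly at random;
    # otherwise follow the potential field direction.
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    field = potential_field(pusher, goal, obstacles)
    # Choose the discrete pushing direction best aligned with the field.
    return int(np.argmax([a @ field for a in ACTIONS]))

During training, actions proposed this way would replace uniform random actions in an epsilon-greedy scheme, biasing exploration toward collision-free pushes that make progress toward the goal.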
Pages: 270-277
Number of pages: 8