Real-Sim-Real Transfer for Real-World Robot Control Policy Learning with Deep Reinforcement Learning

Cited by: 25

Authors:

Liu, Naijun [1,2]

Cai, Yinghao [1]

Lu, Tao [1]

Wang, Rui [1,3]

Wang, Shuo [1,2,4]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100190, Peoples R China

[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Prov Key Lab Robot & Intelligent Syst, Shenzhen 518055, Peoples R China

[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai 200031, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

robot; policy learning; reality gap; simulated environment; deep reinforcement learning; domain adaptation;

DOI:

10.3390/app10051555

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Compared to traditional data-driven learning methods, recently developed deep reinforcement learning (DRL) approaches can be employed to train robot agents to obtain control policies with appealing performance. However, learning control policies for real-world robots through DRL is costly and cumbersome. A promising alternative is to train policies in simulated environments and transfer the learned policies to real-world scenarios. Unfortunately, due to the reality gap between simulated and real-world environments, the policies learned in simulated environments often cannot be generalized well to the real world. Bridging the reality gap is still a challenging problem. In this paper, we propose a novel real-sim-real (RSR) transfer method that includes a real-to-sim training phase and a sim-to-real inference phase. In the real-to-sim training phase, a task-relevant simulated environment is constructed based on semantic information of the real-world scenario and coordinate transformation, and then a policy is trained with the DRL method in the built simulated environment. In the sim-to-real inference phase, the learned policy is directly applied to control the robot in real-world scenarios without any real-world data. Experimental results in two different robot control tasks show that the proposed RSR method can train skill policies with high generalization performance and significantly low training costs.
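
The abstract mentions that the simulated environment is built from semantic information and a coordinate transformation, without giving the details. As a rough illustration only, the sketch below shows one common way such a step could look: mapping an object position detected in a camera frame into the robot base frame with a homogeneous transform, so the object can be placed in a simulator. The function names, the camera pose, and the object position are all hypothetical and are not taken from the paper.

```python
import math

def make_transform(yaw_rad, translation):
    """Build a 4x4 homogeneous transform: rotation about z by yaw_rad, plus a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c,   -s,  0.0, tx],
        [s,    c,  0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(T, point):
    """Apply the homogeneous transform T to a 3D point and return a 3-tuple."""
    p = (point[0], point[1], point[2], 1.0)
    return tuple(sum(T[r][k] * p[k] for k in range(4)) for r in range(3))

# Hypothetical camera pose expressed in the robot base frame (an assumption):
# rotated 90 degrees about z and offset by (0.5, 0.0, 0.3) metres.
T_base_camera = make_transform(math.pi / 2, (0.5, 0.0, 0.3))

# Hypothetical object position reported by a detector in the camera frame.
object_in_camera = (0.2, 0.0, 0.0)

# Position at which the object would be spawned in the simulated scene.
object_in_base = transform_point(T_base_camera, object_in_camera)
print(object_in_base)
```

In a real setup the transform would come from camera calibration and would generally include a full 3D rotation rather than a single yaw angle; the yaw-only form is used here just to keep the sketch short.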


Pages: 16
