Episode-Fuzzy-COACH Method for Fast Robot Skill Learning

Cited by: 4
Authors
Li, Bingqian [1 ,2 ]
Liu, Xing [1 ,2 ]
Liu, Zhengxiong [1 ,2 ]
Huang, Panfeng [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Res Ctr Intelligent Robot, Sch Astronaut, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Natl Key Lab Aerosp Flight Dynam, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Hybrid intelligence; interactive reinforcement learning; robot skill learning;
DOI
10.1109/TIE.2023.3294600
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
To realize robot skill learning in the real world, reinforcement learning algorithms must be applied to continuous problems with high sample efficiency. Hybrid intelligence is regarded as a viable solution to this problem, owing to its ability to speed up the learning process with human knowledge and experience. We therefore propose Episode-Fuzzy-COACH (COrrective Advice Communicated by Humans), which imitates human fuzzy logic and involves human intelligence in the learning process. In this framework, human knowledge and experience enter the learning process through human feedback and through fuzzy rules designed by human users. The framework is further combined with Policy Improvement with Path Integrals (PI2) to realize hybrid intelligence for fast robot skill learning. The Throwing Movement Primitives proposed in this article are used to represent the policy of the ball-throwing skill. According to the simulation results, the learning efficiency of our method is increased by 72% and 42.86% compared with pure PI2 and PI2 + COACH, respectively. Validated in experiments, our method is 46.67% more effective than PI2 + COACH. The results also show that the performance of our method is not affected by the user's level of knowledge in the related field. It is thus shown that PI2 + Episode-Fuzzy-COACH is suitable for fast robot skill learning.
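The combination the abstract describes can be sketched roughly as follows: a PI2 step reweights exploratory rollouts by a softmax over their negative costs, and a COACH-style step then shifts the parameters by the human's directional feedback scaled by an error magnitude (which Episode-Fuzzy-COACH would derive from fuzzy rules). The function names, the temperature `lam`, and the toy numbers below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pi2_update(theta, eps, costs, lam=1.0):
    """One PI2 policy-improvement step (illustrative sketch).

    theta: (d,) current policy parameters
    eps:   (K, d) exploration noise of K rollouts around theta
    costs: (K,) total cost of each rollout
    """
    # Softmax over negative costs: low-cost rollouts get high weight.
    s = np.exp(-(costs - costs.min()) / lam)
    w = s / s.sum()
    # Move theta toward the noise directions of the best rollouts.
    return theta + w @ eps

def corrective_advice(theta, h, e):
    """COACH-style correction: h in {-1, 0, +1} is the human's
    directional feedback; e is the advice magnitude (the quantity a
    fuzzy rule base would supply in Episode-Fuzzy-COACH)."""
    return theta + h * e

# Toy usage: parameters drift toward the low-cost rollout, then a
# positive human correction nudges the second parameter.
theta = np.zeros(2)
eps = np.array([[1.0, 0.0], [0.0, 1.0]])
costs = np.array([0.1, 5.0])
theta = pi2_update(theta, eps, costs)
theta = corrective_advice(theta, h=+1, e=np.array([0.0, 0.1]))
print(theta)
```

In this sketch the cheap first rollout dominates the PI2 weighting, so `theta[0]` grows far more than `theta[1]`; the human correction then adds a small offset on top, independent of the rollout costs.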
Pages: 5931-5940
Page count: 10
References
27 in total
[1]  
Bignold A., 2020, J. Ambient Intell Humaniz Comput., V14, P3621
[2]   Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences [J].
Biyik, Erdem ;
Losey, Dylan P. ;
Palan, Malayandi ;
Landolfi, Nicholas C. ;
Shevchuk, Gleb ;
Sadigh, Dorsa .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2022, 41 (01) :45-67
[3]   Reinforcement learning of motor skills using Policy Search and human corrective advice [J].
Celemin, Carlos ;
Maeda, Guilherme ;
Ruiz-del-Solar, Javier ;
Peters, Jan ;
Kober, Jens .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2019, 38 (14) :1560-1580
[4]   An Interactive Framework for Learning Continuous Actions Policies Based on Corrective Feedback [J].
Celemin, Carlos ;
Ruiz-del-Solar, Javier .
JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2019, 95 (01) :77-97
[5]  
Celemin C, 2015, PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS (ICAR), P581, DOI 10.1109/ICAR.2015.7251514
[6]  
Celemin C., 2015, Interactive Learning of Continuous Actions From Corrective Advice Communicated by Humans
[7]  
Deisenroth Marc Peter., 2013, Now Foundations and Trends in Robotics, V2, P388
[8]   Hybrid Intelligence [J].
Dellermann, Dominik ;
Ebel, Philipp ;
Soellner, Matthias ;
Leimeister, Jan Marco .
BUSINESS & INFORMATION SYSTEMS ENGINEERING, 2019, 61 (05) :637-643
[9]  
Ijspeert Auke J., 2002, ADV NEURAL INFORM PR, P1523
[10]   Hierarchical control of soft manipulators towards unstructured interactions [J].
Jiang, Hao ;
Wang, Zhanchi ;
Jin, Yusong ;
Chen, Xiaotong ;
Li, Peijin ;
Gan, Yinghao ;
Lin, Sen ;
Chen, Xiaoping .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2021, 40 (01) :411-434