Robotic Learning From Advisory and Adversarial Interactions Using a Soft Wrist

Cited by: 5

Authors:

Hamaya, Masashi [1]

Tanaka, Kazutoshi [1]

Shibata, Yoshiya [2]

Von Drigalski, Felix [1]

Nakashima, Chisato [2]

Ijiri, Yoshihisa [1]

Affiliations:

[1] OMRON SINIC X Corp, Bunkyo Ku, Hongo 5-24-5, Tokyo 1130033, Japan

[2] OMRON Corp, Minato Ku, Konan 2-3-13, Tokyo 1080075, Japan

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2021 / Volume 6 / Issue 02

Keywords:

Robots; Task analysis; Human-robot interaction; Robustness; Reinforcement learning; Wrist; Soft robotics; Physical human-robot interaction; reinforcement learning for robotic control; soft robot applications;

DOI:

10.1109/LRA.2021.3067232

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification codes:

080202 ; 1405 ;

Abstract:

In this letter, we developed a novel learning framework from physical human-robot interactions. Owing to human domain knowledge, such interactions can be useful for facilitation of learning. However, applying numerous interactions for training data might place a burden on human users, particularly in real-world applications. To address this problem, we propose formulating this as a model-based reinforcement learning problem to reduce errors during training and increase robustness. Our key idea is to develop 1) an advisory and adversarial interaction strategy and 2) a human-robot interaction model to predict each behavior. In the advisory and adversarial interactions, a human guides and disturbs the robot when it moves in the wrong and correct directions, respectively. Meanwhile, the robot tries to achieve its goal in conjunction with predicting the human's behaviors using the interaction model. To verify the proposed method, we conducted peg-in-hole experiments in a simulation and real-robot environment with human participants and a robot, which has an underactuated soft wrist module. The experimental results showed that our proposed method had smaller position errors during training and a higher number of successes than the baselines without any interactions and with random interactions.


Pages: 3878-3885

Page count: 8

