Opportunities and Challenges in Applying Reinforcement Learning to Robotic Manipulation: an Industrial Case Study

Cited by: 2

Authors:

Toner, Tyler [1,2]

Saez, Miguel [2]

Tilbury, Dawn M. [1]

Barton, Kira [1]

Affiliations:

[1] Univ Michigan, 2505 Hayward St, Ann Arbor, MI 48109 USA

[2] Gen Motors, GM Tech Ctr Rd, Warren, MI 48092 USA

Source:

MANUFACTURING LETTERS | 2023 / Volume 35

Funding:

US National Science Foundation;

Keywords:

Reinforcement learning; Industrial robotics; Smart manufacturing; Wire harness installation; SERVO CONTROL;

DOI:

10.1016/j.mfglet.2023.08.055

Chinese Library Classification (CLC):

T [Industrial Technology];

Subject classification code:

08 ;

Abstract:

As manufacturing moves towards a more agile paradigm, industrial robots are expected to perform more complex tasks in less structured environments, complicating the use of traditional automation techniques and leading to growing interest in data-driven methods such as reinforcement learning (RL). In this work, we explore the process of applying RL to automate a challenging industrial manipulation task. We focus on wire harness installation as a motivating example, which presents challenges for traditional automation due to the nonlinear dynamics of the deformable harness. A physical system was developed involving a three-terminal harness manipulated by a 6-DOF UR5 robot, with control enabled through a ROS interface. Modifications were made to the harness to enable simplified grasping and marker-based visual tracking. We detail the development of an RL formulation of the problem, subject to practical constraints on control and sensing motivated by the physical system. We develop a simulator and a basic scripted policy with which to safely generate a dataset of high-quality behaviors, then apply a state-of-the-art model-free offline RL algorithm, TD3+BC, to learn a policy to serve as a safe starting point on the physical system. Despite extensive tuning, we find that the algorithm fails to achieve acceptable performance. We propose three failure modes to explain the learning performance, related to control frequency, task symmetry arising from problem simplifications, and unexpected policy complexity, and discuss opportunities for future applications. © 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0)
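For readers unfamiliar with TD3+BC, the sketch below illustrates the general form of its actor objective: a Q-value maximization term, normalized by the average Q magnitude, plus a behavior-cloning penalty that keeps the policy close to the dataset actions. This is a minimal NumPy illustration and not the authors' implementation. The function name `td3_bc_actor_loss`, the array-based interface, the use of Q-values at the policy's actions for the normalization, and the default `alpha=2.5` are assumptions made for illustration only.

```python
import numpy as np


def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Illustrative TD3+BC-style actor loss (hypothetical helper).

    q_values:        critic estimates Q(s, pi(s)) for a batch, shape (N,)
    policy_actions:  actions proposed by the policy, shape (N, action_dim)
    dataset_actions: actions stored in the offline dataset, same shape
    alpha:           weighting hyperparameter (assumed default)
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_actions = np.asarray(policy_actions, dtype=float)
    dataset_actions = np.asarray(dataset_actions, dtype=float)

    # Normalize the RL term by the mean absolute Q-value so that alpha
    # has a comparable effect regardless of the reward scale.
    lam = alpha / (np.mean(np.abs(q_values)) + 1e-8)

    # Behavior-cloning term: mean squared distance to dataset actions.
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)

    # The actor minimizes this: large Q and small deviation are both rewarded.
    return -lam * np.mean(q_values) + bc_term


if __name__ == "__main__":
    loss = td3_bc_actor_loss(
        q_values=[1.0, 1.0],
        policy_actions=[[0.0], [0.0]],
        dataset_actions=[[1.0], [1.0]],
    )
    print(loss)
```

In a full training loop this quantity would be computed with an automatic-differentiation framework so that gradients flow through the policy network; the NumPy version only shows how the two terms are balanced.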


Pages: 1019-1030

Page count: 12
