Optimizing Reinforcement Learning Control Model in Furuta Pendulum and Transferring it to Real-World

Cited by: 3
Authors
Hong, Myung Rae [1 ]
Kang, Sanghun [1 ]
Lee, Jingoo [2 ]
Seo, Sungchul [3 ]
Han, Seungyong [1 ]
Koh, Je-Sung [1 ]
Kang, Daeshik [1 ]
Affiliations
[1] Ajou Univ, Dept Mech Engn, Multiscale Bioinspired Technol Lab, Suwon 16499, South Korea
[2] Korea Inst Machinery & Mat, Dept Sustainable Environm Res, Multiscale Bioinspired Technol Lab, Daejeon 34103, South Korea
[3] Seokyeong Univ, Dept Nanochem Biol & Environm Engn, Seoul 02713, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Furuta pendulum; inverted pendulum problem; reward design; reinforcement learning; Sim2Real;
DOI
10.1109/ACCESS.2023.3310405
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Reinforcement learning does not require an explicit robot model because it learns directly from data, but it faces temporal and spatial constraints when transferred to real-world environments. In this research, we trained the balancing Furuta pendulum problem, which is difficult to model, in a virtual environment (Unity) and transferred the result to the real world. The challenge of the balancing Furuta pendulum problem is to keep the pendulum's end effector in a vertical position. We resolved the temporal and spatial constraints by performing reinforcement learning in the virtual environment. Furthermore, we designed a novel reward function that enabled faster and more stable learning than the two existing reward functions. We validated each reward function by applying it to the soft actor-critic (SAC) and proximal policy optimization (PPO) algorithms. The experimental results show that the cosine reward function trains faster and more stably. Finally, the SAC model trained with the cosine reward function in the virtual environment serves as the optimized controller. We also evaluated the robustness of this model by transferring it to the real environment.
Pages: 95195-95200
Number of pages: 6
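
The abstract credits a cosine-shaped reward with faster and more stable training than the two baseline reward functions, but the record does not reproduce the exact formula. Below is a minimal sketch of what such a reward could look like, assuming the observation exposes the pendulum's angle from the upright vertical; the function name cosine_reward, the argument alpha, and the absence of penalty terms (e.g., on angular velocity or arm motion) are illustrative assumptions, not the paper's published design.

```python
import numpy as np

def cosine_reward(alpha: float) -> float:
    """Illustrative cosine reward for balancing a Furuta pendulum.

    alpha: pendulum angle measured from the upright vertical, in
    radians (alpha = 0 means the end effector is perfectly balanced).

    Returns +1 when upright, decaying smoothly to -1 when the
    pendulum hangs straight down (alpha = +/- pi). Unlike a sparse
    or piecewise reward, this gives the agent a dense, smooth
    learning signal over the whole state space.
    """
    return float(np.cos(alpha))

# Usage: the reward a few degrees off vertical stays near the maximum,
# so small corrective wobbles are not punished harshly.
print(cosine_reward(0.0))              # 1.0 (perfectly upright)
print(cosine_reward(np.deg2rad(10)))   # ~0.985
print(cosine_reward(np.pi))            # -1.0 (hanging straight down)
```

A smooth, bounded reward like this pairs naturally with SAC and PPO, the two algorithms the abstract reports were used to validate each candidate reward function in the Unity environment before Sim2Real transfer.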