Optimizing Reinforcement Learning Control Model in Furuta Pendulum and Transferring it to Real-World

Cited by: 3
Authors
Hong, Myung Rae [1 ]
Kang, Sanghun [1 ]
Lee, Jingoo [2 ]
Seo, Sungchul [3 ]
Han, Seungyong [1 ]
Koh, Je-Sung [1 ]
Kang, Daeshik [1 ]
Affiliations
[1] Ajou Univ, Dept Mech Engn, Multiscale Bioinspired Technol Lab, Suwon 16499, South Korea
[2] Korea Inst Machinery & Mat, Dept Sustainable Environm Res, Multiscale Bioinspired Technol Lab, Daejeon 34103, South Korea
[3] Seokyeong Univ, Dept Nanochem Biol & Environm Engn, Seoul 02713, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Furuta pendulum; inverted pendulum problem; reward design; reinforcement learning; Sim2Real;
DOI
10.1109/ACCESS.2023.3310405
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Reinforcement learning does not require an explicit robot model because it learns directly from data, but it faces temporal and spatial constraints when transferred to real-world environments. In this research, we trained the balancing Furuta pendulum problem, which is difficult to model, in a virtual environment (Unity) and transferred the result to the real world. The challenge of the balancing Furuta pendulum problem is to keep the pendulum's end effector in a vertical position. We resolved the temporal and spatial constraints by performing reinforcement learning in the virtual environment. Furthermore, we designed a novel reward function that enabled faster and more stable learning than the two existing reward functions. We validated each reward function by applying it to the soft actor-critic (SAC) and proximal policy optimization (PPO) algorithms. The experimental results show that the cosine reward function trains faster and more stably. Finally, the SAC model trained with the cosine reward function in the virtual environment serves as the optimized controller. We also evaluated the robustness of this model by transferring it to the real environment.
Pages: 95195-95200
Number of pages: 6
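
The abstract credits a cosine-shaped reward with faster and more stable training than the two baseline reward functions, but the record does not reproduce the exact formula. Below is a minimal sketch of what such a reward could look like, assuming the observation exposes the pendulum's angle from the upright vertical; the function name cosine_reward, the argument alpha, and the absence of penalty terms (e.g., on angular velocity or arm motion) are illustrative assumptions, not the paper's published design.

```python
import numpy as np

def cosine_reward(alpha: float) -> float:
    """Illustrative cosine reward for balancing a Furuta pendulum.

    alpha: pendulum angle measured from the upright vertical, in
    radians (alpha = 0 means the end effector is perfectly balanced).

    Returns +1 when upright, decaying smoothly to -1 when the
    pendulum hangs straight down (alpha = +/- pi). Unlike a sparse
    or piecewise reward, this gives the agent a dense, smooth
    learning signal over the whole state space.
    """
    return float(np.cos(alpha))

# Usage: the reward a few degrees off vertical stays near the maximum,
# so small corrective wobbles are not punished harshly.
print(cosine_reward(0.0))              # 1.0 (perfectly upright)
print(cosine_reward(np.deg2rad(10)))   # ~0.985
print(cosine_reward(np.pi))            # -1.0 (hanging straight down)
```

A smooth, bounded reward like this pairs naturally with SAC and PPO, the two algorithms the abstract reports were used to validate each candidate reward function in the Unity environment before Sim2Real transfer.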