Optimizing Reinforcement Learning Control Model in Furuta Pendulum and Transferring it to Real-World

Cited by: 1
Authors
Hong, Myung Rae [1 ]
Kang, Sanghun [1 ]
Lee, Jingoo [2 ]
Seo, Sungchul [3 ]
Han, Seungyong [1 ]
Koh, Je-Sung [1 ]
Kang, Daeshik [1 ]
Affiliations
[1] Ajou Univ, Dept Mech Engn, Multiscale Bioinspired Technol Lab, Suwon 16499, South Korea
[2] Korea Inst Machinery & Mat, Dept Sustainable Environm Res, Multiscale Bioinspired Technol Lab, Daejeon 34103, South Korea
[3] Seokyeong Univ, Dept Nanochem Biol & Environm Engn, Seoul 02713, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Furuta pendulum; inverted pendulum problem; reward design; reinforcement learning; Sim2Real;
DOI
10.1109/ACCESS.2023.3310405
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Reinforcement learning does not require an explicit robot model because it learns directly from data, but it faces temporal and spatial constraints when transferred to real-world environments. In this research, we trained the balancing Furuta pendulum problem, which is difficult to model, in a virtual environment (Unity) and transferred the result to the real world. The challenge of the balancing Furuta pendulum problem is to keep the pendulum's end effector in a vertical position. We resolved the temporal and spatial constraints by performing reinforcement learning in the virtual environment. Furthermore, we designed a novel reward function that enabled faster and more stable learning than two existing reward functions. We validated each reward function by applying it to the soft actor-critic (SAC) and proximal policy optimization (PPO) algorithms. The experimental results show that the cosine reward function trains faster and more stably. Finally, the SAC model trained with the cosine reward function in the virtual environment serves as the optimized controller. Additionally, we evaluated the robustness of this model by transferring it to the real environment.
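The abstract does not give the exact form of the cosine reward, so the following is only a minimal sketch of how a cosine-shaped balancing reward might look, assuming the pendulum angle is measured from the upright position; the function name and signature are illustrative, not the authors' implementation.

```python
import numpy as np

def cosine_reward(pendulum_angle_rad: float) -> float:
    """Hypothetical cosine-shaped reward for the balancing task.

    The angle is measured from the upright position, so the reward
    peaks at 1 when the end effector is vertical and decreases
    smoothly as the pendulum tilts away.
    """
    return float(np.cos(pendulum_angle_rad))

# Example: reward near the upright position vs. a 45-degree tilt.
print(cosine_reward(0.0))        # 1.0 (perfectly balanced)
print(cosine_reward(np.pi / 4))  # ~0.707 (tilted 45 degrees)
```

A smooth, bounded reward of this kind gives the agent a non-zero learning signal at every tilt angle, which is one plausible reason such a shaping could train faster and more stably than sparse or threshold-based alternatives.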
Pages: 95195 - 95200
Page count: 6
Related Papers
50 records total
  • [31] Reinforcement Learning Compensation based PD Control for a Double Inverted Pendulum
    Puriel-Gil, G.
    Yu, W.
    Sossa, H.
    IEEE LATIN AMERICA TRANSACTIONS, 2019, 17 (02) : 323 - 329
  • [32] Approximate neural optimal control with reinforcement learning for a torsional pendulum device
    Wang, Ding
    Qiao, Junfei
    NEURAL NETWORKS, 2019, 117 : 1 - 7
  • [33] Design of Reinforcement Learning Algorithm for Single Inverted Pendulum Swing Control
    Yue Chao
    Liu Yongxin
    Wang Linglin
    2018 CHINESE AUTOMATION CONGRESS (CAC), 2018, : 1558 - 1562
  • [34] Active exploration planning in reinforcement learning for inverted pendulum system control
    Zheng, Yu
    Luo, Si-Wei
    Lv, Zi-Ang
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 2805 - +
  • [35] VesNet-RL: Simulation-Based Reinforcement Learning for Real-World US Probe Navigation
    Bi, Yuan
    Jiang, Zhongliang
    Gao, Yuan
    Wendler, Thomas
    Karlas, Angelos
    Navab, Nassir
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (03) : 6638 - 6645
  • [36] A World Model for Actor–Critic in Reinforcement Learning
    A. I. Panov
    L. A. Ugadiarov
    Pattern Recognition and Image Analysis, 2023, 33 : 467 - 477
  • [37] Quantum Reinforcement Learning with Quantum World Model
    Zeng, Peigen
    He, Ying
    Yu, F. Richard
    Leung, Victor C. M.
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 3185 - 3190
  • [38] Self-optimizing adaptive optics control with Reinforcement Learning
    Landman, R.
    Haffert, S. Y.
    Radhakrishnan, V. M.
    Keller, C. U.
    ADAPTIVE OPTICS SYSTEMS VII, 2020, 11448
  • [39] Robust Control of An Inverted Pendulum System Based on Policy Iteration in Reinforcement Learning
    Ma, Yan
    Xu, Dengguo
    Huang, Jiashun
    Li, Yahui
    APPLIED SCIENCES-BASEL, 2023, 13 (24):
  • [40] Integration of Adaptive Control and Reinforcement Learning for Real-Time Control and Learning
    Annaswamy, Anuradha M.
    Guha, Anubhav
    Cui, Yingnan
    Tang, Sunbochen
    Fisher, Peter A.
    Gaudio, Joseph E.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (12) : 7740 - 7755