Swing-up control of a reaction wheel pendulum using the actor-critic algorithm

Cited by: 0
Authors
Kim D.-W. [1 ]
Park H.J. [1 ]
Affiliations
[1] Department of Mechanical Design and Robot Engineering, Seoul National University of Science and Technology
Source
Journal of Institute of Control, Robotics and Systems, Vol. 27, 2021 | Corresponding author: Park, Hee Jae (looki@seoultech.ac.kr)
Keywords
Actor critic; Artificial neural network; Intelligent robot; Machine learning; Nonlinear system; Reinforcement learning;
DOI
10.5302/J.ICROS.2021.21.0057
Abstract
In this study, we verified the performance of a swing-up control method for a reaction wheel pendulum based on the actor-critic algorithm in both simulation and experiment, and suggested that reinforcement learning with shallow neural networks can be applied to intelligent robots that act in real-world environments, such as a robot that teaches itself to walk through trial and error. The actor of the proposed actor-critic algorithm used a policy network to determine the rotational direction of the reaction wheel from the angular position and angular velocity of the pendulum and the angular velocity of the reaction wheel. The critic used a value network to estimate the expected reward from the same inputs as the actor. In both simulation and the real-world environment, the proposed algorithm learned through trial and error to swing up and stabilize the pendulum by choosing between clockwise and counter-clockwise rotation of the reaction wheel. © ICROS 2021.
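The abstract describes shallow policy and value networks over a three-dimensional state. Below is a minimal, hypothetical sketch of such an actor-critic setup, not the authors' implementation: PyTorch, the network sizes, learning rates, discount factor, and the use of a one-step TD error as the advantage are all illustrative assumptions.

```python
# Hypothetical sketch of a shallow actor-critic for the reaction wheel pendulum.
# Assumptions (not from the paper): 3-D state = (pendulum angle, pendulum angular
# velocity, wheel angular velocity), 2 discrete actions (clockwise / counter-clockwise),
# one-step TD error used as the advantage.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Actor: maps the state to logits over the two rotation directions."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))

    def forward(self, state):
        return self.net(state)

class ValueNet(nn.Module):
    """Critic: maps the state to an estimate of the expected return."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, state):
        return self.net(state).squeeze(-1)

actor, critic = PolicyNet(), ValueNet()
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)    # illustrative learning rates
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99                                                  # assumed discount factor

def select_action(state):
    """Sample a rotation direction (0 = CW, 1 = CCW) from the policy."""
    logits = actor(torch.as_tensor(state, dtype=torch.float32))
    return torch.distributions.Categorical(logits=logits).sample().item()

def update(state, action, reward, next_state, done):
    """One-step actor-critic update using the TD error as the advantage."""
    s = torch.as_tensor(state, dtype=torch.float32)
    s_next = torch.as_tensor(next_state, dtype=torch.float32)
    with torch.no_grad():
        target = reward + gamma * (0.0 if done else critic(s_next).item())
    td_error = target - critic(s)

    # Critic: move V(s) toward the bootstrapped one-step target.
    critic_loss = td_error.pow(2)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: reinforce the chosen direction in proportion to the TD error.
    dist = torch.distributions.Categorical(logits=actor(s))
    actor_loss = -dist.log_prob(torch.tensor(action)) * td_error.detach()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

In use, each control step would sample a direction with select_action(state), apply it to the reaction wheel, and then call update(...) once with the observed reward and next state; the paper's actual reward function and training schedule are not reproduced here.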
Pages: 745-753
Number of pages: 8