Tetris Bot using Deep Reinforcement Learning

Citations: 0
Authors
Park K.-W. [1]
Kim J.-S. [1]
Affiliations
[1] Department of Electrical and Information Engineering, Seoul National University of Science and Technology
Funding
National Research Foundation of Singapore
Keywords
Deep Reinforcement Learning; Multi-agent learning; PPO (Proximal Policy Optimization); Tetris
DOI
10.5302/J.ICROS.2022.22.0140
Abstract
In this paper, we develop an artificial-intelligence Tetris robot that plays the Tetris game autonomously. The Tetris robot consists of a game agent, which learns how to play Tetris using reinforcement learning, and hardware that plays the actual game. To develop the game agent with deep reinforcement learning, a Markov decision process was defined and a policy-based deep reinforcement learning algorithm was applied. Specifically, the Tetris game agent was trained with the PPO (Proximal Policy Optimization) algorithm, employing a multi-agent learning method. During training, the PPO-based game agent took the game screen as input and applied its actions to the game through software, playing Tetris 500,000 times. For the robot to play the actual game, the neural network of the trained game agent was stored on a Jetson Xavier, and a motor and camera were used. In other words, the standalone Tetris robot, separate from the computer on which the Tetris game runs, consists of a Jetson Xavier, one camera, one Arduino MEGA, three servo motors, and three fingers. To evaluate the performance of the robot, the value function of the game agent is presented, and the performance of the actual robot was verified through demonstration. © ICROS 2022.
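The abstract states that the agent was trained with PPO; the paper's own network architecture and hyperparameters are not given here, but the core of PPO is its clipped surrogate objective, which can be sketched generically. The function below is a minimal NumPy illustration of that objective, not a reconstruction of the authors' implementation; the names and the clip coefficient of 0.2 are assumptions.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (value to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and behavior policies; advantages: estimated advantage values.
    clip_eps is the clip coefficient epsilon (0.2 is a common default).
    """
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), via log-prob difference.
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    # Clip the ratio to [1 - eps, 1 + eps] before weighting by the advantage.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: element-wise minimum, averaged over the batch.
    return np.minimum(unclipped, clipped).mean()

# Example: the ratio 0.4 / 0.2 = 2.0 exceeds 1 + eps, so with advantage +1.0
# the objective is clipped to 1.2 rather than 2.0.
loss = ppo_clip_loss(np.log(np.array([0.4])),
                     np.log(np.array([0.2])),
                     np.array([1.0]))
```

The clipping keeps each policy update close to the behavior policy, which is what makes PPO stable enough for long training runs such as the 500,000 games reported here.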
Pages: 1155-1160 (5 pages)