Monte Carlo tree search control scheme for multibody dynamics applications

Cited by: 0
Authors
Yixuan Tang
Grzegorz Orzechowski
Aleš Prokop
Aki Mikkola
Affiliations
[1] LUT University, Department of Mechanical Engineering
[2] Brno University of Technology, Faculty of Mechanical Engineering
Source
Nonlinear Dynamics | 2024, Vol. 112
Keywords
Monte Carlo Tree Search; Multibody dynamics; Reward functions; Parametric analysis; Artificial intelligence control; Inverted pendulum;
Abstract
There is considerable interest in applying reinforcement learning (RL) to improve machine control across multiple industries, with the automotive industry being one of the prime examples. Monte Carlo Tree Search (MCTS) has emerged as a powerful method for decision-making in games, even without prior knowledge of the rules. In this study, multibody system dynamics (MSD) control is first modeled as a Markov decision process and solved with MCTS. Based on randomized exploration of the search space, the MCTS framework builds a selective search tree by repeatedly applying Monte Carlo rollouts from each child node. However, without a library of reference choices, selecting agent parameters from the many possibilities can be daunting. In addition, the large branching factor makes the search itself challenging; this is typically addressed through appropriate parameter design, search guidance, action reduction, parallelization, and early termination. To address these shortcomings, the overarching goal of this study is to provide the needed insight into inverted pendulum control using vanilla and modified MCTS agents. A series of reward functions is designed according to the control goal, each mapping a specific distribution shape of the reward bonus and guiding the MCTS-based controller to maintain the upright position. Numerical examples show that the reward-modified MCTS algorithms significantly improve control performance and robustness over the default choice of a constant reward that constitutes the vanilla MCTS. Exponentially decaying reward functions perform better than constant or polynomial reward functions. Moreover, the exploitation vs. exploration trade-off and the discount parameters are carefully tested. The study's results can guide researchers applying RL-based control to MSD.
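The ingredients described in the abstract can be pictured with a minimal sketch. The functional forms below (constant, polynomial, and exponentially decaying reward bonuses over the pendulum's angular deviation from upright), as well as the UCT-style selection score balancing exploitation and exploration, are illustrative assumptions for demonstration, not the exact expressions or parameter values used in the paper.

```python
import math

# Illustrative reward shapes over the angular deviation from upright (radians).
# The specific forms and constants here are assumptions, not the paper's exact choices.
def constant_reward(theta_err):
    """Vanilla MCTS default: the same bonus regardless of how upright the pole is."""
    return 1.0

def polynomial_reward(theta_err, max_err=math.pi, power=2):
    """Bonus decays polynomially as the deviation from upright grows."""
    return max(0.0, 1.0 - (abs(theta_err) / max_err) ** power)

def exponential_reward(theta_err, decay=5.0):
    """Bonus decays exponentially with the deviation, sharply favoring near-upright states."""
    return math.exp(-decay * abs(theta_err))

# UCT-style score used when selecting a child node during tree search:
# exploitation term plus an exploration bonus weighted by the trade-off constant c.
def uct_score(child_value_sum, child_visits, parent_visits, c=math.sqrt(2)):
    if child_visits == 0:
        return float("inf")  # unvisited actions are always tried first
    exploitation = child_value_sum / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration

if __name__ == "__main__":
    # Compare how each reward shape scores states of increasing deviation from upright.
    for theta in (0.0, 0.1, 0.5, 1.0):
        print(f"theta={theta:.1f}  constant={constant_reward(theta):.3f}  "
              f"polynomial={polynomial_reward(theta):.3f}  "
              f"exponential={exponential_reward(theta):.3f}")
```

In this sketch, the exponential shape concentrates almost all of the bonus near the upright position, which is one plausible reading of why such rewards can guide the rollouts more effectively than a constant or polynomial bonus.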
Pages: 8363–8391
Number of pages: 28