Symbolic Regression Methods for Reinforcement Learning

Cited by: 6
Authors
Kubalik, Jiri [1 ]
Derner, Erik [1 ,2 ]
Zegklitz, Jan [2 ]
Babuska, Robert [1 ,3 ]
Affiliations
[1] Czech Tech Univ, Czech Inst Informat Robot & Cybernet, Prague 16000, Czech Republic
[2] Czech Tech Univ, Fac Elect Engn, Prague 16627, Czech Republic
[3] Delft Univ Technol, Cognit Robot, NL-2628 Delft, Netherlands
Keywords
Mathematical models; Reinforcement learning; Genetic programming; Numerical models; Approximation algorithms; Tuning; Training; value iteration; policy iteration; symbolic regression; genetic programming; nonlinear optimal control; POLICIES;
DOI
10.1109/ACCESS.2021.3119000
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Reinforcement learning algorithms can solve dynamic decision-making and optimal control problems. With continuous-valued state and input variables, reinforcement learning algorithms must rely on function approximators to represent the value function and policy mappings. Commonly used numerical approximators, such as neural networks or basis function expansions, have two main drawbacks: they are black-box models offering little insight into the mappings learned, and they require extensive trial-and-error tuning of their hyper-parameters. In this paper, we propose a new approach to constructing smooth value functions in the form of analytic expressions by using symbolic regression. We introduce three off-line methods for finding value functions based on a state-transition model: symbolic value iteration, symbolic policy iteration, and a direct solution of the Bellman equation. The methods are illustrated on four nonlinear control problems: velocity control under friction, one-link and two-link pendulum swing-up, and magnetic manipulation. The results show that the value functions yield well-performing policies and are compact, mathematically tractable, and easy to plug into other algorithms. This makes them potentially suitable for further analysis of the closed-loop system. A comparison with an alternative approach using neural networks shows that our method outperforms the neural-network-based one.
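To make the abstract's central idea concrete, the following is a minimal sketch of symbolic value iteration under assumptions not taken from the paper: an invented one-dimensional transition model f and reward r stand in for the benchmark problems, and gplearn's SymbolicRegressor stands in for the genetic-programming variants the authors use. Each sweep computes Bellman backup targets from the current value function and refits an analytic expression to them.

    import numpy as np
    from gplearn.genetic import SymbolicRegressor  # generic GP-based symbolic regressor

    # Illustrative 1-D dynamics and reward (assumed, not the paper's benchmarks)
    def f(x, u):
        # state-transition model x' = f(x, u): velocity with friction and input
        return x + 0.05 * (u - 0.5 * x)

    def r(x, u):
        # reward: penalize distance from target velocity and control effort
        return -(x - 1.0) ** 2 - 0.01 * u ** 2

    gamma = 0.95
    actions = np.linspace(-2.0, 2.0, 21)             # discretized input set
    states = np.random.uniform(-3.0, 3.0, (500, 1))  # sampled training states

    V = lambda X: np.zeros(len(X))                   # V_0 = 0
    for k in range(10):
        # Bellman backup: y_i = max_u [ r(x_i, u) + gamma * V_k(f(x_i, u)) ]
        targets = np.max(
            [r(states[:, 0], u) + gamma * V(f(states, u)) for u in actions], axis=0
        )
        # symbolic regression step: fit an analytic expression V_{k+1} to the targets
        model = SymbolicRegressor(population_size=500, generations=20, random_state=k)
        model.fit(states, targets)
        V = model.predict

    print(model._program)  # the learned value function as a compact expression

The greedy policy is then recovered pointwise as u*(x) = argmax_u [ r(x, u) + gamma * V(f(x, u)) ]; because V is a closed-form expression rather than a black-box model, it can be inspected, differentiated, or plugged into other algorithms directly.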
Pages: 139697-139711
Page count: 15
Related Papers
50 records in total
  • [1] Supplementing neural reinforcement learning with symbolic methods
    Sun, R
    HYBRID NEURAL SYSTEMS, 2000, 1778 : 333 - 347
  • [2] Reinforcement Symbolic Learning
    Mercier, Chloe
    Alexandre, Frederic
    Vieville, Thierry
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 608 - 612
  • [3] Control Synthesis as Machine Learning Control by Symbolic Regression Methods
    Shmalko, Elizaveta
    Diveev, Askhat
    APPLIED SCIENCES-BASEL, 2021, 11 (12):
  • [4] RL-GEP: Symbolic Regression via Gene Expression Programming and Reinforcement Learning
    Zhang, Hengzhe
    Zhou, Aimin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [5] Interactive symbolic regression with co-design mechanism through offline reinforcement learning
    Tian, Yuan
    Zhou, Wenqi
    Viscione, Michele
    Dong, Hao
    Kammer, David S.
    Fink, Olga
    NATURE COMMUNICATIONS, 16 (1)
  • [6] Learning Intrinsic Symbolic Rewards in Reinforcement Learning
    Sheikh, Hassam Ullah
    Khadka, Shauharda
    Miret, Santiago
    Majumdar, Somdeb
    Phielipp, Mariano
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [7] Combining reinforcement learning with symbolic planning
    Grounds, Matthew
    Kudenko, Daniel
    ADAPTIVE AGENTS AND MULTI-AGENT SYSTEMS, 2008, 4865 : 75 - 86
  • [8] Reinforcement Learning Guided Symbolic Execution
    Wu, Jie
    Zhang, Chengyu
    Pu, Geguang
    PROCEEDINGS OF THE 2020 IEEE 27TH INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION, AND REENGINEERING (SANER '20), 2020, : 662 - 663
  • [9] Reinforcement learning for symbolic expression induction
    Vogiatzis, D
    Stafylopatis, A
    MATHEMATICS AND COMPUTERS IN SIMULATION, 2000, 51 (3-4) : 169 - 179
  • [10] Integrating symbolic knowledge in reinforcement learning
    Hailu, G
    Sommer, G
    1998 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5, 1998, : 1491 - 1496