Off-policy and on-policy reinforcement learning with the Tsetlin machine

Cited by: 2
Authors
Gorji, Saeed Rahimi [1 ]
Granmo, Ole-Christoffer [1 ]
Affiliations
[1] Univ Agder, Ctr Artificial Intelligence Res, Grimstad, Norway
Keywords
Tsetlin machine; Explainable machine learning; Learning automata; Reinforcement learning; Temporal difference learning; SARSA
DOI
10.1007/s10489-022-04297-3
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The Tsetlin Machine is a recent supervised learning algorithm that has achieved competitive accuracy and resource-usage results across several benchmarks. It has been used for convolution, classification, and regression, producing interpretable rules in propositional logic. In this paper, we introduce the first framework for reinforcement learning based on the Tsetlin Machine. Our framework integrates the value iteration algorithm with the regression Tsetlin Machine as the value-function approximator. To obtain accurate off-policy state-value estimation, we propose a modified Tsetlin Machine feedback mechanism that adapts to the dynamic nature of value iteration. In particular, we show that the Tsetlin Machine is able to unlearn and recover from the misleading experiences that often occur at the beginning of training. A key challenge we address is mapping the intrinsically continuous nature of state-value learning onto the propositional Tsetlin Machine architecture, leveraging probabilistic updates. While accurate off-policy, this mechanism learns significantly more slowly than neural networks do on-policy. However, by introducing multi-step temporal-difference learning in combination with high-frequency propositional logic patterns, we are able to close the performance gap. Several gridworld instances document that our framework can outperform comparable neural network models, despite being based on simple one-level AND-rules in propositional logic. Finally, we propose how the class of models learnt by our Tsetlin Machine for the gridworld problem can be translated into a more understandable graph structure. The graph structure captures the state-value function approximation and the corresponding policy found by the Tsetlin Machine.
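The multi-step temporal-difference value learning the abstract refers to can be sketched in plain Python. Everything concrete below is an illustrative assumption rather than the paper's method: a hypothetical 1-D gridworld with a random-walk policy, and a tabular value function standing in for the regression Tsetlin Machine approximator that the paper actually uses.

```python
import random

random.seed(0)

# Hypothetical 1-D gridworld: states 0..4, terminal goal at state 4 with reward 1.0.
# (Illustrative stand-in; the paper uses richer gridworld instances.)
N_STATES, TERMINAL, GAMMA, ALPHA, N_STEP = 5, 4, 0.9, 0.1, 3

def step(state):
    """Random-walk behaviour policy: move left or right, clipped at the edges."""
    nxt = max(0, min(TERMINAL, state + random.choice([-1, 1])))
    return nxt, (1.0 if nxt == TERMINAL else 0.0), nxt == TERMINAL

# Tabular value function: a lookup table in place of the regression Tsetlin Machine.
V = [0.0] * N_STATES

for episode in range(2000):
    state, states, rewards = 0, [0], []
    done = False
    while not done:
        nxt, r, done = step(state)
        states.append(nxt)
        rewards.append(r)
        state = nxt
    # n-step TD backup along the finished episode: each state's target is the
    # discounted sum of the next N_STEP rewards, bootstrapped from the current
    # value estimate at the cut-off point.
    for t in range(len(rewards)):
        end = min(t + N_STEP, len(rewards))
        G = sum(GAMMA ** (k - t) * rewards[k] for k in range(t, end))
        if end < len(rewards):  # bootstrap from the estimate at the cut-off
            G += GAMMA ** (end - t) * V[states[end]]
        V[states[t]] += ALPHA * (G - V[states[t]])

print([round(v, 2) for v in V])
```

The learnt values rise monotonically toward the goal, which is the shape any value-function approximator (tabular, neural, or Tsetlin-Machine-based) should recover on this walk; multi-step targets propagate the goal reward back through the state space faster than one-step updates, which is the speed-up the abstract credits for closing the on-policy performance gap.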
Pages: 8596 - 8613
Number of pages: 18