State-Regularized Recurrent Neural Networks to Extract Automata and Explain Predictions

Times Cited: 1
Authors
Wang, Cheng [1]
Lawrence, Carolin [2]
Niepert, Mathias [2,3]
Affiliations
[1] Amazon, D-10117 Berlin, Germany
[2] NEC Labs Europe, D-69115 Heidelberg, Germany
[3] Univ Stuttgart, D-70174 Stuttgart, Germany
Keywords
Stochastic processes; Logic gates; Learning automata; Behavioral sciences; Probabilistic logic; Symbols; Memory management; Automata extraction; explainability; interpretability; memorization; recurrent neural networks; state machine
DOI
10.1109/TPAMI.2022.3225334
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, they are often treated as black-box models, and as such it is difficult to understand what exactly they learn as well as how they arrive at a particular prediction. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) non-regular languages such as balanced parentheses and palindromes, where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition, and text categorization. We show that state-regularization (a) simplifies the extraction of finite state automata that display an RNN's state transition dynamics; (b) forces RNNs to operate more like automata with external memory and less like finite state machines, which potentially leads to a more structured memory; (c) leads to better interpretability and explainability of RNNs by leveraging the probabilistic finite state transition mechanism over time steps.
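The abstract describes a recurrent update followed by a probabilistic transition onto a finite set of learnable states. The following PyTorch sketch illustrates one way such a cell could be built; it is a minimal sketch under assumptions, not the authors' implementation. The class name StateRegularizedGRUCell, the hyperparameters num_states and temperature, and the soft (probability-weighted) assignment to the learnable states are illustrative choices made here.

import torch
import torch.nn as nn
import torch.nn.functional as F


class StateRegularizedGRUCell(nn.Module):
    """GRU cell followed by a soft transition over a finite set of learnable states.
    Illustrative sketch; names and hyperparameters are assumptions, not from the paper."""

    def __init__(self, input_size: int, hidden_size: int,
                 num_states: int = 10, temperature: float = 1.0):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # Learnable "centroid" states; the hidden state is pulled onto this finite set.
        self.centroids = nn.Parameter(torch.randn(num_states, hidden_size))
        self.temperature = temperature

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        u = self.cell(x, h)                                    # ordinary recurrent update
        scores = u @ self.centroids.t()                        # similarity to each learnable state
        alpha = F.softmax(scores / self.temperature, dim=-1)   # transition probabilities
        # Soft assignment: the next hidden state is a probability-weighted mixture of states.
        return alpha @ self.centroids

Reading off the most probable state (the argmax over alpha) at each time step yields a discrete state sequence; tabulating the observed transitions between these discrete states is one way to recover the kind of finite state automaton that, per the abstract, state-regularization makes easier to extract.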
Pages: 7739-7750
Page count: 12