State-Regularized Recurrent Neural Networks to Extract Automata and Explain Predictions

Times Cited: 1
Authors
Wang, Cheng [1]
Lawrence, Carolin [2]
Niepert, Mathias [2,3]
Affiliations
[1] Amazon, D-10117 Berlin, Germany
[2] NEC Labs Europe, D-69115 Heidelberg, Germany
[3] Univ Stuttgart, D-70174 Stuttgart, Germany
Keywords
Stochastic processes; Logic gates; Learning automata; Behavioral sciences; Probabilistic logic; Symbols; Memory management; Automata extraction; explainability; interpretability; memorization; recurrent neural networks; state machine
DOI
10.1109/TPAMI.2022.3225334
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, they are often treated as black-box models, and as such it is difficult to understand what exactly they learn as well as how they arrive at a particular prediction. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) non-regular languages such as balanced parentheses and palindromes, where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition, and text categorization. We show that state-regularization (a) simplifies the extraction of finite state automata that display an RNN's state transition dynamics; (b) forces RNNs to operate more like automata with external memory and less like finite state machines, which potentially leads to a more structured memory; (c) leads to better interpretability and explainability of RNNs by leveraging the probabilistic finite state transition mechanism over time steps.
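The abstract describes a recurrent update followed by a probabilistic transition onto a finite set of learnable states. The following PyTorch sketch illustrates one way such a cell could be built; it is a minimal sketch under assumptions, not the authors' implementation. The class name StateRegularizedGRUCell, the hyperparameters num_states and temperature, and the soft (probability-weighted) assignment to the learnable states are illustrative choices made here.

import torch
import torch.nn as nn
import torch.nn.functional as F


class StateRegularizedGRUCell(nn.Module):
    """GRU cell followed by a soft transition over a finite set of learnable states.
    Illustrative sketch; names and hyperparameters are assumptions, not from the paper."""

    def __init__(self, input_size: int, hidden_size: int,
                 num_states: int = 10, temperature: float = 1.0):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # Learnable "centroid" states; the hidden state is pulled onto this finite set.
        self.centroids = nn.Parameter(torch.randn(num_states, hidden_size))
        self.temperature = temperature

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        u = self.cell(x, h)                                    # ordinary recurrent update
        scores = u @ self.centroids.t()                        # similarity to each learnable state
        alpha = F.softmax(scores / self.temperature, dim=-1)   # transition probabilities
        # Soft assignment: the next hidden state is a probability-weighted mixture of states.
        return alpha @ self.centroids

Reading off the most probable state (the argmax over alpha) at each time step yields a discrete state sequence; tabulating the observed transitions between these discrete states is one way to recover the kind of finite state automaton that, per the abstract, state-regularization makes easier to extract.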
Pages: 7739-7750
Page count: 12