ATTEXPLAINER: Explain Transformer via Attention by Reinforcement Learning

Cited by: 0
Authors
Niu, Runliang [1]
Wei, Zhepei [1]
Wang, Yan [1,2]
Wang, Qi [1]
Affiliations
[1] Jilin Univ, Sch Artificial Intelligence, Changchun, Peoples R China
[2] Jilin Univ, Minist Educ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Changchun, Peoples R China
Source
PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The Transformer and its variants, built on attention mechanisms, have recently achieved remarkable performance on many NLP tasks. Most existing work on Transformer explanation interprets the attention matrix qualitatively, relying on subjective human intuition. However, the high dimensionality of the attention matrix makes it difficult for such methods to analyze it quantitatively. Therefore, in this paper, we propose ATTEXPLAINER, a novel reinforcement learning (RL) based framework that explains the Transformer via its attention matrix. The RL agent learns to perform step-by-step masking operations by observing the changes in the attention matrices. We adapt our method to two scenarios: perturbation-based model explanation and text adversarial attack. Experiments on three widely used text classification benchmarks validate the effectiveness of the proposed method compared with state-of-the-art baselines. Additional studies show that our method is highly transferable and consistent with human intuition. The code is available at https://github.com/niuzaisheng/AttExplainer.
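The abstract describes an agent that repeatedly observes a model's attention matrices and masks one token per step. Below is a minimal sketch of such a step-by-step masking loop, not the authors' implementation (that is in the linked repository): the learned RL policy is replaced by a simple heuristic that masks the most-attended token, and the model name, heuristic, and number of steps are illustrative assumptions only.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: any attention-based classifier; substitute a fine-tuned checkpoint
# for meaningful predictions (the bare bert-base-uncased head is untrained).
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "The movie was surprisingly good."
inputs = tokenizer(text, return_tensors="pt")
input_ids = inputs["input_ids"].clone()

for step in range(3):  # the number of masking steps is arbitrary here
    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=inputs["attention_mask"])
    # out.attentions is a tuple of (batch, heads, seq, seq) tensors, one per layer.
    attn = torch.stack(out.attentions).squeeze(1)  # (layers, heads, seq, seq)
    # "Observation": attention each position receives, averaged over layers and
    # heads (a crude stand-in for the RL agent's state in the paper).
    received = attn.mean(dim=(0, 1)).sum(dim=0)  # (seq,)
    # Never mask special tokens or positions that are already masked.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True),
        dtype=torch.bool,
    )
    received[special] = float("-inf")
    received[input_ids[0] == tokenizer.mask_token_id] = float("-inf")
    # "Action": mask the most-attended remaining token (heuristic, not the learned policy).
    pos = int(received.argmax())
    input_ids[0, pos] = tokenizer.mask_token_id
    print(f"step {step}: masked position {pos}, logits = {out.logits.squeeze().tolist()}")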
Pages: 724-731
Page count: 8