FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation

Cited: 0
Authors
Lakhotia, Kushal [1 ]
Paranjape, Bhargavi [2 ]
Ghoshal, Asish [1 ]
Yih, Wen-tau [1 ]
Mehdad, Yashar [1 ]
Iyer, Srinivasan [1 ]
Affiliations
[1] Facebook AI, New York, NY 10003 USA
[2] Univ Washington, Seattle, WA 98195 USA
Source
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Natural language (NL) explanations of model predictions are gaining popularity as a means to understand and verify decisions made by large black-box pre-trained models, for tasks such as Question Answering (QA) and Fact Verification. Recently, pre-trained sequence-to-sequence (seq2seq) models have proven to be very effective at jointly making predictions and generating NL explanations. However, these models have several shortcomings: they can fabricate explanations even for incorrect predictions, they are difficult to adapt to long input documents, and their training requires a large amount of labeled data. In this paper, we develop FiD-Ex, which addresses these shortcomings for seq2seq models by: 1) introducing sentence markers to eliminate explanation fabrication by encouraging extractive generation, 2) using the fusion-in-decoder architecture to handle long input contexts, and 3) intermediate fine-tuning on re-structured open-domain QA datasets to improve few-shot performance. FiD-Ex significantly improves over prior work in terms of explanation metrics and task accuracy on five tasks from the ERASER explainability benchmark, in both fully supervised and few-shot settings.
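To make the sentence-marker idea from the abstract concrete, the following minimal Python sketch shows one way markers could be added to the input so that a seq2seq model points at sentences rather than free-generating explanation text, and how generated markers could be mapped back to an extractive rationale. The function names (add_sentence_markers, decode_rationale) and the exact marker format ("[SENT0]", "[SENT1]", ...) are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch of sentence markers for extractive rationale generation.
    # Marker format and helper names are assumptions for illustration only.
    from typing import List

    def add_sentence_markers(sentences: List[str]) -> str:
        """Prefix each input sentence with an index marker so the model can
        refer to sentences by marker instead of generating free-form text."""
        return " ".join(f"[SENT{i}] {s}" for i, s in enumerate(sentences))

    def decode_rationale(generated: str, sentences: List[str]) -> List[str]:
        """Map markers in the generated output back to original sentences,
        keeping the explanation strictly extractive (no fabricated text)."""
        return [s for i, s in enumerate(sentences) if f"[SENT{i}]" in generated]

    if __name__ == "__main__":
        passage = [
            "The Eiffel Tower is located in Paris.",
            "It was completed in 1889.",
            "Paris is the capital of France.",
        ]
        question = "When was the Eiffel Tower completed?"
        model_input = f"question: {question} context: {add_sentence_markers(passage)}"
        # Suppose a trained seq2seq model generated the answer plus markers:
        model_output = "1889 [SENT1]"
        print(decode_rationale(model_output, passage))
        # -> ["It was completed in 1889."]

Because the rationale is recovered only from marker tokens, any generated text that does not correspond to a marked input sentence is simply ignored, which is what prevents fabricated explanations in this scheme.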
Pages: 3712-3727
Page count: 16