Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining

Cited: 0
Authors
Zou, Yicheng [1 ,2 ,3 ]
Zhu, Bolin [2 ,3 ]
Hu, Xingwu [2 ,3 ]
Gui, Tao [1 ]
Zhang, Qi [2 ,3 ]
Affiliations
[1] Fudan Univ, Inst Modern Languages & Linguist, Shanghai, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid increase in the volume of dialogue data from daily life, there is a growing demand for dialogue summarization. Unfortunately, training a large summarization model is generally infeasible due to the inadequacy of dialogue data with annotated summaries. Most existing works for low-resource dialogue summarization directly pretrain models in other domains, e.g., the news domain, but they generally neglect the huge difference between dialogues and conventional articles. To bridge the gap between out-of-domain pretraining and in-domain fine-tuning, in this work, we propose a multi-source pretraining paradigm to better leverage the external summary data. Specifically, we exploit large-scale in-domain non-summary data to separately pretrain the dialogue encoder and the summary decoder. The combined encoder-decoder model is then pretrained on the out-of-domain summary data using adversarial critics, aiming to facilitate domain-agnostic summarization. The experimental results on two public datasets show that with only limited training data, our approach achieves competitive performance and generalizes well in different dialogue scenarios.
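The adversarial-critic pretraining the abstract describes — a domain critic tries to distinguish in-domain dialogue encodings from out-of-domain (e.g., news) encodings, while the encoder is trained to fool it, pushing representations toward domain-agnosticism — can be sketched as a toy minimax objective. This is a minimal illustrative sketch, not the authors' code; the functions `encode`, `critic_logit`, and the weight `lambda_adv` are assumed names for exposition:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy stand-in for the dialogue encoder: one linear layer with tanh."""
    return np.tanh(x @ W)

def critic_logit(h, v):
    """Toy domain critic: a linear score; > 0 means 'out-of-domain'."""
    return h @ v

def critic_loss(h, v, domain_label):
    """Binary cross-entropy of the critic on one encoding."""
    p = 1.0 / (1.0 + np.exp(-critic_logit(h, v)))
    return -(domain_label * np.log(p + 1e-9)
             + (1 - domain_label) * np.log(1 - p + 1e-9))

W = rng.normal(size=(4, 3))        # encoder parameters
v = rng.normal(size=3)             # critic parameters
x_dialogue = rng.normal(size=4)    # stands in for an in-domain dialogue
x_news = rng.normal(size=4)        # stands in for out-of-domain summary data

# The critic minimizes this total loss; via gradient reversal the encoder
# instead maximizes it, so the two domains become hard to tell apart.
critic_total = (critic_loss(encode(x_dialogue, W), v, 0)
                + critic_loss(encode(x_news, W), v, 1))

# lambda_adv weights the reversed adversarial gradient against the
# summarization loss on the encoder side of the minimax game.
lambda_adv = 0.1
encoder_adv_term = -lambda_adv * critic_total
```

In practice this adversarial term would be added to the usual summarization cross-entropy during the out-of-domain pretraining stage, so that the summarizer learns from news-style summary pairs without overfitting to news-specific representations.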
Pages: 80-91
Page count: 12
Related Papers
50 records in total
  • [1] Multi-source inverse-curriculum-based training for low-resource dialogue generation
    Cui, Fuwei
    Di, Hui
    Huang, Hui
    Ren, Hongjie
    Ouchi, Kazushige
    Liu, Ze
    Xu, Jinan
    APPLIED INTELLIGENCE, 2023, 53 (11) : 13665 - 13676
  • [3] DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization
    Li, Yu
    Peng, Baolin
    He, Pengcheng
    Galley, Michel
    Yu, Zhou
    Gao, Jianfeng
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 1368 - 1386
  • [4] AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization
    Yu, Tiezheng
    Liu, Zihan
    Fung, Pascale
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 5892 - 5904
  • [5] A Unified Data Augmentation Framework for Low-Resource Multi-domain Dialogue Generation
    Liu, Yongkang
    Nie, Ercong
    Feng, Shi
    Hua, Zheng
    Ding, Zifeng
    Wang, Daling
    Zhang, Yifei
    Schuetze, Hinrich
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT II, ECML PKDD 2024, 2024, 14942 : 162 - 177
  • [6] Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages
    Choi, Gyu-Hyeon
    Shin, Jong-Hun
    Kim, Young-Kil
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 900 - 904
  • [7] Low-Resource Compositional Semantic Parsing with Concept Pretraining
    Rongali, Subendhu
    Sridhar, Mukund
    Khan, Haidar
    Arkoudas, Konstantine
    Hamza, Wael
    McCallum, Andrew
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1410 - 1419
  • [8] Multi-language transfer learning for low-resource legal case summarization
    Moro, Gianluca
    Piscaglia, Nicola
    Ragazzi, Luca
    Italiani, Paolo
    ARTIFICIAL INTELLIGENCE AND LAW, 2024, 32 (04) : 1111 - 1139
  • [9] Multi-source, multilingual information extraction and summarization
    Vicedo, Jose L.
    Tomas, David
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2013, 64 (07): : 1519 - 1521
  • [10] Exploring Multitask Learning for Low-Resource Abstractive Summarization
    Magooda, Ahmed
    Elaraby, Mohamed
    Litman, Diane
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1652 - 1661