CoMSum and SIBERT: A Dataset and Neural Model for Query-Based Multi-document Summarization

Cited by: 6
Authors
Kulkarni, Sayali [1]
Chammas, Sheide [1]
Zhu, Wan [1]
Sha, Fei [1]
Ie, Eugene [1]
Affiliation
[1] Google Res, Mountain View, CA 94043 USA
Source
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822
Keywords
Extractive summarization; Abstractive summarization; Neural models; Transformers; Summarization dataset
DOI
10.1007/978-3-030-86331-9_6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document summarization compresses source document(s) into succinct, information-preserving text. A variant of this is query-based multi-document summarization (qMDS), which targets summaries to specific informational needs, contextualized to the query. However, progress in this area is hindered by the limited availability of large-scale datasets. In this work, we make two contributions. First, we propose an approach for automatically generating a dataset for both extractive and abstractive summaries, and we publicly release a version of it. Second, we design SIBERT, a neural model for extractive summarization that exploits the hierarchical nature of the input. It also infuses queries to extract query-specific summaries. We evaluate this model on the CoMSum dataset, showing significant improvement in performance. This should provide a baseline and enable the use of CoMSum in future research on qMDS.
Pages: 84-98
Page count: 15