CoMSum and SIBERT: A Dataset and Neural Model for Query-Based Multi-document Summarization

Times cited: 6
Authors
Kulkarni, Sayali [1]
Chammas, Sheide [1]
Zhu, Wan [1]
Sha, Fei [1]
Ie, Eugene [1]
Affiliation
[1] Google Res, Mountain View, CA 94043 USA
Source
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822
Keywords
Extractive summarization; Abstractive summarization; Neural models; Transformers; Summarization dataset
DOI
10.1007/978-3-030-86331-9_6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Document summarization compresses source document(s) into succinct, information-preserving text. A variant of this task is query-based multi-document summarization (qMDS), which tailors summaries to specific informational needs expressed by a query. However, progress in this area is hindered by the limited availability of large-scale datasets. In this work, we make two contributions. First, we propose an approach for automatically generating a dataset with both extractive and abstractive summaries, and we publicly release one version of it, CoMSum. Second, we design SIBERT, a neural model for extractive summarization that exploits the hierarchical nature of the input and infuses the query in order to extract query-specific summaries. We evaluate this model on the CoMSum dataset and show significant improvements in performance. This should provide a baseline and enable the use of CoMSum in future research on qMDS.
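To make the qMDS setting concrete, the sketch below shows a very simple query-based extractive baseline over multiple documents. It is NOT SIBERT: it swaps in plain bag-of-words cosine similarity in place of a neural encoder, and the function names (`query_extractive_summary`, `split_sentences`, etc.) are hypothetical. It only mirrors the two ideas named in the abstract, a hierarchical input (documents made of sentences) and scoring sentences against a query.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; deliberately simple, for illustration only.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(doc: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_extractive_summary(query: str, docs: list[str], k: int = 2) -> list[tuple[int, int, str]]:
    """Hypothetical baseline: return the k sentences most similar to the query
    as (doc_index, sentence_index, sentence)."""
    q = Counter(tokenize(query))
    candidates = []
    # Walk the document -> sentence hierarchy so each extracted sentence
    # can be traced back to its source document.
    for d_idx, doc in enumerate(docs):
        for s_idx, sent in enumerate(split_sentences(doc)):
            score = cosine(q, Counter(tokenize(sent)))
            candidates.append((score, d_idx, s_idx, sent))
    # Highest score first; ties broken by original position.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    top = candidates[:k]
    # Present the selected sentences in source order.
    top.sort(key=lambda c: (c[1], c[2]))
    return [(d, s, sent) for _, d, s, sent in top]
```

For example, with the two toy documents `"The river flooded the valley in spring. Farmers lost crops."` and `"Flood defenses were rebuilt after the river flooded. Tourism recovered slowly."` and the query `"river flood defenses"`, the two selected sentences are the first sentence of each document. A learned model such as SIBERT would replace the bag-of-words scorer with contextual representations; this sketch only fixes the input/output shape of the task.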
Pages: 84-98
Page count: 15
Related papers
36 in total
  • [31] Multi-granularity adaptive extractive document summarization with heterogeneous graph neural networks
    Su, W.
    Jiang, J.
    Huang, K.
    PeerJ Computer Science, 2023, 9
  • [32] HITS-based attentional neural model for abstractive summarization
    Cai, Xiaoyan
    Shi, Kaile
    Jiang, Yuehan
    Yang, Libin
    Liu, Sen
    KNOWLEDGE-BASED SYSTEMS, 2021, 222
  • [33] Integrating Topic-Aware Heterogeneous Graph Neural Network With Transformer Model for Medical Scientific Document Abstractive Summarization
    Khaliq, Ayesha
    Khan, Atif
    Awan, Salman Afsar
    Jan, Salman
    Umair, Muhammad
    Zuhairi, Megat F.
    IEEE ACCESS, 2024, 12 : 113855 - 113866
  • [34] CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19
    Karotia, Akanksha
    Susan, Seba
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (14) : 16328 - 16350
  • [36] DeepCKID: A Multi-Head Attention-Based Deep Neural Network Model Leveraging Classwise Knowledge to Handle Imbalanced Textual Data
    Sah, Amit Kumar
    Abulaish, Muhammad
    MACHINE LEARNING WITH APPLICATIONS, 2024, 17