CoMSum and SIBERT: A Dataset and Neural Model for Query-Based Multi-document Summarization

Cited by: 6
Authors
Kulkarni, Sayali [1]
Chammas, Sheide [1]
Zhu, Wan [1]
Sha, Fei [1]
Ie, Eugene [1]
Affiliation
[1] Google Research, Mountain View, CA 94043 USA
Source
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II | 2021 / Vol. 12822
Keywords
Extractive summarization; Abstractive summarization; Neural models; Transformers; Summarization dataset
DOI
10.1007/978-3-030-86331-9_6
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Document summarization compresses source document(s) into succinct, information-preserving text. A variant of this task is query-based multi-document summarization (qMDS), which targets summaries to specific informational needs contextualized to a query. However, progress in this area is hindered by the limited availability of large-scale datasets. In this work, we make two contributions. First, we propose an approach for automatically generating a dataset with both extractive and abstractive summaries, and we publicly release a version of it, CoMSum. Second, we design SIBERT, a neural model for extractive summarization that exploits the hierarchical nature of the input and incorporates the query to extract query-specific summaries. We evaluate this model on the CoMSum dataset and show significant performance improvements. These results should provide a baseline and enable the use of CoMSum in future research on qMDS.
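To make the qMDS task concrete, here is a minimal sketch of query-based extractive selection over several documents. It is a hypothetical lexical-overlap baseline, not SIBERT and not the CoMSum generation procedure: the function names, the naive sentence splitter, and the small stopword list are illustrative assumptions. It walks the document, sentence, token hierarchy that the abstract alludes to, but scores sentences by plain query-term counts instead of a neural model.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use larger lists or none at all).
STOPWORDS = {"a", "an", "and", "did", "does", "how", "in", "is", "of", "the", "to", "was", "what", "why"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def split_sentences(doc: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def query_extractive_summary(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Pick the k sentences, across all documents, that share the most query terms.

    Hypothetical lexical baseline for illustrating qMDS; this is NOT the SIBERT model.
    """
    query_terms = set(tokenize(query))
    candidates = []  # tuples of (score, doc_index, sentence_index, sentence)
    for d_idx, doc in enumerate(documents):                  # level 1: documents
        for s_idx, sent in enumerate(split_sentences(doc)):  # level 2: sentences
            counts = Counter(tokenize(sent))                 # level 3: tokens
            score = sum(counts[t] for t in query_terms)
            candidates.append((score, d_idx, s_idx, sent))
    # Highest score first; ties go to the earlier document, then the earlier sentence.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    selected = candidates[:k]
    # Emit the chosen sentences in their original reading order.
    selected.sort(key=lambda c: (c[1], c[2]))
    return [c[3] for c in selected]


if __name__ == "__main__":
    docs = [
        "The river flooded the town. Residents were evacuated by boat.",
        "Flood defenses failed after heavy rain. The mayor praised volunteers.",
    ]
    print(query_extractive_summary("Why did the town flood?", docs, k=2))
```

Running it prints one sentence from each document. Note that "The river flooded the town." scores only on "town", because the sketch does no stemming; exact-match brittleness of this kind is one reason learned, context-aware scorers are used for this task.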
Pages: 84-98
Number of pages: 15